GEO · 11 min read

Testing AI citability across four engines — why one isn't enough

A page cited by ChatGPT can be invisible on Perplexity. Every AI answer engine uses a different retrieval stack, and one-engine visibility numbers give you a misleading read on your real AI search presence. Here's what changed, why it matters, and how MarketOS now tests all four in a single audit.

The quiet assumption behind most "AI SEO" tooling in 2025 was that AI search is basically Google with extra steps. If you rank on Google, you show up in AI Overviews, which means ChatGPT probably cites you, and so do Perplexity and Claude. One signal, four surfaces.

That assumption was never quite right, and in 2026 it's visibly wrong. Each of those four engines is driven by a different retrieval pipeline. The content they weight, the domains they trust, and even the structure they prefer diverge more every quarter. If you're optimizing for one and assuming the rest follow, you're flying with a single instrument.

TL;DR. ChatGPT uses Bing. Claude uses Brave Search. Perplexity runs its own index. Google AI Overviews uses Google Search. Your page can be cited by one and ignored by three. MarketOS's Deep Audit now tests all four live in a single run and shows you the per-engine breakdown — so you stop optimizing blind.

Four engines, four retrieval stacks

It's easy to think of ChatGPT / Claude / Perplexity / Google AI as variations on a theme. They aren't. Each one has a distinct retrieval component that decides which pages make it into the context window the LLM actually reads. That retrieval layer, not the model itself, is what determines whether your site is "visible" or not.

Retrieval · Google Search

Google AI Overviews

The one most brands over-optimize for. It uses Google's own index, so traditional SEO signals — authority, on-page structure, schema markup, freshness — translate pretty directly.

MarketOS tests it via Gemini with Google Search grounding, which uses the same retrieval pipeline under the hood.
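For reference, a grounded Gemini call looks roughly like this with the @google/genai SDK. This is a minimal sketch rather than MarketOS's actual adapter: the sample query is made up, error handling is omitted, and the model name is the pinned version listed later in this post.

```ts
import { GoogleGenAI } from "@google/genai";

// One grounded query against Google Search via Gemini. Sketch only.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const result = await ai.models.generateContent({
  model: "gemini-flash-latest",
  contents: "best b2b analytics platforms for mid-market teams",
  config: { tools: [{ googleSearch: {} }] }, // turns on Google Search grounding
});

// Cited pages arrive as grounding chunks alongside the answer text.
const chunks = result.candidates?.[0]?.groundingMetadata?.groundingChunks ?? [];
console.log(chunks.map((c) => c.web?.uri));
```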

Retrieval · Bing

ChatGPT

ChatGPT's web_search tool uses Bing Web Search under the covers. Bing's ranking signals overlap with Google's but not perfectly — Bing tends to weight domain age, server-side rendering, and cleaner markup more heavily.

Translation: you can rank first on Google and still have ChatGPT never mention you. It happens all the time.
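Here's roughly what a single ChatGPT-side probe looks like through the OpenAI Responses API. Again, a minimal sketch rather than our production adapter: the sample query is made up, the model name is the pin listed later in this post, and the exact web-search tool type string and include flag may differ depending on your API version.

```ts
import OpenAI from "openai";

// One query through ChatGPT's Bing-backed web search tool. Sketch only.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await openai.responses.create({
  model: "gpt-5.4-mini",
  tools: [{ type: "web_search" }],
  include: ["web_search_call.action.sources"], // ask for the consulted sources explicitly
  input: "best b2b analytics platforms for mid-market teams",
});

// Consulted pages live on the web_search_call output items.
const sources = response.output
  .filter((item) => item.type === "web_search_call")
  .flatMap((item: any) => item.action?.sources ?? []);
console.log(sources);
```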

Retrieval · Perplexity Sonar

Perplexity

Perplexity operates its own crawler and index, purpose-built for answer engines. It rewards short, declarative, bullet-formatted answers with explicit source attribution.

Pages that just rank well without a structured answer up top tend to get ignored. A page with a 40-word lead paragraph and a <ul> of key points often wins over a 2000-word essay on the same topic.

Retrieval · Brave Search

Claude

Anthropic's web_search tool uses Brave Search. Brave's index is smaller than Google's or Bing's, which can mean heavily linked content shows up disproportionately — but it also means pages on newer domains sometimes punch above their weight.

Claude's new Dynamic Filtering (tool version web_search_20260209) further prunes results with a code-execution step before they hit the context window, so structural clarity matters even more.
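And the Claude-side equivalent via the Anthropic SDK. A minimal sketch, not the production adapter: the sample query and max_uses value are illustrative, and the tool version string is the one cited above.

```ts
import Anthropic from "@anthropic-ai/sdk";

// One query through Claude's Brave-backed web search tool. Sketch only.
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await anthropic.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "best b2b analytics platforms for mid-market teams" }],
  tools: [{ type: "web_search_20260209", name: "web_search", max_uses: 3 }],
});

// Consulted pages come back as web_search_tool_result content blocks.
const resultBlocks = message.content.filter((b) => b.type === "web_search_tool_result");
console.log(resultBlocks);
```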

The upshot: each retrieval stack has a different "personality," and getting cited by one isn't a reliable proxy for getting cited by the others.

Why single-engine testing is misleading

Imagine you run a B2B SaaS and track your "AI visibility" by asking ChatGPT eight buyer-intent queries about your category. You're cited on three of them. 37% visibility. Not bad.

Now run the same eight queries on Perplexity: 75%. On Claude: 12%. On Google AI Overviews: 50%.

That's a real pattern we see constantly. The brand has strong semantic content (Perplexity loves that), decent Google-era SEO (AI Overviews rewards it), OK presence in Bing's index (ChatGPT hits and misses), and basically zero mindshare in Brave's smaller index (Claude ignores it).

If that brand had only tested ChatGPT, they would have walked away with a middling number and a generic to-do list — when actually their biggest blind spot is Claude and their biggest strength is Perplexity. The fix list looks completely different depending on which read you trust.

One-engine coverage is one-engine blind. Your real AI search presence is the union of four visibilities, not the read on any single one.

How MarketOS runs the tests

When you run a Deep Audit, here's what actually happens on the citability tab:

  1. Query generation. Given your brand URL, target keywords, and optional brand summary, Gemini writes 8 realistic buyer-intent queries: a mix of informational ("what is X"), commercial ("best X", "X vs Y"), navigational, and transactional. One or two include your brand name as a baseline; the rest are unbranded.
  2. Parallel dispatch. For each query, we fan out to every engine you've configured — Gemini, OpenAI, Perplexity, Anthropic — in parallel. Each provider gets the same query with its native web-search tool enabled.
  3. Citation parsing. Each engine returns answer text plus a list of sources it consulted. We parse those sources out of each engine's response format (Gemini's groundingChunks, OpenAI's web_search_call.action.sources, Perplexity's top-level citations and search_results, Anthropic's web_search_tool_result blocks). A simplified sketch of steps 3 through 5 follows this list.
  4. Normalization. We extract the hostname from each citation URL and check whether your domain (or its subdomains) appears. Same check for the competitor domains you optionally supplied during the wizard.
  5. Per-engine scoring. Each engine gets its own visibility rate (queries where your domain appeared / queries tested), a list of the domains that dominated its answers, and a count of failed queries.
  6. Aggregate view. The top of the citability tab shows your aggregate visibility rate (the average across the engines that ran) plus the most-cited domains across all engines combined. The per-engine breakdown sits right below.
  7. Per-query matrix. At the bottom, every query is a row with four status dots — green/red for each engine where it ran, grey where you don't have a key. Click to expand and see each engine's verbatim answer plus the sources it cited.
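Here's the simplified sketch of steps 3 through 5 promised above. The helper names (extractCitationUrls, matchesDomain, visibilityRate) are illustrative rather than our actual code, and the field paths follow the response shapes named in step 3; exact nesting varies by SDK version, so everything is typed loosely.

```ts
// Simplified sketch of steps 3-5, not the production adapters.
type Engine = "gemini" | "openai" | "perplexity" | "anthropic";

function extractCitationUrls(engine: Engine, response: any): string[] {
  switch (engine) {
    case "gemini": // groundingMetadata.groundingChunks[].web.uri
      return (response.candidates?.[0]?.groundingMetadata?.groundingChunks ?? [])
        .map((c: any) => c.web?.uri)
        .filter(Boolean);
    case "openai": // web_search_call output items carry action.sources
      return (response.output ?? [])
        .filter((item: any) => item.type === "web_search_call")
        .flatMap((item: any) => item.action?.sources ?? [])
        .map((s: any) => (typeof s === "string" ? s : s.url))
        .filter(Boolean);
    case "perplexity": // top-level citations, falling back to search_results[].url
      return response.citations ?? (response.search_results ?? []).map((r: any) => r.url);
    case "anthropic": // web_search_tool_result content blocks
      return (response.content ?? [])
        .filter((block: any) => block.type === "web_search_tool_result")
        .flatMap((block: any) => (Array.isArray(block.content) ? block.content : []))
        .map((r: any) => r.url)
        .filter(Boolean);
  }
}

// Step 4: does a citation's hostname match your domain or one of its subdomains?
function matchesDomain(citationUrl: string, brandDomain: string): boolean {
  try {
    const host = new URL(citationUrl).hostname.replace(/^www\./, "");
    return host === brandDomain || host.endsWith(`.${brandDomain}`);
  } catch {
    return false; // unparseable URL, never a match
  }
}

// Step 5: visibility rate = queries where the domain appeared / queries tested.
function visibilityRate(perQueryCitations: string[][], brandDomain: string): number {
  if (perQueryCitations.length === 0) return 0;
  const hits = perQueryCitations.filter((urls) =>
    urls.some((url) => matchesDomain(url, brandDomain))
  ).length;
  return hits / perQueryCitations.length;
}
```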

You can watch all of this happen live during the scan. A per-engine progress block shows each engine's current query count with a mini-bar, updating as promises resolve. Gemini typically finishes first (its grounding pipeline is fast); OpenAI and Claude take longer because their web_search tools involve additional LLM round-trips.

Bring your own keys, pay nothing to us

Every non-Gemini engine is strictly BYOK (bring your own key). That means:

  • Your OpenAI key is used by the MarketOS app to call OpenAI. Your Perplexity key calls Perplexity. Your Anthropic key calls Anthropic. MarketOS servers never see those keys.
  • You pay each provider's normal rate — no MarketOS markup, no proxy fees. The provider bills go straight to your account.
  • You can use keys scoped to this machine or project, and rotate them any time. Nothing in our codebase persists them beyond your own device.

Gemini's pricing structure is different only because we offer a subscription tier that covers 15 audits a month on the MarketOS Gemini key — otherwise you can use your own Gemini key and we don't touch it either.

Before you run an audit, the wizard shows a live cost estimate based on which engines you have keys for, using current April 2026 provider pricing. A typical 4-engine deep audit runs between $0.20 and $0.30 in total provider spend. Skip engines you don't care about and the estimate shrinks accordingly.
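The estimate itself is plain arithmetic: queries times per-query cost, summed over the engines you configured. A hypothetical sketch, with the type and function names made up for illustration; the per-query figures come from the live pricing config, not from anything asserted here.

```ts
// Hypothetical sketch of the wizard's cost estimate.
type ConfiguredEngine = { name: string; perQueryUsd: number };

function estimateAuditCostUsd(engines: ConfiguredEngine[], queries = 8): number {
  // Sum (queries x per-query cost) over every engine with a key configured.
  return engines.reduce((total, engine) => total + engine.perQueryUsd * queries, 0);
}

// Leave an engine's key blank and it simply drops out of the sum.
```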

What you can actually do with this

Numbers are only useful if they drive decisions. The per-engine breakdown unlocks a handful of specific plays:

  • Find your weakest engine. If you're at 60% on three engines and 10% on one, that one engine is your next 3-6 weeks of content work. The engine-specific fix hints in our issue list tell you which formats that engine rewards ("Perplexity favors bullet-structured answers with source attribution" etc.).
  • Audit competitor tactics engine-by-engine. Run the audit on a competitor's URL. Their per-engine dominance map tells you which engines they've optimized for. Often competitors are strong in exactly the engines where you're weakest — which gives you a clear list of their pages to study for what works.
  • Track whether changes actually move the needle. Make content changes, re-run the audit a week later. You see which engines responded. A change that adds a TL;DR block might move Perplexity from 40% to 70% while barely moving ChatGPT. That's useful signal.
  • Figure out where a new page can win. If your best-cited competitor in a category dominates Claude but not Perplexity, you know where you can plausibly take share with a better page. The aggregate most-cited-domains view makes this legible at a glance.

We keep the providers working for you

Provider APIs are a moving target. Models get renamed, tools get new version suffixes, pricing pages change. That kind of drift is one of the reasons other tools' "AI visibility" numbers quietly stop working — you never see the error, you just see bad data.

MarketOS pins all four provider model names and tool versions in a single config file and ships a drift canary (npm run canary:providers) that we run before every release. The canary hits each provider once with a canned query and confirms:

  • The endpoint still responds.
  • The model name is still accepted.
  • The tool version still works.
  • The response shape still matches what our adapters parse.

If any of that drifts, we catch it before you do, and the release ships with updated adapters. Current pinned versions as of April 2026: gemini-flash-latest for Gemini grounding, gpt-5.4-mini for OpenAI, claude-haiku-4-5 + web_search_20260209 for Anthropic, sonar for Perplexity.
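Conceptually, the pinned config is one object that both the canary and the runtime adapters read, so they can never disagree. The sketch below is illustrative: the object shape and property names are assumptions, while the values are the April 2026 pins listed above.

```ts
// Illustrative sketch of a single pinned-provider config.
export const PINNED_PROVIDERS = {
  gemini:     { model: "gemini-flash-latest" },                            // Google Search grounding
  openai:     { model: "gpt-5.4-mini" },                                   // Bing-backed web_search
  anthropic:  { model: "claude-haiku-4-5", tool: "web_search_20260209" },  // Brave-backed web_search
  perplexity: { model: "sonar" },
} as const;

// The drift canary walks this object, sends one canned query per provider, and fails
// the pre-release check if an endpoint, model name, tool version, or response shape changed.
```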

Running your first multi-engine audit

If you're already a MarketOS user, the multi-provider citability feature is active out of the box — you just need to add API keys for the engines you want tested. Open Settings → API Keys and look for the "AI Search Engines (Citability Testing)" section. Paste whichever of the OpenAI, Perplexity, and Anthropic keys you have; leave the rest blank if you don't want to test those engines.

Then click SEO Intelligence in the sidebar, paste a URL, walk the 4-step wizard (URL, keywords, competitors, goal), and run. The citability tab shows up in the Action Center once the scan finishes, with all the per-engine breakdowns described above.

If you're not a MarketOS user yet — grab the $19.99/mo subscription or the $199 lifetime license at marketos.org. Either tier includes the full SEO Intelligence suite. Lifetime holders get unlimited audits on their own keys forever.

One audit. Four engines. All the blind spots, visible.