SCHEMA MARKUP THAT
AI MODELS ACTUALLY USE.

Schema is the most ridden hobby horse in SEO. After 90 days of measured AI fetcher traffic across 47 sites, the picture is much narrower than the average optimisation checklist suggests. Six types account for ~97% of the engagement. The other 700+ are theatre.

Bot Traffic
~9%
High-Signal Types
6
JSON-LD Share
~95%
Sites Audited
47

WHY SCHEMA MATTERS DISPROPORTIONATELY.

Schema accounts for roughly 9% of all AI fetcher traffic across the network - much smaller than RSS (~40%) or HTML (~25%). But its citation share is far higher than that 9% suggests. When an AI model wants to extract a fact about your business, your product, or your author, it goes to the schema first. It is the highest-density-per-byte source of structured truth on most websites.

That is also why getting it wrong is expensive. Bad schema does not just fail to help - it can actively contradict the visible page content, and AI models will sometimes prefer the schema over the prose. If your Organization name in JSON-LD is misspelled, that misspelling can show up in citations.

Finding 01.
Format wars are over

JSON-LD WINS. MICRODATA IS DEAD. RDFA IS IRRELEVANT.

~95%

Of the schema requests AI fetchers made across the network, the format breakdown was unambiguous:

  • JSON-LD in a <script type="application/ld+json"> tag - ~95% of all AI-extracted structured data.
  • Microdata (the itemscope / itemtype / itemprop attribute soup) - ~4%, mostly on legacy WordPress and Shopify themes.
  • RDFa - ~1%, almost entirely on academic and government sites.

If your CMS or theme is still serving microdata, that is fine - you don't need to rip it out. But every new schema you add should be JSON-LD, and any time you touch a template, replace microdata with JSON-LD on the way through. The parsing path is cleaner, the validation tooling is better, and the AI fetchers strongly prefer it.
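For reference, the JSON-LD form AI fetchers prefer is a single script tag in the page head. A minimal sketch - every name and URL here is a placeholder, not a required value:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://yoursite.com/#organization",
  "name": "Your Company",
  "url": "https://yoursite.com/",
  "logo": "https://yoursite.com/logo.png"
}
</script>
```

The equivalent microdata would be scattered across `itemscope` / `itemprop` attributes in the markup; this block keeps the structured data in one parseable place, which is exactly why the fetchers prefer it.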

Finding 02.
Six types do most of the work

RANKED ORDER OF WHAT AI ACTUALLY EXTRACTS.

Organization + Person
~28%
Article + NewsArticle
~22%
Product + Offer
~18%
FAQPage + QAPage
~14%
BreadcrumbList
~10%
Recipe
~5%
Other 700+ types
~3%

Six types account for ~97% of all AI schema engagement in the network. The other 700+ schema.org types share the remaining 3%. This is the part of the data nobody wants to talk about, because it implies that 90% of the schema configuration in plugins like Yoast and RankMath is doing nothing useful.

Finding 03.
The schema types that are theatre

WHAT TO STOP IMPLEMENTING.

Some schema types get pitched constantly in SEO blogs but produce no measurable AI engagement on the sites where they have been deployed. These are the ones I would not spend any time on:

  • Speakable - originally proposed for voice assistants. Effectively unused by current AI surfaces. Adding it does nothing.
  • ClaimReview - only useful if you are a verified fact-checker (Snopes, PolitiFact, etc.). For a regular publisher, it is ignored or actively penalised by Google's verification requirements.
  • HowTo - Google deprecated the rich result. AI models read it inconsistently. The rare engagement does not justify the markup overhead.
  • VideoObject on a non-YouTube embed - mostly inert. If your video is on YouTube, the YouTube schema travels with the embed and your local VideoObject is redundant.
  • JobPosting on a non-aggregator site - useful on Indeed or LinkedIn, irrelevant on a single-company careers page with a handful of openings.
  • CourseInstance, EducationalOccupationalCredential, MonetaryGrant, and the long tail of micro-types - effectively no engagement.

The "schema everything" trap costs hours and produces nothing. If your audit consultant is recommending you add 15 schema types to every page, ask them to demonstrate the engagement. They cannot.

Finding 04.
Silent failures that nuke your schema

FOUR MISTAKES THAT MAKE GOOD SCHEMA INVISIBLE.

Even the high-signal schema types fail when they are deployed badly. The four most common mistakes I see in audits:

  • Multiple disjoint @graph fragments - several JSON-LD blocks on the same page that do not reference each other. AI fetchers struggle to merge them, and may pick one and ignore the rest. Always use a single @graph array with internal @id references.
  • Missing @id resolution - declaring an Organization without a stable @id URL means every page on your site presents a fresh, unconnected entity. Use canonical @id values like https://yoursite.com/#organization so AI models can deduplicate across pages.
  • Schema that contradicts visible content - JSON-LD says the price is $99, the page says $79. AI models notice. Some pick the schema, some pick the visible price, and the inconsistency itself can suppress citation. Schema should mirror the page, not invent it.
  • Dynamic schema injected after JS render - schema added by client-side JavaScript shows up in browser DevTools but not in raw HTML. Some AI fetchers do not execute JS. Server-render your schema or you are gambling on whose crawler shows up.
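The first two mistakes share one fix: a single @graph with internal @id references. A hedged sketch of the shape (URLs and names are placeholders) - note how the page-level node points back at the Organization instead of redeclaring it:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#organization",
      "name": "Your Company",
      "url": "https://yoursite.com/"
    },
    {
      "@type": "WebPage",
      "@id": "https://yoursite.com/pricing/#webpage",
      "url": "https://yoursite.com/pricing/",
      "publisher": { "@id": "https://yoursite.com/#organization" }
    }
  ]
}
</script>
```

Because the @id is a stable URL rather than a per-page blank node, every page that references `https://yoursite.com/#organization` resolves to the same entity, which is what lets AI models deduplicate across your site.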

Most of these are silent failures. Your schema validates, your rich result test passes, and AI engagement is still zero. The validators check syntax, not semantics. The only real test is what the bots actually do.

Finding 05.
The minimum viable schema

FIVE TYPES, DEPLOYED CORRECTLY, COVER 80%.

If you implement nothing else, implement these five types correctly and you will cover roughly 80% of the AI schema engagement that any normal site can earn:

  • Organization at site root, with stable @id, name, URL, logo, and sameAs links to your social profiles.
  • Person for the site owner or each named author, also with stable @id, name, URL, and sameAs.
  • WebSite with a SearchAction potentialAction - signals that you have an internal search and lets some AI surfaces deep-link queries.
  • BreadcrumbList on every non-root page. Cheap to generate, broadly consumed.
  • Article on every blog or research post, with headline, author (referencing your Person @id), datePublished, dateModified, and mainEntityOfPage.
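Put together, the five types fit in one @graph per page. A sketch of what that looks like on a blog post - all names, URLs, and dates below are placeholders you would swap for your own:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#organization",
      "name": "Your Company",
      "url": "https://yoursite.com/",
      "logo": "https://yoursite.com/logo.png",
      "sameAs": ["https://www.linkedin.com/company/yourcompany"]
    },
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#author-jane",
      "name": "Jane Doe",
      "url": "https://yoursite.com/about/",
      "sameAs": ["https://www.linkedin.com/in/janedoe"]
    },
    {
      "@type": "WebSite",
      "@id": "https://yoursite.com/#website",
      "url": "https://yoursite.com/",
      "publisher": { "@id": "https://yoursite.com/#organization" },
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://yoursite.com/search?q={search_term_string}",
        "query-input": "required name=search_term_string"
      }
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Blog",
          "item": "https://yoursite.com/blog/" },
        { "@type": "ListItem", "position": 2, "name": "This Post",
          "item": "https://yoursite.com/blog/this-post/" }
      ]
    },
    {
      "@type": "Article",
      "@id": "https://yoursite.com/blog/this-post/#article",
      "headline": "This Post",
      "author": { "@id": "https://yoursite.com/#author-jane" },
      "publisher": { "@id": "https://yoursite.com/#organization" },
      "datePublished": "2025-01-15",
      "dateModified": "2025-02-01",
      "mainEntityOfPage": "https://yoursite.com/blog/this-post/"
    }
  ]
}
</script>
```

Everything resolves internally: the Article points at the Person, the Person and WebSite point at the Organization, and no entity is declared twice.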

That is it. Five types, deployed correctly, with internal @id resolution, server-rendered, mirroring the visible content. You do not need HowTo, you do not need Speakable, you do not need a separate schema for your favourite social network.

If you are running an e-commerce site, add Product and Offer on product pages. If you are publishing news, add NewsArticle instead of (or alongside) Article. That is the extension. The core stays the same.
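The e-commerce extension is one more node on the product page. A hedged sketch (product name, URLs, and price are placeholders) - note that the price must mirror what the page visibly shows:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "https://yoursite.com/products/widget/#product",
  "name": "Widget",
  "image": "https://yoursite.com/images/widget.jpg",
  "brand": { "@id": "https://yoursite.com/#organization" },
  "offers": {
    "@type": "Offer",
    "price": "79.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```

The brand reference reuses the same Organization @id as the core schema, so the product ties back to the same entity AI models already know.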

FIVE TYPES DO 80% OF THE WORK.
THE OTHER 700+ ARE THEATRE.

WHAT ABOUT SCHEMA PLUGINS?

RankMath, Yoast, Schema Pro, and the Shopify schema apps all do roughly the same thing: they emit schema based on your CMS metadata. They tend to over-emit (lots of low-engagement types) but rarely under-emit. The fix is not to throw them out - it is to audit what they are emitting, suppress the noise, and tighten the high-engagement types.

Two specific things worth checking: (1) does your plugin emit a single @graph with internal @id references, or does it scatter multiple disjoint blocks across the page? (2) does it server-render the schema, or inject it via JavaScript? If the answers are bad, that is the place to invest.

THE BOTTOM LINE.

Schema is one of the highest-leverage AI visibility interventions, but only when it is targeted. The default of "add every schema type your CMS supports" produces noise. The smart default is five types, deployed correctly, with stable identity references and consistent content.

If you want a useful test: open your site in the Google Rich Results Test, look at the structured data block, and ask yourself: which of these would I bet money that an AI model actually uses? If the answer is "none of them" or "I have no idea," you have a schema problem. The fix is rarely "more schema." The fix is usually "less, but better."

Audit Your Actual Schema Engagement

FIND OUT WHICH
OF YOUR SCHEMA AI ACTUALLY READS.

The audit pulls your real fetcher logs and shows which schema types AI bots are touching - and which are sitting in your HTML doing nothing.