WHY A SINGLE COMPOSITE SCORE.
An audit deliverable that presents 40 separate numbers is harder to act on than one that presents a single figure. The composite forces a verdict: is this site broken (0-3), patchy (4-5), solid (6-7), or dominant (8-10)? Once you have the verdict, you can dig into the dimensions to understand why.
The 0-10 scale is deliberate. Scoring on 0-7 (matching the dimension count) would imply that dimensions are equal weight and binary pass/fail. They are not. Some dimensions can drop a site to zero visibility on their own (bot accessibility); others contribute incrementally (linking). The 0-10 composite preserves that asymmetry while staying intuitive.
SCHEMA.
What it measures: presence and correctness of JSON-LD structured data on every public page. Specifically: Organization and Person at the site root with stable @id values, BreadcrumbList on every non-root page, Article on every blog or research post with headline / author / datePublished / dateModified, and a single @graph array per page (not multiple disjoint blocks).
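A minimal sketch of what that looks like on a blog post (names, URLs, and @id values are placeholders, not a prescribed template):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@graph": [
        {
          "@type": "Organization",
          "@id": "https://example.com/#org",
          "name": "Example Co",
          "url": "https://example.com/"
        },
        {
          "@type": "Person",
          "@id": "https://example.com/#jane",
          "name": "Jane Author"
        },
        {
          "@type": "BreadcrumbList",
          "itemListElement": [
            { "@type": "ListItem", "position": 1, "name": "Blog", "item": "https://example.com/blog/" },
            { "@type": "ListItem", "position": 2, "name": "Post title", "item": "https://example.com/blog/post/" }
          ]
        },
        {
          "@type": "Article",
          "headline": "Post title",
          "author": { "@id": "https://example.com/#jane" },
          "publisher": { "@id": "https://example.com/#org" },
          "datePublished": "2025-01-15",
          "dateModified": "2025-03-02"
        }
      ]
    }
    </script>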
Why it matters: schema is roughly 9% of AI fetcher traffic across the research network, but its share of factual citations is far higher. When an AI model needs an entity name, an author, a price, or a publication date, it pulls from schema first. Bad schema can actively contradict your visible content and corrupt citations.
Common failure modes: JS-injected schema invisible to non-JS crawlers, missing @id resolution causing entity duplication, schema-content contradictions (price mismatch is the classic), the "schema everything" trap (15+ low-engagement types).
In short: a single @graph, correct types, server-rendered.
RSS.
What it measures: presence, validity, and richness of feed endpoints. Specifically: a valid RSS 2.0 or Atom feed at a discoverable URL, <link rel="alternate"> declared in HTML <head>, full content in <content:encoded> (not just titles or excerpts), accurate RFC-822 timestamps, and unrestricted access in robots.txt.
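As a rough illustration (URLs and dates are placeholders), the discovery tag and a feed item that pass those checks:

    <!-- in the HTML <head> of every page -->
    <link rel="alternate" type="application/rss+xml" title="Example Co" href="https://example.com/feed.xml">

    <!-- in the feed itself -->
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Example Co</title>
        <link>https://example.com/</link>
        <description>Research notes from Example Co</description>
        <item>
          <title>Post title</title>
          <link>https://example.com/blog/post/</link>
          <pubDate>Wed, 15 Jan 2025 09:00:00 GMT</pubDate>
          <content:encoded><![CDATA[<p>The full post body, not a title or excerpt.</p>]]></content:encoded>
        </item>
      </channel>
    </rss>

The pubDate is RFC-822 and agrees with the date shown on the page itself.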
Why it matters: RSS is the single most-consumed endpoint by AI fetchers - roughly 40% of all logged requests across the 47-site network. Bots probe feeds first to understand the shape of your site, then decide what HTML to pull. A site with no feed gets crawled less. A site with a stripped feed (titles only) gets crawled even less.
Common failure modes: CMS defaults that strip feeds to titles and excerpts, malformed XML from plugin conflicts, dates in the feed that do not match dates on the page, feed paths blocked by an overzealous robots.txt.
HTML.
What it measures: the quality of the rendered HTML that AI fetchers actually receive. Specifically: semantic HTML (proper <article>, <section>, <header>, heading hierarchy), unique meta descriptions per page, OpenGraph completeness, content present in the initial HTML response (not lazy-rendered by JS), reasonable content density (not an infinite-scroll skeleton).
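A minimal page skeleton that satisfies those checks (titles and descriptions are placeholders):

    <head>
      <title>Post title - Example Co</title>
      <meta name="description" content="One sentence unique to this page.">
      <!-- OpenGraph that agrees with the title rather than contradicting it -->
      <meta property="og:title" content="Post title">
      <meta property="og:description" content="One sentence unique to this page.">
      <meta property="og:type" content="article">
    </head>
    <body>
      <article>
        <header><h1>Post title</h1></header>
        <section>
          <h2>First subheading</h2>
          <p>Content delivered in the initial HTML response, not injected after hydration.</p>
        </section>
      </article>
    </body>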
Why it matters: raw HTML accounts for roughly 25% of AI fetcher traffic. Most fetchers either do not execute JavaScript or treat post-render content as second-class. If your homepage is a single-page-app shell with content injected after hydration, AI bots see an empty page.
Common failure modes: SPA frameworks without server-side rendering, missing or duplicate meta descriptions across pages, content rendered into <div> soup instead of semantic tags, OpenGraph that contradicts the page title.
BOT ACCESSIBILITY.
What it measures: whether AI bots can actually reach your content. Specifically: robots.txt with explicit Allow: rules for major AI user-agents (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, ChatGPT-User, Google-Extended, etc.), no Cloudflare bot fight mode silently dropping AI traffic, no WAF rules blocking by user-agent string, no rate-limiting that returns 429s on normal AI crawl patterns.
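A sketch of a robots.txt with explicit allows (illustrative, not exhaustive; extend the user-agent list as the bot landscape shifts):

    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: Google-Extended
    Allow: /

    Sitemap: https://example.com/sitemap.xml

Note that robots.txt only grants permission; a Cloudflare toggle or WAF rule upstream can still drop the same bots silently.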
Why it matters: roughly 30% of sites in the broader research sample were silently blocking at least one major AI bot without the owner realising. Bot accessibility is the dimension where a single misconfigured rule can drop you from a 7 to a 0. If the bots cannot reach you, every other dimension is theoretical.
Common failure modes: robots.txt copy-pasted from a 2020 template that predates AI bots, Cloudflare's "Block AI crawlers on all pages" toggle accidentally enabled, WAF rules that blanket-block any user-agent containing "bot", aggressive rate-limiters tuned for human traffic that fail catastrophically on normal bot crawl bursts.
FRESHNESS.
What it measures: publishing cadence and date accuracy. Specifically: how recently the site has shipped new content, accuracy of lastmod dates in sitemap.xml, accuracy of dateModified in Article schema, distribution of content age across the site, presence of an active blog or news section.
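For example, a sitemap entry whose lastmod actually matches the page (the date is illustrative):

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/blog/post/</loc>
        <!-- must agree with the visible date and the Article dateModified -->
        <lastmod>2025-03-02</lastmod>
      </url>
    </urlset>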
Why it matters: sites publishing weekly get AI bot traffic at roughly 3x the rate of sites publishing less often. Sites that go silent for 60+ days see their AI bot traffic drop to near baseline within roughly two weeks. The freshness signal compounds while you keep publishing and decays quickly when you stop. It is the single biggest behavioural lever for ongoing AI visibility.
Common failure modes: sitemap.xml with lastmod dates that lie (set to today on every regenerate), schema dates that do not match visible dates, blog sections that look active but have not shipped in 6 months, "evergreen" content with no dateModified updates ever.
LINKING.
What it measures: the structural health of internal links and discovery. Specifically: sitemap.xml completeness (every public page included), absence of orphaned pages (pages not linked from anywhere), internal link density per page, anchor text quality (descriptive, not "click here"), absence of broken internal links, breadcrumbs that match URL structure.
Why it matters: sitemap.xml is roughly 14% of AI fetcher traffic. Bots use it to triangulate site structure, freshness, and topical clustering. Internal links signal authority distribution and topic coherence. A site with a complete sitemap and clean internal linking is significantly easier for AI models to navigate than a site that relies on hub pages alone.
Common failure modes: sitemap.xml missing 30%+ of pages because of stale generation, internal links pointing to redirected or dead URLs, anchor text that is identical across hundreds of links ("read more"), orphaned pages that exist in the CMS but no page links to them.
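Orphan detection is mechanical enough to script. A rough sketch in Python, assuming a single flat sitemap.xml and same-host links (a real check would also need sitemap indexes, canonicals, and crawl politeness):

    import urllib.request
    import xml.etree.ElementTree as ET
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class LinkParser(HTMLParser):
        """Collects href values from <a> tags."""
        def __init__(self):
            super().__init__()
            self.hrefs = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def normalize(url):
        return url.split("#")[0].rstrip("/")

    def orphaned_pages(sitemap_url):
        ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
        pages = {normalize(loc.text.strip())
                 for loc in ET.fromstring(fetch(sitemap_url)).iter(ns + "loc")}
        host = urlparse(sitemap_url).netloc
        linked = set()
        for page in pages:
            parser = LinkParser()
            try:
                parser.feed(fetch(page))
            except Exception:
                continue  # unreachable page: a separate finding in itself
            for href in parser.hrefs:
                target = urljoin(page + "/", href)
                if urlparse(target).netloc == host:
                    linked.add(normalize(target))
        return pages - linked  # in the sitemap but linked from nowhere

Calling orphaned_pages("https://example.com/sitemap.xml") returns the sitemap URLs that no crawled page links to.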
CITATION PRESENCE.
What it measures: the only output-side dimension. The other 6 dimensions measure the conditions under which AI models can ingest your site. This dimension measures whether they actually do. Specifically: appearances in ChatGPT, Claude, Perplexity, and Google AI for 30+ target queries selected with the client, captured with screenshots, scored on whether your site is cited, in what context, and how the citation is rendered.
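One way to record each check so the scoring stays auditable (field names here are illustrative, not the audit's actual schema):

    from dataclasses import dataclass

    @dataclass
    class CitationCheck:
        query: str        # one of the 30+ target queries agreed with the client
        surface: str      # "chatgpt", "claude", "perplexity", or "google_ai"
        cited: bool       # does the site appear at all
        context: str      # where and how the citation is rendered
        screenshot: str   # path to the captured evidence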
Why it matters: a site can score 8 on every input dimension and still earn no citations if the queries it would naturally answer are not the queries its audience is asking. Citation presence forces the audit to confront the gap between "technically optimised" and "actually cited." It is the dimension that ties methodology back to outcome.
Common failure modes: sites cited frequently in ChatGPT but never in Perplexity (different bot behaviour), sites cited only in adjacent topics that do not convert, sites cited but with stripped attribution (the "ghost citation" pattern), sites that earn citations but with outdated facts because freshness scored low.
HOW DIMENSIONS ROLL UP.
Each dimension is scored 0-10 and weighted into the composite. The weights are not pulled out of thin air: they are calibrated against the consumption hierarchy observed in the 47-site research network and the failure modes that correlate most strongly with low citation share.
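The roll-up itself is a weighted mean, with room for a gate on the one dimension that can zero a site out on its own. A sketch in Python with illustrative weights (placeholders, not the audit's calibrated values):

    WEIGHTS = {  # illustrative placeholders, not the calibrated weights
        "schema": 0.15, "rss": 0.20, "html": 0.15, "bot_accessibility": 0.20,
        "freshness": 0.10, "linking": 0.10, "citation_presence": 0.10,
    }

    def composite(scores: dict[str, float]) -> float:
        """Weighted mean of seven 0-10 dimension scores, itself on a 0-10 scale."""
        assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
        if scores["bot_accessibility"] == 0:
            return 0.0  # if bots cannot reach you, every other dimension is theoretical
        return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)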
WHAT THE SCORE ACTUALLY MEANS.
One number, no hand-waving: 0-3 is broken, 4-5 is patchy, 6-7 is solid, 8-10 is dominant. The composite delivers the verdict; the seven dimension scores underneath it explain why.
WHAT THIS SCORE IS NOT.
A few honest caveats so the score is not over-interpreted:
- It is not a guarantee of future citations. AI models are opaque. The score measures the conditions under which citations become likely, not the citations themselves.
- It is not a Google ranking proxy. Traditional search and AI search behave differently. A site can rank well on Google and score 4 on the AI Visibility Score (and vice versa).
- It is not stable across audit cycles. AI bot behaviour shifts every few months. A site that scored 7 in Q1 may need a re-audit in Q4 because the underlying landscape moved.
- It is not the same as a Google Lighthouse score. Lighthouse measures performance and basic SEO. The AI Visibility Score measures whether AI models can ingest, parse, and cite your content.
THE BOTTOM LINE.
Seven dimensions, each scored 0-10 and weighted into a single 0-10 composite, calibrated by what the 47-site research network has actually shown to matter. The score is meant to be auditable: every number can be traced back to a specific finding, a specific log entry, or a specific citation screenshot. If a number in your audit does not line up with what you can verify on your own site, that is a defect, not a feature - and the engagement is refunded.
If you want to see the score applied to your site, the audit is the productized way to get it. The methodology behind the score is what you have just read. The dimensions are not proprietary - the discipline of measuring all seven consistently, against your real logs and your real citation data, is what makes the audit worth the price.