47 SITES. 9 INDUSTRIES.
YEAR ONE.

The flagship piece. 47 sites. 9 industries. 365 days of bot logs. The full network design, instrumentation, and seven findings - open methodology, reproducible, with the limitations and caveats stated honestly. This is what powers every audit I ship.

Sites
47
Industries
9
Days Logged
365
Findings
7

WHY PUBLISH THIS.

AI visibility research in 2026 is dominated by vendor-published claims with thin methodology and tool-published statistics with vendor incentives. There is very little independent measurement at scale. This network exists to fill that gap.

The network has run for 12 months. The findings have informed every audit I ship. The methodology is documented here in enough detail for anyone to replicate it on their own infrastructure. The data caveats are stated honestly - this is research, not marketing.

THE NETWORK DESIGN.

47 production websites across 9 industries, deliberately chosen for variation.

INSTRUMENTATION.

Logging is at the CDN layer for full coverage including blocked requests. Cloudflare Logpush on most sites; Fastly on a few. Logs ship to R2 / S3 hourly, get parsed by a stdlib Python pipeline (article N. 17), and land in DuckDB for analysis.
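The parse step is small enough to sketch. A minimal stdlib version, assuming NDJSON Logpush records with typical Cloudflare HTTP-request field names (ClientRequestUserAgent, ClientRequestPath, EdgeResponseStatus, EdgeStartTimestamp in Unix nanoseconds); the field names and the timestamp unit are assumptions here, since Logpush field sets are configurable:

```python
import json
from datetime import datetime, timezone

# Substrings that mark an AI fetcher. An illustrative subset, not the full list.
AI_UA_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot",
                 "Google-Extended", "Bytespider", "CCBot")

def parse_logpush_line(line: str):
    """Turn one NDJSON Logpush record into a flat row, or None for non-AI traffic.

    Field names and the nanosecond timestamp unit are assumptions about a
    typical Cloudflare Logpush HTTP-requests dataset.
    """
    rec = json.loads(line)
    ua = rec.get("ClientRequestUserAgent", "")
    if not any(marker in ua for marker in AI_UA_MARKERS):
        return None
    ts = datetime.fromtimestamp(rec["EdgeStartTimestamp"] / 1e9, tz=timezone.utc)
    return {
        "ts": ts.isoformat(),
        "ua": ua,
        "path": rec.get("ClientRequestPath", ""),
        "status": rec.get("EdgeResponseStatus", 0),
    }
```

Rows in this shape append cleanly to a DuckDB table, or can be written back out as NDJSON for a single bulk load.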

Every AI request is classified along three axes: bot family (OpenAI, Anthropic, Perplexity, Google, ByteDance, Meta, Apple, Microsoft, Common Crawl), verification status (verified vs spoofed, via a reverse-DNS round-trip on a 1% sample; see article N. 14), and request type (RSS, sitemap, HTML, schema endpoint, robots.txt, llms.txt, image, PDF).
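The family and request-type classifiers can be sketched as lookup tables; the user-agent tokens and path rules below are illustrative stand-ins, not the production lists, and reverse-DNS verification is a separate step:

```python
# User-agent substrings mapped to bot families. Token spellings are
# illustrative; real UA strings should be checked against vendor docs.
BOT_FAMILIES = {
    "GPTBot": "OpenAI", "ChatGPT-User": "OpenAI", "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google", "Googlebot": "Google",
    "Bytespider": "ByteDance",
    "meta-externalagent": "Meta",
    "Applebot": "Apple",
    "bingbot": "Microsoft",
    "CCBot": "Common Crawl",
}

def bot_family(ua: str) -> str:
    """First matching family token wins; everything else is 'unknown'."""
    for token, family in BOT_FAMILIES.items():
        if token.lower() in ua.lower():
            return family
    return "unknown"

def request_type(path: str) -> str:
    """Bucket a request path into the endpoint classes used in the findings."""
    p = path.lower().split("?")[0]
    if p == "/robots.txt":
        return "robots.txt"
    if p == "/llms.txt":
        return "llms.txt"
    if "sitemap" in p and p.endswith(".xml"):
        return "sitemap"
    if p.endswith((".rss", ".atom")) or any(t in p for t in ("/feed", "/rss", "/atom")):
        return "rss"
    if p.endswith((".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")):
        return "image"
    if p.endswith(".pdf"):
        return "pdf"
    if p.endswith((".json", ".jsonld")) or "/schema" in p:
        return "schema"
    return "html"
```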

Citation tracking runs separately. 100 queries per site per quarter, run across ChatGPT, Claude, Perplexity, Google AI Overviews. Manual screenshot capture (per article N. 31). Citations extracted to a tracking spreadsheet, joined back to the bot-log data for input-output correlation analysis.


FINDING 01: LLMS.TXT IS NEVER REQUESTED.

Across all 47 sites, across 365 days, across every major AI user-agent: zero requests for /llms.txt. Not low - zero. The most replicable finding in the project; verifiable by anyone with server logs in 10 minutes. Detailed in article N. 02.
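Verifying it on your own logs is a one-function job; a sketch against combined-format access log lines:

```python
import re

# Matches "GET /llms.txt" (or HEAD) inside common/combined-format log lines.
LLMS_TXT = re.compile(r'"(?:GET|HEAD) /llms\.txt[ ?]')

def count_llms_txt_requests(log_lines):
    """Count requests for /llms.txt; Finding 01 predicts this returns 0."""
    return sum(1 for line in log_lines if LLMS_TXT.search(line))
```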


FINDING 02: RSS IS THE TOP ENDPOINT.

Approximately 40% of AI fetcher requests went to RSS / Atom endpoints. Detailed in article N. 16.


FINDING 03: THE CONSUMPTION HIERARCHY.

Bot consumption ranked: RSS ~40%, HTML ~25%, sitemap.xml ~14%, schema endpoints ~9%, images ~6%, PDF ~4%, robots.txt ~2% (llms.txt: zero, per Finding 01). The discovery layer (feeds, sitemaps, robots.txt) is roughly half of all AI bot traffic.
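Given labelled request types, the hierarchy is just a frequency table. A sketch:

```python
from collections import Counter

def consumption_shares(request_types):
    """Percentage share per request type, largest first."""
    counts = Counter(request_types)
    total = sum(counts.values())
    return {rtype: round(100 * n / total, 1)
            for rtype, n in counts.most_common()}
```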


FINDING 04: THE BOTS BEHAVE DIFFERENTLY.

ChatGPT (across its 3 bots), Claude (across its 3 bots), Perplexity, Bytespider, and Google all have distinct crawl personalities. Detail in articles N. 06, N. 07, N. 13.


FINDING 05: FRESHNESS COMPOUNDS.

Sites publishing weekly get AI bot traffic at roughly 3x the rate of sites publishing less often. Sites silent for 60+ days drop to baseline within two weeks.


FINDING 06: SILENT BLOCKS ARE COMMON.

Approximately 30% of sites in the broader audit sample (beyond the instrumented 47) had at least one major AI bot silently blocked without the owner realising. Cloudflare bot-fight mode, WAF rules, copy-pasted robots.txt are the usual culprits. Detail in article N. 09.
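The robots.txt culprit, at least, is checkable offline with the stdlib robotparser. A sketch (the agent list is an illustrative subset; WAF and bot-fight blocks will not show up here, only in the logs):

```python
from urllib import robotparser

# Illustrative subset of AI user-agent tokens to test against the rules.
AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
             "Google-Extended", "CCBot"]

def blocked_ai_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI agents that the given robots.txt disallows for `path`."""
    rp = robotparser.RobotFileParser()
    rp.modified()  # record a fetch time so can_fetch() consults the parsed rules
    rp.parse(robots_txt.splitlines())
    return [ua for ua in AI_AGENTS if not rp.can_fetch(ua, path)]
```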


FINDING 07: AI VISIBILITY IS MOSTLY TECHNICAL.

Roughly 70% of what determines AI visibility is technical infrastructure (the six input dimensions); roughly 30% is content quality. This inverts the standard SEO content-vs-technical ratio.

LIMITATIONS.

Honesty requires stating what this research does NOT prove.

WHAT'S NEXT (YEAR TWO).

Expansions of the network are planned for the next 12 months.

OPEN METHODOLOGY.

The instrumentation stack (article N. 17) is open. The audit methodology (article N. 15) is open. The citation tracking methodology (article N. 31) is open. The 7-Dimension AI Visibility Score (article N. 03) is open.

What is not open: the specific 47 sites in the network (most are client sites; client privacy is non-negotiable). The pooled findings are public; individual site identities are not.

If you want to apply the methodology to your own site or a competitor set, the playbook is here. If you want the work done with the calibration of the broader network applied, that is what the audit is for.

THE BOTTOM LINE.

One year of measurement, seven findings, methodology open. This is the foundation everything else on this site is built on. The research will continue in year two; expect findings to refine, pivot, and occasionally reverse as the AI bot landscape moves. That is what research looks like: a living methodology, not a marketing claim frozen in time.

Stop Guessing What AI Sees

MEASURE THE LEVERS
THAT ACTUALLY EXIST.

If you want this methodology applied to your specific site - your real logs, your real citation data, your real fix list - the audit is the productized way to do it.