ROBOTS.TXT FOR AI BOTS - 2026.

Most existing robots.txt articles ship outdated bot lists. This is the current 2026 reference: four ready-to-paste templates depending on your policy - allow all, search/citation only, named bots only, block all.

Active bots: 18 · Deprecated identifiers: 4 · Configs: 4 · Update cadence: quarterly

WHY MOST ROBOTS.TXT FILES ARE WRONG.

Most robots.txt files I see in audits are some combination of:

  • copied from a 2023 template, so missing the 2024-2026 bots
  • copy-pasted with the wrong indentation, so silently invalid
  • referencing deprecated bot names, so doing nothing
  • carrying a blanket User-agent: * + Disallow: / from a staging environment that nobody updated for production

The current 2026 list of AI-relevant user-agents is straightforward, but it has changed enough since 2024 that any older template is partially out of date. Here is the active list, the deprecated list, and four ready-to-paste configurations.

THE CURRENT BOT LIST.

The user-agents that matter in 2026, organised by purpose (these are the names the four configs below rely on):

  • Search/citation fetchers - the bots that retrieve pages to answer or cite: OAI-SearchBot, ChatGPT-User (OpenAI), Claude-SearchBot, Claude-User (Anthropic), PerplexityBot, Perplexity-User (Perplexity)
  • Training crawlers - the bots that collect training data: GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Bytespider (ByteDance), Meta-ExternalAgent (Meta), cohere-ai (Cohere)
  • Training-control tokens - not crawlers, but robots.txt names that govern AI training use: Google-Extended (Google), Applebot-Extended (Apple)
  • Traditional search - keep allowed if you want search traffic: Bingbot

THE DEPRECATED IDENTIFIERS.

Four retired identifiers still show up in copy-pasted templates everywhere - anthropic-ai and Claude-Web are the most common offenders. Remove them from your config; they match nothing and do nothing.

Finding 01.
Default recommendation

CONFIG 01 - ALLOW ALL.

If you want maximum AI visibility and accept training-data exposure as the cost, this is the simplest correct config. Explicit allows make the intent obvious; the default for unmatched bots is also allow.

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

This is the right default for content-driven sites whose business model benefits from being cited.

Finding 02.
Search-only policy

CONFIG 02 - SEARCH/CITATION ONLY.

Allow the bots that drive citations, block the bots that train models. This is the most common policy for sites whose content is editorially protected or licensed; a ready-to-paste sketch follows the list.

  • Allow: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Bingbot, Applebot-Extended
  • Disallow: GPTBot, ClaudeBot, cohere-ai, Bytespider, Meta-ExternalAgent, CCBot
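
A minimal sketch of this policy as a complete file, built from the two lists above. Stacked User-agent lines forming a single group are valid under RFC 9309, and the sitemap URL is a placeholder.

# Allowed - the citation/search list above
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
User-agent: Bingbot
User-agent: Applebot-Extended
Allow: /

# Blocked - the training list above
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Meta-ExternalAgent
User-agent: CCBot
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml

Any bot not named in either group falls through to the default, which is allow; add a User-agent: * group if you want a stricter default.
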
Finding 03.
Maximum control

CONFIG 03 - NAMED BOTS ONLY.

Block everything by default and allow only an explicit named list. This is the highest-control option, but it needs maintenance every time a new AI bot launches; a sketch follows the list.

  • User-agent: * + Disallow: /
  • Then explicit User-agent: <name> + Allow: / blocks for each bot you want to allow.
  • This is the right policy for sites with strict content-licensing constraints (paywalled news, regulated industries with publication restrictions).
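
A minimal sketch with an illustrative allow-list - the three named bots here are examples, not a recommendation; substitute whichever bots your policy trusts.

# Default: deny everything not explicitly named
User-agent: *
Disallow: /

# Explicit allow-list, one group per trusted bot
User-agent: Bingbot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

Under RFC 9309 a crawler obeys only the most specific group that matches it, so each named Allow: / group overrides the * deny for that bot. Note that traditional crawlers such as Googlebot also fall under the * deny unless you name them.
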
Finding 04.
Maximum exclusion

CONFIG 04 - BLOCK ALL AI.

Block every AI bot explicitly. Note that this also blocks AI search citations entirely; a sketch follows the list.

  • Disallow each named AI bot explicitly. The User-agent line supports no pattern matching - the only wildcard is the literal *, which matches every bot - so you must use named identifiers.
  • Keep Bingbot and the other traditional search crawlers allowed - either by leaving them out of the block list or with explicit Allow groups - if you still want Google/Bing search traffic.
  • Right for sites with explicit no-AI policies for legal or licensing reasons. Rare. Verify the business case before applying.
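
A minimal sketch built from the bot names this article references; it is not guaranteed to be the full 2026 list, so cross-check the active table above before pasting.

# AI crawlers, fetchers, and training-control tokens - all blocked
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Meta-ExternalAgent
User-agent: Bytespider
User-agent: cohere-ai
User-agent: CCBot
Disallow: /

# No User-agent: * rules, so Googlebot, Bingbot, and other
# traditional crawlers keep their default-allow behaviour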

MAINTENANCE SCHEDULE.

AI bots change quarterly. Set a calendar reminder every 90 days to re-check the active bot list, remove deprecated identifiers from your config, add any new named bots, and re-run the curl detection from article N. 09 to verify the policy is actually being enforced. Five minutes of work that saves you the kind of silent-failure debt that kills AI visibility over time.
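
One low-effort habit that supports the cadence: robots.txt treats lines starting with # as comments, so you can keep a review stamp at the top of the file and see at a glance when it was last touched and which policy it implements. The date and policy label below are illustrative placeholders, not a required format.

# Last reviewed: 2026-01-15 (update at each quarterly review)
# Policy: Config 02 - search/citation only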

Stop Guessing What AI Sees

MEASURE THE LEVERS THAT ACTUALLY EXIST.

If you want this methodology applied to your specific site - your real logs, your real citation data, your real fix list - the audit is the productised way to do it.