WHY MOST ROBOTS.TXT FILES ARE WRONG.
Most robots.txt files I see in audits are some combination of: copied from a 2023 template (so missing the 2024-2026 bots), mangled in copy-paste (broken line structure, so silently invalid), referencing deprecated bot names (so doing nothing), or carrying a blanket User-agent: * + Disallow: / left over from a staging environment that nobody updated for production.
The current 2026 list of AI-relevant user-agents is straightforward, but it has changed enough since 2024 that any older template is partially out of date. Here is the active list, the deprecated list, and four ready-to-paste configurations.
THE CURRENT BOT LIST.
The user-agents that matter in 2026, organised by purpose:
- Search / citation: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, Bingbot, DuckDuckBot
- Training: GPTBot, ClaudeBot, cohere-ai, Bytespider, Meta-ExternalAgent, Amazonbot
- Mixed / general: FacebookBot, CCBot (Common Crawl; feeds many training datasets)
THE DEPRECATED IDENTIFIERS.
Still showing up in copy-pasted templates everywhere. Remove these from your config; they do nothing:
- Claude-Web: replaced by Claude-User in early 2026.
- anthropic-ai: replaced by the ClaudeBot + Claude-User split.
- Google-Extended applied as googlebot: Google-Extended is a separate identifier; do not confuse the two.
- Generic aibot, llmbot: never were real identifiers; just pattern-matching attempts that catch nothing.
CONFIG 01 - ALLOW ALL.
If you want maximum AI visibility and accept training-data exposure as the cost, this is the simplest correct config. Explicit allows make the intent obvious; the default for unmatched bots is also allow.
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
This is the right default for content-driven sites whose business model benefits from being cited.
CONFIG 02 - SEARCH/CITATION ONLY.
Allow the bots that drive citations; block the bots that train models. This is the most common policy for sites whose content is editorially protected or licensed.
- Allow: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Bingbot, Applebot-Extended
- Disallow: GPTBot, ClaudeBot, cohere-ai, Bytespider, Meta-ExternalAgent, CCBot
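Assembled as a full file, the policy above can be sketched like this. Grouping several User-agent lines over one rule set is valid under the Robots Exclusion Protocol (RFC 9309); yoursite.com is a placeholder.

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
User-agent: Bingbot
User-agent: Applebot-Extended
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Meta-ExternalAgent
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

The trailing User-agent: * group makes the default explicit for any bot not named in either list.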
CONFIG 03 - NAMED BOTS ONLY.
Block everything by default; allow only an explicit named list. Highest control, requires maintenance every time a new AI bot launches.
Block everything by default:
User-agent: *
Disallow: /
Then add an explicit User-agent: <name> + Allow: / block for each bot you want to allow. This is the right policy for sites with strict content-licensing constraints (paywalled news, regulated industries with publication restrictions).
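A minimal sketch of this policy, assuming OAI-SearchBot and Claude-SearchBot are the two bots you choose to allow (swap in your own list):

User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

Note that the catch-all Disallow also covers traditional search crawlers, so add Bingbot and friends to the named list if you still want regular search traffic.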
CONFIG 04 - BLOCK ALL AI.
Block every AI bot explicitly. Note: this also blocks AI search citations entirely.
- Disallow each named AI bot explicitly. Wildcard * patterns are unreliable for AI bots; use named identifiers.
- Keep User-agent: Bingbot and the other traditional search-engine bots explicitly allowed if you still want Google/Bing search traffic.
- Right for sites with explicit no-AI policies for legal or licensing reasons. Rare. Verify the business case before applying.
MAINTENANCE SCHEDULE.
AI bots change quarterly. Set a calendar reminder every 90 days to: re-check the active bot list, remove deprecated identifiers from your config, add any new named bots, and re-run the curl detection from article N. 09 to verify the policy is actually being enforced. Five minutes of work that saves you the kind of silent-failure debt that kills AI visibility over time.
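Between quarterly checks, you can sanity-check what a config actually says with Python's stdlib robots.txt parser. A minimal sketch, using a trimmed Config-02-style policy and placeholder URLs:

```python
from urllib import robotparser

# Trimmed Config-02-style policy: block one training bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The training bot should be blocked...
print(rp.can_fetch("GPTBot", "https://yoursite.com/article"))        # False
# ...while a bot not named in the file falls through to the * allow.
print(rp.can_fetch("OAI-SearchBot", "https://yoursite.com/article")) # True
```

This only verifies what the file declares; whether a given bot actually honors it is the separate enforcement question the curl detection covers.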