THERE IS NO SINGLE RIGHT ANSWER.
The default SEO advice is "allow all AI bots." The default privacy advice is "block all AI bots." Both are wrong as universal answers; both are right for specific business types. The correct policy depends on five inputs, weighed in combination.
This is the framework I use in audits when a client asks "should we open up to AI?" The decision is rarely obvious until the inputs are explicit.
BUSINESS TYPE.
Content publishers (media, blogs, research, B2B SaaS marketing): AI citations are top-of-funnel traffic. Allow is the default.
E-commerce: AI citations drive consideration-stage traffic. Allow is the default for marketing pages; block / disallow on cart, checkout, account paths.
Service businesses (law, medical, accounting, consulting): AI citations help discovery. Allow content pages; block / disallow client-portal areas.
Paywalled / subscription content: complex. Most major paywalled publishers (NYT, Bloomberg, FT) negotiate licensing deals separately and block training while allowing search citation. Pattern: allow OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User; block GPTBot, ClaudeBot, CCBot, Bytespider.
Regulated content: legal/medical/financial advice with publication restrictions. Often requires explicit licensing review before allowing AI training. Default to block training; allow citation if professional rules permit.
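For the e-commerce and service-business cases above, the marketing-vs-transactional split can be expressed in a single robots.txt group. A minimal sketch; the paths and the bot list are illustrative, not a complete inventory:

```
# Allow AI bots on marketing/content pages, keep them out of
# transactional and client-portal paths (paths are examples only).
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Allow: /
```

Under RFC 9309, multiple User-agent lines can share one rule group, and the most specific (longest) matching path wins, so the Disallow lines take precedence over `Allow: /` for those paths.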
CONTENT SENSITIVITY.
Different sensitivity tiers map to different policies:
- Public marketing / educational: low sensitivity. Allow all by default.
- Customer data, account information: never AI-accessible. Block all bots; serve auth-required pages behind real authentication, not just noindex.
- Pricing, internal tools, beta features: medium sensitivity. Disallow in robots.txt; rely on access control as the real protection.
- Forum / user-generated content: depends on TOS with users. Most large platforms allow AI scraping; some explicitly opt out per user.
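For the medium-sensitivity tier, a Disallow line signals intent to well-behaved crawlers, but robots.txt is advisory only; the access control behind it is the actual protection. A sketch with hypothetical paths:

```
# Medium sensitivity: Disallow keeps compliant bots out,
# but enforcement lives in auth/ACLs at the server.
User-agent: *
Disallow: /internal-tools/
Disallow: /beta/
```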
TRAINING-DATA TOLERANCE.
How comfortable are you with your content training future AI models?
Comfortable: allow GPTBot, ClaudeBot, CCBot, Bytespider, Meta-ExternalAgent. Default for most content publishers.
Mixed (allow citation, block training): the most common policy for premium content. Block training-only bots, allow citation/search bots. See article N. 10 Config 02 for the specific lines.
Block all training: required for content with strict licensing, regulated content, or content under unresolved litigation. Use a named-bot blocklist (article N. 10 Config 03).
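The "mixed" policy, roughly as it would appear in robots.txt, using the bot names already listed above. A sketch only; user-agent strings change, so check each vendor's current bot documentation before deploying:

```
# Block training-only crawlers.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

# Allow citation/search bots full access.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
Allow: /
```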
CITATION OUTCOME.
What do you want to happen when an AI cites you?
- Maximum citation visibility: allow all citation/search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Claude-User, Perplexity-User, Google-Extended, Applebot-Extended). Even if blocking training.
- Selective citation: allow only the surfaces your audience uses. (Western B2B audience? Likely ChatGPT + Perplexity matter most. East Asian audience? Bytespider + Doubao matter more.)
- Zero citation visibility: rare. Block all AI bots including citation bots. Verify there is a real reason; "we don't want our content cited" is sometimes a misunderstanding of how AI citations work.
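One way the "selective citation" option might look for the Western B2B audience described above: allow only the ChatGPT and Perplexity surfaces, block the rest by name. The bot selection is the text's example, not a recommendation:

```
# Selective citation: only the surfaces this audience uses.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: Perplexity-User
Allow: /

# Named blocks for the other AI crawlers.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /
```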
BANDWIDTH AND COST.
AI bots can be aggressive crawlers, especially Bytespider and PerplexityBot. For high-traffic sites the extra load is negligible; for low-traffic sites on tight bandwidth budgets, the cost matters.
The fix is rarely "block them." The fix is rate-limiting at Cloudflare / origin level: 60-120 requests/minute per IP is plenty for any AI bot's normal crawl. Hard 429s on bursts cause the bot to back off without cutting visibility.
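The rate-limiting fix can be done at the origin. A sketch in nginx terms, assuming nginx is in front of the site; the zone name and user-agent list are illustrative: 2 r/s works out to ~120 requests/minute per IP, the top of the range above.

```
# Rate-limit AI crawlers instead of blocking them.
# Requests from non-matching user agents get an empty key
# and are not rate-limited at all.
map $http_user_agent $ai_bot {
    default "";
    ~*(GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot) $binary_remote_addr;
}

limit_req_zone $ai_bot zone=aibots:10m rate=2r/s;

server {
    location / {
        limit_req zone=aibots burst=20 nodelay;
        limit_req_status 429;  # hard 429 on bursts; bots back off
    }
}
```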
THE FOUR POLICIES.
Synthesised from the five inputs:
- Allow All: default for content-driven businesses, no licensing constraints. (article N. 10, Config 01)
- Allow Citation, Block Training: default for premium publishers, regulated industries with citation tolerance. (Config 02)
- Named-Bot Allowlist: paranoid baseline. Allow only the bots you have explicitly evaluated. Expect recurring maintenance: new bots need evaluating and adding every quarter or so. (Config 03)
- Block All AI: rare. Strict licensing, litigation, or business model that explicitly excludes AI surfaces. (Config 04)
THE BOTTOM LINE.
There is no universal correct policy. Run the five inputs as a checklist before changing your robots.txt. "Allow all" is the right default for ~70% of sites; "allow citation, block training" is right for ~20%; the named-allowlist and block-all policies are correct for the remaining ~10% combined. Knowing which 10% you are in is the entire question.