WHY VERIFICATION MATTERS.
User-agent strings are trivially spoofed. Anything can claim to be GPTBot in a single header. Real verification is done at the network level via reverse DNS lookup or IP range matching. Most published articles on AI bots skip this step entirely.
Practical reasons to verify: you want to grant special treatment to real AI bots (e.g., bypass rate-limits) without granting it to spoofers; you want to filter your bot logs for accurate measurement; you want to detect adversaries probing your site under a friendly disguise.
REVERSE DNS.
The cleanest method. Each major bot's IPs reverse-resolve to the operator's domain. Spoofed IPs do not.
- GPTBot, OAI-SearchBot, ChatGPT-User reverse to *.openai.com
- ClaudeBot, Claude-User, Claude-SearchBot reverse to *.anthropic.com
- PerplexityBot, Perplexity-User reverse to *.perplexity.ai
- Bingbot reverses to *.search.msn.com
- Googlebot, Google-Extended reverse to *.googlebot.com or *.google.com
- Bytespider reverses to *.bytedance.com
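In code, the mapping above reduces to a small lookup table. A sketch in Python; the dictionary name and the key spellings are illustrative, so match the keys to whatever labels your log parser emits:

```python
# Expected reverse-DNS suffixes per bot family (mirrors the list above).
# RDNS_SUFFIX is a hypothetical name; keys are illustrative labels.
RDNS_SUFFIX = {
    "GPTBot": ("openai.com",),
    "OAI-SearchBot": ("openai.com",),
    "ChatGPT-User": ("openai.com",),
    "ClaudeBot": ("anthropic.com",),
    "Claude-User": ("anthropic.com",),
    "PerplexityBot": ("perplexity.ai",),
    "bingbot": ("search.msn.com",),
    "Googlebot": ("googlebot.com", "google.com"),
    "Bytespider": ("bytedance.com",),
}
```

Some operators (Google here) use more than one suffix, which is why the values are tuples rather than single strings.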
THE VERIFICATION COMMAND.
From any terminal, given an IP from your access log:
- host <ip> returns the reverse DNS name. It should match the expected operator domain.
- host <reverse-dns-name> returns the forward IPs. The set should include the original IP. This is the round-trip check; it matters because reverse DNS can be set arbitrarily by whoever controls the IP, but forward DNS must be controlled by the actual domain owner.
- If both checks pass: real bot. If either fails: spoofer.
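The same round-trip can be scripted with the standard library's resolver. A minimal sketch; `hostname_matches` and `verify_bot_ip` are hypothetical names, and `gethostbyname_ex` only resolves IPv4:

```python
import socket

def hostname_matches(hostname: str, suffixes: tuple[str, ...]) -> bool:
    """True if the reverse-DNS hostname ends with one of the operator's domains."""
    hostname = hostname.rstrip(".").lower()
    return any(hostname == s or hostname.endswith("." + s) for s in suffixes)

def verify_bot_ip(ip: str, suffixes: tuple[str, ...]) -> bool:
    """Round-trip check: ip -> reverse name -> forward IPs -> original ip."""
    try:
        name, _, _ = socket.gethostbyaddr(ip)        # reverse lookup
    except socket.herror:
        return False
    if not hostname_matches(name, suffixes):         # name must be the operator's
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(name)  # forward lookup (IPv4)
    except socket.gaierror:
        return False
    return ip in addrs                               # round trip must close
```

Checking the suffix before the forward lookup is what blocks the arbitrary-reverse-DNS trick: a spoofer can make their IP reverse to anything, but not make the operator's zone resolve back to their IP.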
PUBLISHED IP RANGES.
Most operators publish their bot IP ranges. Faster than reverse DNS for high-volume verification.
- OpenAI: https://openai.com/gptbot.json (JSON list of CIDR ranges per bot).
- Anthropic: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/security#crawler-and-bot-allowlist (published per bot).
- Google: https://www.gstatic.com/ipranges/goog.json plus https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot.
- Bing: https://www.bing.com/toolbox/bingbot.json.
- Perplexity: published in their docs.
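Once loaded, a published CIDR file reduces verification to a set-membership test. A sketch with Python's ipaddress module; the ranges shown are RFC 5737 documentation addresses, not real bot ranges, so load the actual list from the operator's JSON file:

```python
import ipaddress

def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Placeholder ranges (TEST-NET); substitute the operator's published CIDRs.
gptbot_ranges = ["192.0.2.0/24", "198.51.100.0/24"]

print(ip_in_ranges("192.0.2.77", gptbot_ranges))   # True
print(ip_in_ranges("203.0.113.9", gptbot_ranges))  # False
```

For high request volumes, parse each network once at startup and cache it instead of reparsing the strings per request.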
PRACTICAL USAGE.
In production, the typical flow:
1. Receive a request claiming to be GPTBot via its user-agent header.
2. Check the source IP against the published GPTBot IP ranges. If it matches: trust the bot.
3. If it does not match the published ranges, do the reverse DNS round-trip. If it passes: trust.
4. If it fails both: do not extend bot-specific privileges. Treat it as a regular request, apply human rate-limiting, etc.
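The four steps above wire together as a single decision function. A sketch where the two checks are passed in as callables, so the cheap IP-range test runs first and the reverse-DNS round-trip only runs as a fallback; all names here are illustrative:

```python
from typing import Callable

def bot_privilege_allowed(
    ip: str,
    published_check: Callable[[str], bool],
    rdns_check: Callable[[str], bool],
) -> bool:
    """Decide whether a request claiming a bot identity earns bot privileges."""
    if published_check(ip):   # step 2: published CIDR ranges (fast path)
        return True
    if rdns_check(ip):        # step 3: reverse-DNS round-trip (fallback)
        return True
    return False              # step 4: spoofer; treat as a regular request
```

Keeping the checks injectable also makes the policy testable without live DNS or range files.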
THE BOTTOM LINE.
User-agent verification is mandatory if you are doing anything special for AI bots (bypassing rate-limits, returning cached responses, serving optimised HTML). Otherwise spoofers get the same privileges and the discrimination is meaningless. The verification commands are simple, the IP-range files are public, and there is no excuse for treating user-agents as ground truth in 2026.