WHY VERIFICATION MATTERS.
User-agent strings are trivially spoofed. Anything can claim to be GPTBot in a single header. Real verification is done at the network level via reverse DNS lookup or IP range matching. Most published articles on AI bots skip this step entirely.
Practical reasons to verify: you want to grant special treatment to real AI bots (e.g., bypass rate-limits) without granting it to spoofers; you want to filter your bot logs for accurate measurement; you want to detect adversaries probing your site under a friendly disguise.
REVERSE DNS.
The cleanest method. Each major bot's IPs reverse-resolve to the operator's domain. Spoofed IPs do not.
- GPTBot, OAI-SearchBot, ChatGPT-User reverse to *.openai.com
- ClaudeBot, Claude-User, Claude-SearchBot reverse to *.anthropic.com
- PerplexityBot, Perplexity-User reverse to *.perplexity.ai
- Bingbot reverses to *.search.msn.com
- Googlebot, Google-Extended reverse to *.googlebot.com or *.google.com
- Bytespider reverses to *.bytedance.com
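In code, the mapping above reduces to a small lookup table. A sketch in Python; the dictionary name and the key spellings are illustrative, so match the keys to whatever labels your log parser emits:

```python
# Expected reverse-DNS suffixes per bot family (mirrors the list above).
# RDNS_SUFFIX is a hypothetical name; keys are illustrative labels.
RDNS_SUFFIX = {
    "GPTBot": ("openai.com",),
    "OAI-SearchBot": ("openai.com",),
    "ChatGPT-User": ("openai.com",),
    "ClaudeBot": ("anthropic.com",),
    "Claude-User": ("anthropic.com",),
    "PerplexityBot": ("perplexity.ai",),
    "bingbot": ("search.msn.com",),
    "Googlebot": ("googlebot.com", "google.com"),
    "Bytespider": ("bytedance.com",),
}
```

Some operators (Google here) use more than one suffix, which is why the values are tuples rather than single strings.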
THE VERIFICATION COMMAND.
From any terminal, given an IP from your access log:
- host <ip> returns the reverse DNS name. It should match the expected operator domain.
- host <reverse-dns-name> returns the forward IPs. The set should include the original IP. This is the round-trip check; it matters because reverse DNS can be set arbitrarily by whoever controls the IP, but forward DNS must be controlled by the actual domain owner.
- If both checks pass: real bot. If either fails: spoofer.
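The same round-trip can be scripted with the standard library's resolver. A minimal sketch; `hostname_matches` and `verify_bot_ip` are hypothetical names, and `gethostbyname_ex` only resolves IPv4:

```python
import socket

def hostname_matches(hostname: str, suffixes: tuple[str, ...]) -> bool:
    """True if the reverse-DNS hostname ends with one of the operator's domains."""
    hostname = hostname.rstrip(".").lower()
    return any(hostname == s or hostname.endswith("." + s) for s in suffixes)

def verify_bot_ip(ip: str, suffixes: tuple[str, ...]) -> bool:
    """Round-trip check: ip -> reverse name -> forward IPs -> original ip."""
    try:
        name, _, _ = socket.gethostbyaddr(ip)        # reverse lookup
    except socket.herror:
        return False
    if not hostname_matches(name, suffixes):         # name must be the operator's
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(name)  # forward lookup (IPv4)
    except socket.gaierror:
        return False
    return ip in addrs                               # round trip must close
```

Checking the suffix before the forward lookup is what blocks the arbitrary-reverse-DNS trick: a spoofer can make their IP reverse to anything, but not make the operator's zone resolve back to their IP.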
PUBLISHED IP RANGES.
Most operators publish their bot IP ranges. Faster than reverse DNS for high-volume verification.
- OpenAI: https://openai.com/gptbot.json (JSON list of CIDR ranges per bot).
- Anthropic: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/security#crawler-and-bot-allowlist (published per bot).
- Google: https://www.gstatic.com/ipranges/goog.json plus https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot.
- Bing: https://www.bing.com/toolbox/bingbot.json.
- Perplexity: published in their docs.
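Once loaded, a published CIDR file reduces verification to a set-membership test. A sketch with Python's ipaddress module; the ranges shown are RFC 5737 documentation addresses, not real bot ranges, so load the actual list from the operator's JSON file:

```python
import ipaddress

def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Placeholder ranges (TEST-NET); substitute the operator's published CIDRs.
gptbot_ranges = ["192.0.2.0/24", "198.51.100.0/24"]

print(ip_in_ranges("192.0.2.77", gptbot_ranges))   # True
print(ip_in_ranges("203.0.113.9", gptbot_ranges))  # False
```

For high request volumes, parse each network once at startup and cache it instead of reparsing the strings per request.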
PRACTICAL USAGE.
In production, the typical flow:
1. Receive a request claiming to be GPTBot via its user-agent header.
2. Check the source IP against the published GPTBot IP ranges. If it matches: trust the bot.
3. If it does not match the published ranges, do the reverse DNS round-trip. If it passes: trust.
4. If it fails both: do not extend bot-specific privileges. Treat it as a regular request, apply human rate-limiting, etc.
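The four steps above wire together as a single decision function. A sketch where the two checks are passed in as callables, so the cheap IP-range test runs first and the reverse-DNS round-trip only runs as a fallback; all names here are illustrative:

```python
from typing import Callable

def bot_privilege_allowed(
    ip: str,
    published_check: Callable[[str], bool],
    rdns_check: Callable[[str], bool],
) -> bool:
    """Decide whether a request claiming a bot identity earns bot privileges."""
    if published_check(ip):   # step 2: published CIDR ranges (fast path)
        return True
    if rdns_check(ip):        # step 3: reverse-DNS round-trip (fallback)
        return True
    return False              # step 4: spoofer; treat as a regular request
```

Keeping the checks injectable also makes the policy testable without live DNS or range files.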
THE BOTTOM LINE.
User-agent verification is mandatory if you are doing anything special for AI bots (bypassing rate-limits, returning cached responses, serving optimised HTML). Otherwise spoofers get the same privileges and the discrimination is meaningless. The verification commands are simple, the IP-range files are public, and there is no excuse for treating user-agents as ground truth in 2026.