OAI-SEARCHBOT VS
GPTBOT VS CHATGPT-USER.

OpenAI restructured its crawler infrastructure in late 2025 into three named bots. Most articles still treat them as one. They are not the same: they have different crawl patterns, different impacts on AI visibility, and different correct robots.txt policies.

OpenAI Bots: 3
Restructured: Late 2025
Robots.txt: Honoured
Crawl Patterns: Distinct

THE 2025 SPLIT.

Through 2024 OpenAI fronted most of its web fetching with a single bot, GPTBot. In late 2025 the architecture changed: the work split across three named bots, each with a documented purpose, distinct user-agent strings, and separate IP ranges. Most SEO articles have not caught up.

Why does this matter? Because the correct robots.txt policy is no longer a single line. If you block GPTBot to limit your training-data exposure but leave OAI-SearchBot allowed, you opt out of training while preserving ChatGPT visibility. If you block all three, you lose ChatGPT citations entirely. If you block none, you accept training exposure as the cost of citation visibility.

Finding 01.
The training crawler

GPTBOT - TRAINING DATA.

Purpose: bulk crawling for model training. Archives content for inclusion in foundation-model training datasets. Operates on a slow, broad sweep similar to traditional search-engine archive crawlers.

Crawl pattern: predictable, lower-frequency revisits with broad coverage of newly published URLs. Respects robots.txt. Tends to hit RSS feeds and sitemaps to discover new content, then samples HTML.

Correct policy: allow if you want your content in training data, block if you don't. Blocking GPTBot does not block ChatGPT from citing you - that is a different bot.
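Since all three bots honour robots.txt, a per-bot policy can be sanity-checked offline before deployment using Python's standard-library parser. The policy below (block GPTBot, leave the other two unrestricted) is just an illustration of the "opt out of training, keep citations" stance, not a recommendation for every site:

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of training (GPTBot) while leaving the
# search-index and on-demand bots (OAI-SearchBot, ChatGPT-User) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    # With no "User-agent: *" group, a bot that is not named defaults
    # to allowed - which is why ChatGPT-User comes back True here.
    print(bot, parser.can_fetch(bot, "https://example.com/article"))
```

The same check works against a live file by swapping `parse()` for `set_url()` plus `read()`.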

Finding 02.
The search index crawler

OAI-SEARCHBOT - CHATGPT SEARCH.

Purpose: builds and maintains the index used by ChatGPT's search and citation features. This is the bot whose visits correlate with ChatGPT citations.

Crawl pattern: aggressive on freshly published URLs (it prioritizes recency so citations stay fresh), heavy on RSS and sitemap polling, and returns to high-value pages on a faster cadence than GPTBot.

Correct policy: allow, unless you specifically want zero ChatGPT citation visibility. This is the bot most practitioners actually want crawling their site.

Finding 03.
The on-demand fetcher

CHATGPT-USER - ON-DEMAND.

Purpose: fetches a specific URL when a ChatGPT user clicks a link or asks ChatGPT to read a specific page. Per-request, not bulk.

Crawl pattern: bursty and unpredictable. A page may go months without ChatGPT-User visits, then get hit dozens of times in an hour because a viral ChatGPT answer linked to it. Respects robots.txt but is not aggressively rate-limited.

Correct policy: allow. Blocking ChatGPT-User means that when a ChatGPT user clicks through to your page or asks ChatGPT to read it, the fetch fails and the user sees an error instead of your content. There is essentially no good reason to block it.

GPTBOT
Training crawler. Slow, broad, predictable. Block if you want to opt out of training. Does NOT control ChatGPT citations.
OAI-SEARCHBOT
Search index. Aggressive on fresh URLs, drives ChatGPT citations. Allow unless you actively don't want to be cited.
CHATGPT-USER
On-demand. Bursty, user-triggered. Hits pages users click. No good reason to block.

THE COMMON MISCONFIGURATION.

The most common mistake I see in audits: a robots.txt file that blocks GPTBot but says nothing about OAI-SearchBot or ChatGPT-User. The owner thinks they have "opted out of OpenAI." In reality, they are still being crawled by the two bots that matter most for visibility, and only the training crawler is blocked.

The opposite mistake is also common: a blanket User-agent: * with Disallow: /, copy-pasted from an older template. This blocks everything OpenAI runs, plus everything else. The site disappears from AI surfaces.
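The offending template is usually just these two lines, which deny every compliant crawler, not only OpenAI's:

```
User-agent: *
Disallow: /
```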

If your goal is "opted out of training, opted in to citations," the correct configuration is:
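A minimal version looks like this. The explicit Allow lines are technically optional, since a bot with no matching group and no `User-agent: *` group defaults to allowed, but stating them documents intent:

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Keep ChatGPT search indexing and citations
User-agent: OAI-SearchBot
Allow: /

# Keep on-demand fetches when ChatGPT users open your pages
User-agent: ChatGPT-User
Allow: /
```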

THE BOTTOM LINE.

Three bots, three jobs, three policies. Treating them as one is the easiest configuration mistake to make in 2026, and the easiest to fix once you know the structure. Verify your current robots.txt against the three names today; the fix is a five-minute edit.

Stop Guessing What AI Sees

MEASURE THE LEVERS
THAT ACTUALLY EXIST.

If you want this methodology applied to your specific site - your real logs, your real citation data, your real fix list - the audit is the productized way to do it.