THE 2025 SPLIT.
Through 2024, OpenAI fronted most of its web fetching with a single bot, GPTBot. In late 2025 the architecture changed: the work split across three named bots, each with its own documented purpose, user-agent string, and IP ranges. Most SEO articles have not caught up.
Why does this matter? Because the correct robots.txt policy is no longer a single line. If you block GPTBot to limit your training-data exposure but leave OAI-SearchBot allowed, you opt out of training while preserving ChatGPT visibility. If you block all three, you lose ChatGPT citations entirely. If you block none, you accept training exposure as the cost of citation visibility.
GPTBOT - TRAINING DATA.
Purpose: bulk crawling for model training. Archives content for inclusion in foundation-model training datasets. Operates in slow, broad sweeps, much like a traditional search-engine archive crawler.
Crawl pattern: predictable, lower-frequency revisits, broad coverage of newly published URLs. Respects robots.txt. Tends to hit RSS feeds and sitemaps to discover new content, then samples HTML.
Correct policy: allow if you want your content in training data, block if you don't. Blocking GPTBot does not block ChatGPT from citing you; citation indexing belongs to a different bot.
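For a training opt-out alone, the group is two lines. A minimal sketch; with no other rules in the file, the other two bots simply fall through to your defaults:

    # Opt out of training only; OAI-SearchBot and ChatGPT-User are unaffected
    User-agent: GPTBot
    Disallow: /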
OAI-SEARCHBOT - CHATGPT SEARCH.
Purpose: builds and maintains the index used by ChatGPT's search and citation features. This is the bot whose visits correlate with ChatGPT citations.
Crawl pattern: aggressive on freshly published URLs (it prioritizes recency so citations stay fresh), heavy on RSS and sitemap polling, and returns to high-value pages on a faster cadence than GPTBot.
Correct policy: allow, unless you specifically want zero ChatGPT citation visibility. This is the bot most practitioners actually want crawling their site.
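The inverse case, staying in training data while refusing citations, is rare but valid; a sketch is worth including mainly to underline that the two policies are independent:

    # Opt out of ChatGPT citations only; GPTBot is unaffected
    User-agent: OAI-SearchBot
    Disallow: /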
CHATGPT-USER - ON-DEMAND.
Purpose: fetches a single URL in real time when a ChatGPT user clicks a link or asks ChatGPT to read a particular page. Per-request, not bulk.
Crawl pattern: bursty and unpredictable. A page may go months without ChatGPT-User visits, then get hit dozens of times in an hour because a viral ChatGPT answer linked to it. Respects robots.txt but is not aggressively rate-limited.
Correct policy: allow. Blocking ChatGPT-User means that when someone asks ChatGPT to read one of your pages, the fetch is refused and they see an error instead of your content. There is essentially no good reason to block it.
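If you must restrict it at all, scope the rule to specific paths rather than the whole site. A sketch, where /drafts/ is a hypothetical stand-in for whatever you actually need fenced off:

    User-agent: ChatGPT-User
    Disallow: /drafts/  # hypothetical private path
    Allow: /

Under the standard longest-match rule, the more specific Disallow wins for /drafts/ while everything else stays readable.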
THE COMMON MISCONFIGURATION.
The most common mistake I see in audits: a robots.txt file that blocks GPTBot but says nothing about OAI-SearchBot or ChatGPT-User. The owner thinks they have "opted out of OpenAI." In reality, they are still being crawled by the two bots that matter most for visibility, and only the training crawler is blocked.
The opposite mistake is also common: a blanket User-agent: * with Disallow: /, copy-pasted from an older template. This blocks everything OpenAI runs, plus everything else. The site disappears from AI surfaces.
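For reference, the template in question is just two lines:

    User-agent: *
    Disallow: /  # locks out every bot without a more specific group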
If your goal is "opted out of training, opted in to citations," the correct configuration is:
    User-agent: GPTBot
    Disallow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /
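One detail worth knowing: under the robots exclusion standard (RFC 9309), a crawler obeys only the most specific group matching its user agent, so these named groups hold even if a User-agent: * group exists elsewhere in the file. The explicit Allow lines are technically redundant when nothing stricter matches, but they document intent for the next person who edits the file.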
THE BOTTOM LINE.
Three bots, three jobs, three policies. Treating them as one is the easiest configuration mistake to make in 2026, and the easiest to fix once you know the structure. Verify your current robots.txt against the three names today; the fix is a five-minute edit.