
Block AI Bots from Your Website

Stop AI crawlers from harvesting your content for training and answer engines. Here's every major AI bot — what it does, who runs it, and a copy-ready robots.txt rule to block it.

Block every AI bot at once

To block every crawler on this page, add all of the rules below to your robots.txt, or open the generator with all AI crawlers pre-blocked.
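Assembling that combined file by hand is error-prone, so here is a minimal Python 3 sketch that generates it from the user-agent tokens documented on this page. The AI_BOTS list and build_block_all helper are hypothetical names for illustration. The check at the end uses the standard-library urllib.robotparser, which only approximates the parsers each operator actually runs.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens of the AI crawlers documented on this page.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-SearchBot", "Claude-User", "anthropic-ai",
    "Google-Extended", "CCBot", "PerplexityBot", "Perplexity-User",
    "Bytespider", "Amazonbot", "Applebot-Extended",
    "Meta-ExternalAgent", "cohere-ai", "Diffbot", "Timpibot",
]


def build_block_all(bots):
    """Return robots.txt text with one 'Disallow: /' group per bot (hypothetical helper)."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


robots_txt = build_block_all(AI_BOTS)
print(robots_txt)

# Cross-check with the standard-library parser: every listed bot should be
# refused, while an unlisted crawler such as Googlebot is still allowed.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = [bot for bot in AI_BOTS if not parser.can_fetch(bot, "/")]
print(f"{len(blocked)} of {len(AI_BOTS)} AI bots blocked")
print("Googlebot allowed:", parser.can_fetch("Googlebot", "/"))
```

Generating the file from one list keeps it in sync when you add or drop a bot later; paste the printed output into the robots.txt at your site root.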

Every AI crawler, explained

GPTBot (OpenAI)

OpenAI's training crawler. Blocking it stops your content from being used to train future ChatGPT / GPT models. It does not affect ChatGPT's live browsing or search citations.

User-agent: GPTBot
Disallow: /

ChatGPT-User (OpenAI)

Fetches a page in real time when a ChatGPT user clicks a link or asks it to browse. Blocking removes your pages from those on-demand fetches.

User-agent: ChatGPT-User
Disallow: /

OAI-SearchBot (OpenAI)

Indexes the web for ChatGPT Search. Block this if you do not want to appear as a cited source in ChatGPT's search results.

User-agent: OAI-SearchBot
Disallow: /

ClaudeBot (Anthropic)

Anthropic's training crawler for Claude. Blocking it keeps your content out of Claude model training data.

User-agent: ClaudeBot
Disallow: /

Claude-SearchBot (Anthropic)

Indexes pages to improve Claude's search answers. Block to stay out of Claude's search index.

User-agent: Claude-SearchBot
Disallow: /

Claude-User (Anthropic)

Fetches pages on demand when a Claude user asks it to browse a URL.

User-agent: Claude-User
Disallow: /

anthropic-ai (Anthropic)

Older Anthropic user-agent. Keep it in your block list for full coverage of legacy crawls.

User-agent: anthropic-ai
Disallow: /

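Since ClaudeBot, Claude-SearchBot, Claude-User and anthropic-ai all take the same rule here, they can also share one group: robots.txt lets several User-agent lines sit above a single set of rules. A minimal Python sketch, using the standard-library urllib.robotparser as a stand-in for Anthropic's own parser (an assumption), confirms the grouped record covers each token. The /blog/post path is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# One group: several User-agent lines share a single Disallow rule.
GROUPED = """\
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: anthropic-ai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(GROUPED.splitlines())

for agent in ["ClaudeBot", "Claude-SearchBot", "Claude-User", "anthropic-ai"]:
    # Each token falls under the shared group, so every path is disallowed
    # and each line printed ends in False.
    print(agent, parser.can_fetch(agent, "/blog/post"))
```

The grouped form is shorter, while separate groups (as listed on this page) make it easier to unblock one agent later without touching the others.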
Google-Extended (Google)

Controls whether your content trains Gemini and Vertex AI. Critically, blocking it does NOT affect Googlebot or your Google Search rankings.

User-agent: Google-Extended
Disallow: /

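The reason Search is unaffected is that robots.txt groups apply per product token, and a Google-Extended group does not match Googlebot. A minimal sketch with Python's standard-library urllib.robotparser (an approximation of Google's own parser) shows the two tokens resolving to different groups; the /pricing path and the catch-all Allow group are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Google-Extended group applies only to that token...
print("Google-Extended:", parser.can_fetch("Google-Extended", "/pricing"))
# ...so Googlebot falls through to the catch-all group and stays allowed.
print("Googlebot:", parser.can_fetch("Googlebot", "/pricing"))
```

This prints False for Google-Extended and True for Googlebot.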
CCBot (Common Crawl)

Common Crawl's bot builds the open dataset that many LLMs train on. Blocking it cuts off a major upstream training source.

User-agent: CCBot
Disallow: /

PerplexityBot (Perplexity)

Builds Perplexity's answer index. Block to avoid being indexed as a Perplexity source.

User-agent: PerplexityBot
Disallow: /

Perplexity-User (Perplexity)

Fetches pages live when a Perplexity user's query needs them.

User-agent: Perplexity-User
Disallow: /

Bytespider (ByteDance)

ByteDance's aggressive crawler, feeding TikTok and Doubao AI. It is often high-volume, so many sites block it to cut server load and bandwidth.

User-agent: Bytespider
Disallow: /

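To see whether Bytespider (or any other crawler on this page) is actually high-volume on your site, you can count its requests in your access log. This is a minimal sketch assuming logs in the Apache/Nginx "combined" format, where the user agent is the last double-quoted field. The count_ai_hits helper, the token subset and the sample log lines (including their simplified user-agent strings) are hypothetical.

```python
import re
from collections import Counter

# Tokens to look for, matched case-insensitively inside the User-Agent string.
AI_TOKENS = ["Bytespider", "GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]

# In the "combined" log format the User-Agent is the last double-quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(lines, tokens=AI_TOKENS):
    """Count requests per AI token across access-log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if not match:
            continue  # skip lines that are not in combined format
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # attribute each request to one token only
    return counts


# Hypothetical sample lines with simplified user-agent strings.
sample = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [01/Jan/2025:10:00:01 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '192.0.2.9 - - [01/Jan/2025:10:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(count_ai_hits(sample).most_common())  # [('Bytespider', 2), ('GPTBot', 1)]
```

On a real server you would pass an open log file instead of the sample list; where that file lives depends on your web server setup.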
Amazonbot (Amazon)

Amazon's crawler powering Alexa and Amazon AI features.

User-agent: Amazonbot
Disallow: /

Applebot-Extended (Apple)

Governs whether your content trains Apple Intelligence. Separate from Applebot — blocking it does NOT affect Siri or Spotlight.

User-agent: Applebot-Extended
Disallow: /

Meta-ExternalAgent (Meta)

Meta's crawler for training Meta AI / Llama models.

User-agent: Meta-ExternalAgent
Disallow: /

cohere-ai (Cohere)

Cohere's crawler for model training.

User-agent: cohere-ai
Disallow: /

Diffbot (Diffbot)

Extracts structured data to build knowledge graphs sold to AI products.

User-agent: Diffbot
Disallow: /

Timpibot (Timpi)

Crawler for Timpi's decentralized search and AI index.

User-agent: Timpibot
Disallow: /

Own your content in the AI era

Every week brings a new crawler scraping the open web to train models and feed answer engines. Most generic robots.txt tools have not kept up — they still list dead bots from 2010 and none of the crawlers that actually matter now. This page documents the full modern list: GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, Bytespider, Applebot-Extended, Meta-ExternalAgent and more.

A smart strategy usually blocks the training crawlers (so your work isn't used to build models for free) while allowing the answer and search crawlers (so you still earn citations and referral traffic). Decide category by category, and remember that robots.txt is honored by legitimate operators but is not a firewall: a crawler that chooses to ignore it is not stopped. For stronger enforcement, add server-side or WAF rules too.
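The category split and a server-side backstop can be sketched together in Python. This is a minimal sketch under stated assumptions: the TRAINING_BOTS / SEARCH_BOTS split is one example reading of the bot descriptions on this page (Bytespider, Amazonbot, Diffbot and Timpibot are left for you to place), the helper names are hypothetical, and the middleware assumes a Python WSGI application. A web-server or WAF rule would do the same job closer to the edge.

```python
from urllib.robotparser import RobotFileParser

# Example split (an assumption; adjust to your own policy).
TRAINING_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot",
    "Applebot-Extended", "Meta-ExternalAgent", "cohere-ai",
]
# Answer/search crawlers kept for citations: they need no rule, because an
# agent with no matching group (and no "*" group) is allowed by default.
SEARCH_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]


def build_training_block(bots):
    """One group: every training token shares a single 'Disallow: /' (hypothetical helper)."""
    lines = [f"User-agent: {bot}" for bot in bots] + ["Disallow: /"]
    return "\n".join(lines) + "\n"


def block_training_bots(app, bots=TRAINING_BOTS):
    """WSGI middleware (hypothetical): answer 403 when the User-Agent names a training bot.

    This only catches crawlers that identify themselves honestly; a spoofed
    User-Agent string gets through.
    """
    tokens = [bot.lower() for bot in bots]

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


robots_txt = build_training_block(TRAINING_BOTS)
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
allowed = [bot for bot in SEARCH_BOTS if parser.can_fetch(bot, "/")]
print("search bots still allowed:", allowed)


# Exercise the middleware with a trivial app and simplified, hypothetical UAs.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


statuses = []
app = block_training_bots(hello_app)
for ua in ["Mozilla/5.0 (compatible; GPTBot/1.1)", "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"]:
    app({"HTTP_USER_AGENT": ua}, lambda status, headers: statuses.append(status))
print(statuses)  # ['403 Forbidden', '200 OK']
```

Keeping one list for both the robots.txt rules and the middleware means the polite signal and the enforced block never drift apart.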


Frequently Asked Questions