
Block AI Bots from Your Website

Stop AI crawlers from harvesting your content for training and answer engines. Here's every major AI bot — what it does, who runs it, and a copy-ready robots.txt rule to block it.

Block every AI bot at once

Copy the full block below, or open the generator with all AI crawlers pre-blocked.

Block all AI bots in the Generator

Block all AI crawlers (paste into robots.txt):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Timpibot
Disallow: /
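You can sanity-check the block before deploying it by running it through Python's standard-library `urllib.robotparser`, which answers "may this user-agent fetch this URL?". The sketch below rebuilds the same 18 groups from a list of tokens and tests them against a hypothetical page, `https://example.com/blog/post`. Each operator runs its own robots.txt parser, so this only checks your syntax and grouping. It cannot tell you how a given crawler will actually behave.

```python
from urllib.robotparser import RobotFileParser

# Same tokens as the published block, in the same order.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-SearchBot", "Claude-User", "anthropic-ai",
    "Google-Extended", "CCBot", "PerplexityBot", "Perplexity-User",
    "Bytespider", "Amazonbot", "Applebot-Extended", "Meta-ExternalAgent",
    "cohere-ai", "Diffbot", "Timpibot",
]

# One "User-agent / Disallow: /" group per bot, separated by blank lines.
ROBOTS_TXT = "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in AI_BOTS)


def check(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return {agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # hypothetical page on your site
    # Search-engine crawlers are included to confirm they are NOT caught.
    results = check(ROBOTS_TXT, AI_BOTS + ["Googlebot", "Bingbot", "Applebot"], url)
    for agent, allowed in results.items():
        print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

Pass the bare product token (for example `GPTBot`), not a full User-Agent header. The stdlib parser only compares the text before the first `/`, so a header that starts with `Mozilla/5.0` would not match any group.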

Every AI crawler, explained

GPTBot (OpenAI)

OpenAI's training crawler. Blocking it stops your content from being used to train future ChatGPT / GPT models. It does not affect ChatGPT's live browsing or search citations, which use ChatGPT-User and OAI-SearchBot.

User-agent: GPTBot
Disallow: /

ChatGPT-User (OpenAI)

Fetches a page in real time when a ChatGPT user clicks a link or asks it to browse. Blocking removes your pages from those on-demand fetches.

User-agent: ChatGPT-User
Disallow: /

OAI-SearchBot (OpenAI)

Indexes the web for ChatGPT Search. Block this if you do not want to appear as a cited source in ChatGPT's search results.

User-agent: OAI-SearchBot
Disallow: /

ClaudeBot (Anthropic)

Anthropic's training crawler for Claude. Blocking it keeps your content out of Claude model training data.

User-agent: ClaudeBot
Disallow: /

Claude-SearchBot (Anthropic)

Indexes pages to improve Claude's search answers. Block to stay out of Claude's search index.

User-agent: Claude-SearchBot
Disallow: /

Claude-User (Anthropic)

Fetches pages on demand when a Claude user asks it to browse a URL.

User-agent: Claude-User
Disallow: /

anthropic-ai (Anthropic)

Older Anthropic user-agent. Keep it in your block list for full coverage of legacy crawls.

User-agent: anthropic-ai
Disallow: /

Google-Extended (Google)

A robots.txt control token, not a separate crawler. It controls whether content Google crawls can be used to train Gemini and Vertex AI models. Critically, blocking it does NOT affect Googlebot or your Google Search rankings.

User-agent: Google-Extended
Disallow: /

CCBot (Common Crawl)

Common Crawl's bot builds the open dataset that many LLMs train on. Blocking it cuts off a major upstream training source.

User-agent: CCBot
Disallow: /

PerplexityBot (Perplexity)

Builds Perplexity's answer index. Block to avoid being indexed as a Perplexity source.

User-agent: PerplexityBot
Disallow: /

Perplexity-User (Perplexity)

Fetches pages live when a Perplexity user's query needs them.

User-agent: Perplexity-User
Disallow: /

Bytespider (ByteDance)

ByteDance's aggressive crawler feeding TikTok / Doubao AI. It is often high-volume, and many sites block it to cut server load and bandwidth.

User-agent: Bytespider
Disallow: /

Amazonbot (Amazon)

Amazon's crawler powering Alexa and Amazon AI features.

User-agent: Amazonbot
Disallow: /

Applebot-Extended (Apple)

Governs whether content crawled by Applebot can be used to train Apple Intelligence models. It is separate from Applebot, so blocking it does NOT affect Siri or Spotlight.

User-agent: Applebot-Extended
Disallow: /

Meta-ExternalAgent (Meta)

Meta's crawler for training Meta AI / Llama models.

User-agent: Meta-ExternalAgent
Disallow: /

cohere-ai (Cohere)

Cohere's crawler for model training.

User-agent: cohere-ai
Disallow: /

Diffbot (Diffbot)

Extracts structured data to build knowledge graphs sold to AI products.

User-agent: Diffbot
Disallow: /

Timpibot (Timpi)

Crawler for Timpi's decentralized search and AI index.

User-agent: Timpibot
Disallow: /

Own your content in the AI era

Every week brings a new crawler scraping the open web to train models and feed answer engines. Most generic robots.txt tools have not kept up — they still list dead bots from 2010 and none of the crawlers that actually matter now. This page documents the full modern list: GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, Bytespider, Applebot-Extended, Meta-ExternalAgent and more.

A smart strategy usually blocks the training crawlers (so your work isn't used to build models for free) while allowing the answer/search crawlers (so you still earn citations and referral traffic). Decide category by category. Remember that legitimate operators honor robots.txt, but it is not a hard firewall. For stronger enforcement, add server-side or WAF rules as well.
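One way to apply the "block training, allow answers" split is to keep the tokens in two lists. You then generate robots.txt from those lists and reuse the training list for a server-side User-Agent check. This is a sketch under stated assumptions, not a drop-in WAF rule. The training/answer grouping is one reading of the per-bot descriptions above, and `build_robots_txt` and `is_training_bot` are hypothetical helper names. Google-Extended and Applebot-Extended are robots.txt tokens only, because Google and Apple fetch pages with their regular Googlebot and Applebot agents. Those two can therefore only be honored through robots.txt and will never match a request header.

```python
from collections.abc import Sequence
from urllib.robotparser import RobotFileParser

# Grouping is one reading of the per-bot notes; move tokens to suit your policy.
# Amazonbot, Diffbot and Timpibot are deliberately left unassigned here.
TRAINING_BOTS = (
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot",
    "Applebot-Extended", "Meta-ExternalAgent", "cohere-ai", "Bytespider",
)
ANSWER_BOTS = (
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
)

# robots.txt-only tokens: they never appear in an HTTP User-Agent header.
ROBOTS_TXT_ONLY = {"Google-Extended", "Applebot-Extended"}


def build_robots_txt(blocked: Sequence[str], allowed: Sequence[str]) -> str:
    """One group disallowing the training bots, one group allowing the answer bots."""
    lines = [f"User-agent: {bot}" for bot in blocked] + ["Disallow: /", ""]
    lines += [f"User-agent: {bot}" for bot in allowed] + ["Allow: /", ""]
    return "\n".join(lines)


def is_training_bot(user_agent_header: str,
                    blocked: Sequence[str] = TRAINING_BOTS) -> bool:
    """Hypothetical server-side filter: case-insensitive token match on the raw
    User-Agent header. Headers can be spoofed, so where an operator publishes
    its IP ranges, checking those as well is the stronger test."""
    ua = user_agent_header.lower()
    return any(token.lower() in ua
               for token in blocked if token not in ROBOTS_TXT_ONLY)


if __name__ == "__main__":
    robots = build_robots_txt(TRAINING_BOTS, ANSWER_BOTS)
    print(robots)

    # Confirm the split with the stdlib parser.
    parser = RobotFileParser()
    parser.parse(robots.splitlines())
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(agent, parser.can_fetch(agent, "https://example.com/"))
```

The `Allow: /` group is technically redundant when there is no `User-agent: *` group. It still documents intent, and because a named group takes precedence over `*`, it keeps the answer bots allowed if you later add a stricter catch-all. In a web app, a request for which `is_training_bot` returns `True` would typically get a 403 response.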

Verify in Google Search Console

Frequently Asked Questions

How do I block AI bots from my website?

Does blocking AI bots affect my Google or Bing rankings?

Does robots.txt actually stop AI crawlers?

Should I block all AI bots?

What is the difference between training bots and answer bots?

Where does the robots.txt file go?
