Which AI bots crawl your content, why you might want to block them, and the exact robots.txt code to do it.
Since 2022, AI companies have deployed crawlers to collect training data from the web. Unlike Google's Googlebot, which indexes your pages for search and so helps your SEO, these crawlers use your bandwidth and your content without giving you any ranking benefit in return. As of 2025, the major AI crawlers are:
| Crawler | Company | Respects robots.txt? |
|---|---|---|
| GPTBot | OpenAI | Yes |
| Google-Extended | Google | Yes |
| CCBot | Common Crawl | Yes |
| anthropic-ai | Anthropic | Yes |
| FacebookBot | Meta | Yes |
| Applebot-Extended | Apple | Yes |

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: anthropic-ai
    Disallow: /

    User-agent: FacebookBot
    Disallow: /

    User-agent: Applebot-Extended
    Disallow: /

    # Keep Google and Bing crawling normally
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /

    User-agent: *
    Allow: /

    Sitemap: https://yourdomain.com/sitemap.xml
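Before deploying, it can help to confirm that the rules behave the way you expect. The following is a minimal sketch, not a definitive test suite: it feeds the same rules to Python's standard-library `urllib.robotparser` and asks whether each user agent may fetch a page. The helper name `check_access` and the test URL `https://yourdomain.com/some-page` are illustrative assumptions, and the standard-library parser only approximates how each real crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt above, inlined so the sketch is self-contained.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Keep Google and Bing crawling normally
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

AI_BOTS = [
    "GPTBot",
    "Google-Extended",
    "CCBot",
    "anthropic-ai",
    "FacebookBot",
    "Applebot-Extended",
]
SEARCH_BOTS = ["Googlebot", "Bingbot"]


def check_access(robots_txt, agents, url="https://yourdomain.com/some-page"):
    """Return {agent: True/False} for whether each agent may fetch `url`.

    `check_access` and the default URL are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in check_access(ROBOTS_TXT, AI_BOTS + SEARCH_BOTS).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site instead of an inline string, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before you call `can_fetch()`.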