SEO · 8 min read · January 2025

Complete robots.txt Guide: Every Directive Explained with Examples

User-agent, Disallow, Allow, Sitemap, Crawl-delay — all directives explained with real-world examples and common mistakes.


What robots.txt Actually Does

robots.txt tells crawlers which URLs they should not visit. It does not prevent indexing: a page can be indexed without being crawled if Google finds links to it. To remove a page from search results, use the noindex meta tag, and leave the page crawlable so Google can actually see the tag. robots.txt is for managing crawl budget, not for keeping sensitive URLs out of search results.
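
To see how a compliant crawler reads these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules and the example.com URLs are made up for illustration. It only answers "may I fetch this URL?" and says nothing about indexing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, parsed from a string instead of fetched over HTTP
rules = """
User-agent: *
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(useragent, url) returns True when crawling is permitted
blocked = parser.can_fetch("*", "https://example.com/admin/settings")
open_page = parser.can_fetch("*", "https://example.com/blog/")
print(blocked, open_page)
```

A well-behaved bot would skip the first URL and fetch the second. Nothing here enforces that; the file is a request, not an access control.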

Every Directive Explained

# User-agent: which bot this applies to
User-agent: *          # All bots
User-agent: Googlebot  # Only Google

# Disallow: paths the bot should not visit
Disallow: /admin/      # Blocks /admin/ and all sub-paths
Disallow: /            # Blocks the entire site
Disallow:              # Allows everything (empty = allow all)

# Allow: exceptions to Disallow rules
Disallow: /private/
Allow: /private/public-page.html  # This specific page is OK

# Crawl-delay: seconds between requests (Googlebot ignores this)
Crawl-delay: 10

# Sitemap: where to find your sitemap
Sitemap: https://example.com/sitemap.xml
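
The Allow exception above works because crawlers that follow RFC 9309 (Google included) apply the most specific matching rule, meaning the longest matching path, and prefer Allow on a tie. A rough sketch of that precedence logic, assuming plain prefix paths with no * or $ wildcards; the is_allowed helper and the rule list are illustrative, not taken from any crawler's code:

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled.

    `rules` is a list of (directive, pattern) tuples for one user-agent
    group, e.g. ("disallow", "/private/"). Wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            # Longest pattern wins; on equal length, Allow wins
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public-page.html"),
]
print(is_allowed(rules, "/private/public-page.html"))
print(is_allowed(rules, "/private/secret.html"))
print(is_allowed(rules, "/blog/"))
```

Not every parser works this way. Some older ones, Python's urllib.robotparser among them, apply the first rule that matches in file order, so it can help to put the Allow line before the broader Disallow.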

The 5 Most Common robots.txt Mistakes

  • Blocking your entire site: Disallow: / under User-agent: * tells every compliant crawler to stay off the whole site. It is often added on a staging site and never reverted after launch. If you find it, remove it and check the site in Google Search Console immediately.
  • Blocking CSS and JavaScript: Google needs these to render your pages. Never block /wp-content/, /assets/, or other resource directories.
  • Using it as a security measure: Bad actors ignore robots.txt. It is not a security tool.
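  • Wrong syntax: paths in robots.txt are case-sensitive (/Admin/ and /admin/ are different rules), while directive names such as Disallow are not. Whitespace after the colon is optional in RFC 9309, but Disallow:/admin (no space) may not work on all crawlers, so always include the space.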
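  • Forgetting the sitemap: Include Sitemap: https://yourdomain.com/sitemap.xml in every robots.txt. The line is independent of User-agent groups, so it can go anywhere in the file; the end is the usual convention.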
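
Two of these mistakes are easy to catch automatically. Below is a small sketch of a checker that flags a site-wide block and a missing Sitemap line. The lint_robots function and its warning messages are hypothetical, and the parsing is deliberately simplified (no wildcard handling, no validation of unknown directives).

```python
def lint_robots(text):
    """Return a list of warnings for a robots.txt given as a string."""
    warnings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # directive names are case-insensitive

        if field == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / under User-agent: * blocks the whole site")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("no Sitemap line found")
    return warnings


staging_leftover = "User-agent: *\nDisallow: /\n"
for warning in lint_robots(staging_leftover):
    print(warning)
```

Running a check like this in a deploy pipeline is one way to keep a staging rule from reaching production unnoticed.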

Recommended robots.txt for a Next.js Site

User-agent: *
Allow: /
Disallow: /api/
# Do not block /_next/static/: it holds the CSS and JavaScript Google needs to render pages

Sitemap: https://yourdomain.com/sitemap.xml
💡 Use the ToolsCourt Robots.txt Generator to build your file visually with templates for WordPress, Next.js, and AI crawler blocking. Download and upload in 60 seconds.