User-agent, Disallow, Allow, Sitemap, Crawl-delay — all directives explained with real-world examples and common mistakes.
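robots.txt tells crawlers which URLs they should not visit. It does not prevent indexing: a page can be indexed without being crawled if Google finds links to it. To remove a page from search results, use the noindex meta tag and leave the page crawlable, because a crawler that is blocked by robots.txt never sees the tag. robots.txt is for managing crawl budget and keeping bots out of low-value areas. It is not a way to hide sensitive URLs, since the file itself is public and blocked URLs can still show up in search results.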
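To make the noindex point concrete, here is a minimal Python sketch that checks whether a page's HTML carries a robots meta tag with the noindex directive. The names `RobotsMetaParser` and `has_noindex` are hypothetical helpers for illustration; the sketch uses only the standard library and ignores the `X-Robots-Tag` HTTP header and bot-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

A crawler can only read this tag if it is allowed to fetch the page, which is why a URL meant to be de-indexed should not also be disallowed in robots.txt.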
    # User-agent: which bot this applies to
    User-agent: *            # All bots
    User-agent: Googlebot    # Only Google

    # Disallow: paths the bot should not visit
    Disallow: /admin/        # Blocks /admin/ and all sub-paths
    Disallow: /              # Blocks the entire site
    Disallow:                # Allows everything (empty = allow all)

    # Allow: exceptions to Disallow rules
    Disallow: /private/
    Allow: /private/public-page.html   # This specific page is OK

    # Crawl-delay: seconds between requests (Googlebot ignores this)
    Crawl-delay: 10

    # Sitemap: where to find your sitemap
    Sitemap: https://example.com/sitemap.xml
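The way an Allow rule carves an exception out of a Disallow rule can be sketched in a few lines of Python. This is a simplified illustration, not a full parser: it assumes the longest-match precedence described in RFC 9309 and used by Googlebot (the longest matching pattern wins, and Allow wins a tie), it treats patterns as plain path prefixes without `*` or `$` wildcards, and `is_allowed` is a hypothetical helper name. Other crawlers may resolve conflicts differently.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, pattern) tuples belonging to a single
    user-agent group, e.g. ("Disallow", "/private/").
    """
    best_length = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern ("Disallow:") matches nothing
        if path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            # The longest pattern wins; on a tie, Allow wins.
            if len(pattern) > best_length or (
                len(pattern) == best_length and is_allow
            ):
                best_length = len(pattern)
                allowed = is_allow
    return allowed


rules = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public-page.html"),
]

print(is_allowed(rules, "/private/public-page.html"))  # True
print(is_allowed(rules, "/private/secret.html"))       # False
print(is_allowed(rules, "/index.html"))                # True
```

Because precedence here depends on pattern length rather than on order, the Allow line works whether it is written before or after the Disallow line.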
Disallow: / under User-agent: * prevents ALL crawling. It is often added on a staging site and never reverted at launch. If it reaches production, remove it and check the site in Google Search Console immediately.
Disallow:/admin (no space after the colon) may not work on all crawlers. Write Disallow: /admin instead.
Include Sitemap: https://yourdomain.com/sitemap.xml at the end of every robots.txt.
Example:
    User-agent: *
    Allow: /
    Disallow: /api/
    Disallow: /_next/static/
    Sitemap: https://yourdomain.com/sitemap.xml
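The three mistakes above are easy to check for automatically before a deploy. The following Python sketch is one possible approach, not a complete validator: `lint_robots` is a hypothetical function name, the warning texts are made up for illustration, and the grouping logic is deliberately simple.

```python
def lint_robots(text):
    """Return a list of warnings for common robots.txt mistakes."""
    warnings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule
    has_sitemap = False

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents = []
                in_rules = False
            agents.append(value.strip().lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if value and not value[0].isspace():
                warnings.append(f"line {number}: no space after ':'")
            if field == "disallow" and value.strip() == "/" and "*" in agents:
                warnings.append(
                    f"line {number}: 'Disallow: /' blocks the whole site for all bots"
                )
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("no Sitemap directive found")
    return warnings


staging_leftover = "User-agent: *\nDisallow: /\nDisallow:/admin\n"
for warning in lint_robots(staging_leftover):
    print(warning)
```

Running a check like this in CI against the production robots.txt is one way to catch a staging configuration before it ships.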