Free SEO Tool

Robots.txt Generator — Smart Defaults & AI Bot Blocking

Pick a preset or build your own: this free Robots.txt Generator creates a clean robots.txt file for your site, with WordPress and Shopify presets, sitemap support, and one-click blocking of GPTBot, ClaudeBot, and other AI crawlers.

The robots.txt Mistake That Cost Me 60% of My Traffic

I once watched a client's site lose 60% of its organic traffic in a week. The cause was a single line in their robots.txt file: Disallow: /. A developer had pushed a staging robots.txt to production by mistake, and nobody noticed until the rankings collapsed.

That is the thing about robots.txt. It is the smallest, most boring file on your site, and also the one that can wipe you off Google in a single deploy. Get it right and you barely think about it. Get it wrong and you can be invisible for weeks before you figure out what happened.

What robots.txt Actually Does

The robots.txt file sits at the root of your domain (yoursite.com/robots.txt) and tells search engine crawlers which parts of your site they can and cannot access. It uses a simple syntax: User-agent says which crawler the rule applies to, Disallow says what it cannot crawl, and Allow says what it explicitly can.
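
For illustration (the /private/ paths are placeholders), a file using all three directives might look like this:

  User-agent: *
  Disallow: /private/
  Allow: /private/press-kit/

That tells every crawler to stay out of /private/ except the /private/press-kit/ section.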

Important: robots.txt is a request, not a force field. Most well-behaved crawlers (Googlebot, Bingbot) respect it. Bad actors ignore it completely. So it is a tool for managing legitimate crawlers, not a security mechanism.

The Two Things You Actually Need

For 90% of sites, you only need two things in your robots.txt. First, a single rule that tells crawlers what to skip — usually admin pages, internal search results, and things like cart/checkout for e-commerce. Second, a Sitemap directive pointing to your XML sitemap so crawlers can find your important pages quickly.

Everything else is optimisation on top. The WordPress preset in this tool covers the standard exclusions (admin, login, search, feeds, plugin/theme directories) plus a Sitemap line. Drop it in, and you have a sensible default.
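
As a rough sketch of that preset (the exact lines this tool outputs may differ, and the sitemap URL is a placeholder), a typical WordPress robots.txt looks something like this:

  User-agent: *
  Disallow: /wp-admin/
  Allow: /wp-admin/admin-ajax.php
  Disallow: /wp-login.php
  Disallow: /?s=
  Disallow: /feed/
  Disallow: /wp-content/plugins/
  Disallow: /wp-content/themes/
  Allow: /wp-content/uploads/
  Sitemap: https://yoursite.com/sitemap.xml

The Allow for admin-ajax.php carves an exception out of the /wp-admin/ block so front-end features that call it keep working, and the explicit uploads line makes clear that media stays crawlable.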

The AI Crawler Question

Since 2023, every robots.txt conversation has shifted to one question: do you let AI companies train on your content? Crawlers like GPTBot, ClaudeBot, Google-Extended, CCBot, and Bytespider scan the open web to gather training data for AI models. If you do not want your articles used as training data, you have to explicitly block these bots in robots.txt.
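
For reference, blocking a training crawler takes one two-line group per bot; this is roughly the output the AI-bot checkboxes in this tool produce (the user-agent tokens are the ones these vendors publish):

  User-agent: GPTBot
  Disallow: /

  User-agent: ClaudeBot
  Disallow: /

  User-agent: Google-Extended
  Disallow: /

  User-agent: CCBot
  Disallow: /

  User-agent: Bytespider
  Disallow: /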

I am ambivalent about this. Some sites benefit from AI exposure (visibility in AI answers can drive traffic). Others lose because their content gets summarised and repackaged. There is no right answer. The checkbox approach in this tool lets you decide bot by bot, instead of an all-or-nothing setting.

How I Use This Tool

Every new site I launch starts with this generator. I pick the WordPress preset, add the site URL and sitemap, decide which AI bots to block, and download the file. Total time: under a minute.

For e-commerce sites I switch to the e-commerce preset, which adds blocks for cart, checkout, and account pages plus the URL parameter patterns that cause duplicate content issues on most platforms.
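
A sketch of that shape (the paths and the ?sort= / ?filter= parameters are illustrative; your platform's URLs will differ):

  User-agent: *
  Disallow: /cart/
  Disallow: /checkout/
  Disallow: /account/
  Disallow: /*?sort=
  Disallow: /*?filter=
  Sitemap: https://yoursite.com/sitemap.xml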

For staging sites I use the "Block all" preset. It generates Disallow: / which keeps the staging copy out of search until I am ready to launch. The day I push to production, I regenerate with the proper preset and replace the staging robots.txt before the DNS change goes live.
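
The entire staging file is just two lines:

  User-agent: *
  Disallow: /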

The Common Disasters

Three mistakes I see over and over. First, blocking /wp-content/ entirely on WordPress, which prevents Googlebot from rendering your pages properly because CSS and JS live in there. Second, using robots.txt to hide a private page (it does not work — robots.txt files are public, so you have just told everyone where your private page is). Third, leaving the staging robots.txt on production after launch, which is the disaster that cost my client 60% of their traffic.

None of these are subtle problems. All of them are 100% preventable with a quick robots.txt audit before you deploy.

One Last Thing

Test your robots.txt before you trust it. Google Search Console has a robots.txt report (under Settings) that shows whether your file was fetched and parsed correctly, and the URL Inspection tool tells you whether a specific URL is blocked by robots.txt. The Test button on this generator opens it. Use it. Two minutes of testing has saved me from rankings disasters more times than I can count.

Need help with technical SEO on your site?

I help bloggers and small businesses fix robots.txt issues, recover from accidental de-indexing, set up XML sitemaps, and audit crawl budget — without the agency price tag.

Robots.txt Generator – FAQs

Common questions about robots.txt files, search engine crawlers, and using this generator.

Where does the robots.txt file go?

It must be at the root of your domain: https://yoursite.com/robots.txt. It does not work in subdirectories. For WordPress sites, you can upload it via FTP to your site root, or use a plugin like Rank Math or Yoast SEO that lets you edit robots.txt from the WordPress admin.

What happens if I do not have a robots.txt file?

Crawlers will assume everything is allowed. For most small sites, that is fine. Robots.txt becomes important when you have admin areas, internal search pages, or duplicate content that should not be crawled, or when you want to point crawlers to your sitemap.
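
In other words, having no file behaves the same as this explicit allow-everything file (an empty Disallow value means "block nothing"):

  User-agent: *
  Disallow: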

Should I block AI crawlers like GPTBot and ClaudeBot?

This is a personal choice. Blocking them prevents your content from being used to train AI models, but may also reduce your visibility in AI-powered answer engines (which now drive a meaningful share of traffic for some sites). Decide based on whether AI exposure helps or hurts your business.

Can I use robots.txt to hide private content?

No. Robots.txt is publicly visible at yoursite.com/robots.txt, so listing a path there announces its existence. Use server-side authentication, password protection, or noindex meta tags for private content — not robots.txt.

What is the difference between Disallow and noindex?

Disallow in robots.txt prevents crawling; Google may still index the URL if other sites link to it (it just will not see the content). Noindex (in a meta tag or HTTP header) prevents indexing entirely. To fully exclude a page, use noindex rather than Disallow, and make sure the page is not also disallowed, because Google has to crawl it to see the noindex.
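
To make the difference concrete (/old-page/ is a placeholder), the first line below is the robots.txt directive that blocks crawling, while the second is the meta tag, placed in the page's HTML head, that blocks indexing:

  Disallow: /old-page/

  <meta name="robots" content="noindex">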

Why does the WordPress preset allow /wp-content/uploads/?

Because Google needs access to your images and media files to display them in image search and to render your pages properly. Blocking /wp-content/ entirely is a common mistake that hurts rankings.

Should I use Crawl-delay?

Usually no. Googlebot ignores Crawl-delay and manages its own crawl rate. Bingbot, Yandex, and some smaller crawlers respect it. Only set a Crawl-delay if you have specific server load issues; otherwise leave it off.
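
If you do need it, the directive goes inside the group for the crawler you want to slow down (the 10 is an example value; crawlers that honor the directive generally treat it as a number of seconds to wait between requests):

  User-agent: Bingbot
  Crawl-delay: 10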

How often should I update my robots.txt?

Rarely. Once you have it set up correctly, you only update when you change site structure (new admin area, new e-commerce platform, new private sections) or when you change your stance on AI crawlers. A typical robots.txt does not change for years.

What is the User-agent line?

It specifies which crawler the rules below apply to. User-agent: * means "all crawlers." You can target specific bots with User-agent: Googlebot or User-agent: Bingbot. A crawler follows only the most specific group that matches it, so a Googlebot-specific group replaces the * group for Googlebot rather than adding to it.
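
For example, with the file below Googlebot follows only its own group and ignores the * group entirely, so it can crawl /reports/ while every other crawler is blocked from it (the path is a placeholder):

  User-agent: *
  Disallow: /reports/

  User-agent: Googlebot
  Disallow: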

Can I have multiple Sitemap directives?

Yes. List one Sitemap line per sitemap, anywhere in the file (top or bottom). Most large sites have multiple sitemaps (posts, pages, products, images), and you can list them all. The generator supports multiple sitemap URLs — just put one per line.
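
For instance (URLs are placeholders), a file can simply end with several Sitemap lines:

  Sitemap: https://yoursite.com/post-sitemap.xml
  Sitemap: https://yoursite.com/page-sitemap.xml
  Sitemap: https://yoursite.com/product-sitemap.xml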