The robots.txt Mistake That Cost Me 60% of My Traffic
I once watched a client's site lose 60% of its organic traffic in a week. The cause was a single line in their robots.txt file: Disallow: /. A developer had pushed a staging robots.txt to production by mistake and nobody noticed until the rankings collapsed.
That is the thing about robots.txt. It is the smallest, most boring file on your site, and also the one that can wipe you off Google in a single deploy. Get it right and you barely think about it. Get it wrong and you can be invisible for weeks before you figure out what happened.
What robots.txt Actually Does
The robots.txt file sits at the root of your domain (yoursite.com/robots.txt) and tells search engine crawlers which parts of your site they can and cannot access. It uses a simple syntax: User-agent says which crawler the rules that follow apply to, Disallow says what it cannot crawl, and Allow says what it explicitly can crawl, even inside a disallowed path.
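A minimal sketch, using a made-up /private/ directory and file name purely for illustration:

User-agent: *
Disallow: /private/
Allow: /private/press-kit.html

Read top to bottom: every crawler is asked to skip /private/, except the one page explicitly allowed back in.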
Important: robots.txt is a request, not a force field. Most well-behaved crawlers (Googlebot, Bingbot) respect it. Bad actors ignore it completely. So it is a tool for managing legitimate crawlers, not a security mechanism.
The Two Things You Actually Need
For 90% of sites, you only need two things in your robots.txt. First, a single rule that tells crawlers what to skip — usually admin pages, internal search results, and things like cart/checkout for e-commerce. Second, a Sitemap directive pointing to your XML sitemap so crawlers can find your important pages quickly.
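A minimal sketch of that, with placeholder paths and a placeholder sitemap URL you would swap for your own:

User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
Sitemap: https://yoursite.com/sitemap.xml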
Everything else is optimisation on top. The WordPress preset in this tool covers the standard exclusions (admin, login, search, feeds, plugin/theme directories) plus a Sitemap line. Drop it in, and you have a sensible default.
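For reference, the standard WordPress exclusions tend to come out looking roughly like this; treat it as an illustration of the preset's shape, not its exact output:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/
Disallow: /feed/
Sitemap: https://yoursite.com/sitemap.xml

The Allow line for admin-ajax.php is deliberate: plugins call it from the front end, so blocking it along with the rest of /wp-admin/ can interfere with how Google renders your pages.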
The AI Crawler Question
Since 2023, every robots.txt conversation has shifted to one question: do you let AI companies train on your content? Crawler tokens like GPTBot, ClaudeBot, Google-Extended, CCBot, and Bytespider identify the bots that collect web content for model training. If you do not want your articles used as training data, you have to block each of them explicitly in robots.txt.
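Blocking one is a two-line rule per bot, using the user-agent names the companies publish. For example, to shut out GPTBot and CCBot:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

The same pattern repeats for each bot you tick in the generator.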
I am ambivalent about this. Some sites benefit from AI exposure (visibility in AI answers can drive traffic). Others lose because their content gets summarised and repackaged. There is no right answer. The checkbox approach in this tool lets you decide bot by bot, instead of an all-or-nothing setting.
How I Use This Tool
Every new site I launch starts with this generator. I pick the WordPress preset, add the site URL and sitemap, decide which AI bots to block, and download the file. Total time: under a minute.
For e-commerce sites I switch to the e-commerce preset, which adds blocks for cart, checkout, and account pages plus the URL parameter patterns that cause duplicate content issues on most platforms.
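A plausible shape for that preset, with paths and parameter names as stand-ins (platforms differ):

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?sort=
Disallow: /*?filter=

The wildcard patterns on the last two lines are supported by Googlebot and Bingbot and catch the sorted and filtered variants of the same category page.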
For staging sites I use the "Block all" preset. It generates Disallow: / for every user agent, which keeps crawlers out of the staging copy until I am ready to launch. The day I push to production, I regenerate with the proper preset and replace the staging robots.txt before the DNS change goes live.
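The whole staging file is two lines:

User-agent: *
Disallow: /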
The Common Disasters
Three mistakes I see over and over. First, blocking /wp-content/ entirely on WordPress, which prevents Googlebot from rendering your pages properly because CSS and JS live in there. Second, using robots.txt to hide a private page (it does not work — robots.txt files are public, so you have just told everyone where your private page is). Third, leaving the staging robots.txt on production after launch, which is the disaster that cost my client 60% of their traffic.
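For the first mistake, the fix is one path. A sketch with standard WordPress paths (robots.txt supports # comments, which I use here for the contrast):

# Harmful: blocks the CSS and JS Google needs to render pages
Disallow: /wp-content/

# Safer: block only the admin area, keep the AJAX endpoint open
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php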
None of these are subtle problems. All of them are 100% preventable with a quick robots.txt audit before you deploy.
One Last Thing
Test your robots.txt before you trust it. Google Search Console has a robots.txt tester (under Settings) that shows you exactly which URLs are blocked and which are allowed. The Test button on this generator opens it. Use it. Two minutes of testing has saved me from rankings disasters more times than I can count.