Robots.txt Generator


Build a valid robots.txt from a site-type template, block AI crawlers or individual search engines, and add your sitemap — with a live preview and a warning for every common mistake. Everything runs in your browser; nothing is uploaded.

Robots.txt builder

Controls: Environment; AI crawlers; Block search engines (leave all unchecked to let search engines crawl normally); Block common paths.

robots.txt

# robots.txt — generated by devkraken.com/tools/robots-txt-generator/

User-agent: *
Allow: /

Sitemap: https://devkraken.com/sitemap.xml

Site-type templates

Each template seeds the User-agent: * group with sensible defaults. You can refine them with the path and custom controls above.

| Site type | Default rules | Why |
|---|---|---|
| Blog / content site | Allow: / | Crawl everything; content sites want maximum indexing. |
| SaaS app | Disallow: /dashboard/; Disallow: /settings/; Disallow: /account/; Disallow: /api/ | Keep the marketing site indexable, hide the authenticated app and API. |
| Ecommerce | Disallow: /cart/; Disallow: /checkout/; Disallow: /account/; Disallow: /*?add-to-cart= | Index products and categories; hide cart, checkout, and account flows. |
| Documentation | Allow: / | Index all docs so search and AI answer engines can cite them. |
| WordPress | Allow: /wp-admin/admin-ajax.php; Disallow: /wp-admin/ | Block wp-admin but keep admin-ajax.php, which themes and plugins need. |
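
The templates above can be modelled as plain data. The following is a minimal Python sketch, not the generator's actual code: `TEMPLATES` and `render_template` are hypothetical names, the rules are copied from the table, and the sitemap URL is a placeholder.

```python
# Hypothetical model of the site-type templates; rules copied from the table above.
TEMPLATES = {
    "blog": ["Allow: /"],
    "saas": [
        "Disallow: /dashboard/",
        "Disallow: /settings/",
        "Disallow: /account/",
        "Disallow: /api/",
    ],
    "ecommerce": [
        "Disallow: /cart/",
        "Disallow: /checkout/",
        "Disallow: /account/",
        "Disallow: /*?add-to-cart=",
    ],
    "docs": ["Allow: /"],
    "wordpress": [
        "Allow: /wp-admin/admin-ajax.php",
        "Disallow: /wp-admin/",
    ],
}


def render_template(site_type: str, sitemap_url: str = "") -> str:
    """Return robots.txt text: one `User-agent: *` group plus an optional Sitemap line."""
    lines = ["User-agent: *", *TEMPLATES[site_type]]
    if sitemap_url:
        # A blank line separates the group from the Sitemap line.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(render_template("wordpress", "https://example.com/sitemap.xml"))
```

Keeping the rules as data makes it easy to add a site type without touching the rendering logic.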

AI crawler reference

The user-agents the generator can block. “Training” bots feed model training; “search” bots index for AI answer engines; “assistant” bots fetch a page on demand when a user asks.

| User-agent | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | training |
| OAI-SearchBot | OpenAI | search |
| ChatGPT-User | OpenAI | assistant (on-demand) |
| ClaudeBot | Anthropic | training |
| Claude-User | Anthropic | assistant (on-demand) |
| Claude-SearchBot | Anthropic | search |
| anthropic-ai | Anthropic | training (legacy) |
| Google-Extended | Google | training (Gemini) |
| PerplexityBot | Perplexity | search |
| Perplexity-User | Perplexity | assistant (on-demand) |
| CCBot | Common Crawl | training |
| Meta-ExternalAgent | Meta | training |
| Amazonbot | Amazon | training |
| Applebot-Extended | Apple | training |
| Bytespider | ByteDance | training |

What robots.txt does — and does not — do

It does

  • Tell compliant crawlers which paths to skip.
  • Reduce crawl load by keeping crawlers out of low-value sections.
  • Advertise your sitemap to every crawler at once.
  • Set a crawl-delay for crawlers that honour it, such as Bingbot.

It does not

  • Protect content — the file and the paths are public.
  • Stop crawlers that choose to ignore it.
  • Guarantee a page stays out of search. A blocked URL can still be indexed if other pages link to it. To keep a page out of the index, use a noindex meta tag and leave the page crawlable so the crawler can see the tag.
  • Remove content already crawled or trained on.

For anything genuinely private, require authentication or block at the server or CDN — never rely on Disallow.

Wildcards and pattern matching

Two special characters are defined in RFC 9309 and supported by Google, Bing, and other major crawlers:

  • * matches any run of characters, including none. Disallow: /*?replytocom= blocks every URL containing ?replytocom=.
  • $ anchors the end of the URL. Disallow: /*.pdf$ blocks URLs ending in .pdf, but not /file.pdf?download=1.

When more than one rule matches, the most specific rule (the longest pattern) wins, and an Allow beats a Disallow of equal length. Confirm any rule with the robots.txt tester.
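
The matching and precedence rules above can be sketched in a few lines of Python. This is an illustrative approximation rather than any crawler's real implementation: `pattern_to_regex` and `is_allowed` are hypothetical helpers, rules are passed as (directive, pattern) pairs for a single group, and percent-encoding normalisation is ignored.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored-at-start regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/*.pdf$"),
    ("Disallow", "/*?replytocom="),
]

print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True: the longer Allow wins
print(is_allowed(rules, "/files/report.pdf"))         # False
print(is_allowed(rules, "/files/report.pdf?x=1"))     # True: '$' requires .pdf at the end
print(is_allowed(rules, "/post/?replytocom=5"))       # False
```

Pattern length is used as the measure of specificity, which follows the longest-pattern rule described above.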

Frequently asked

What is a robots.txt file and where does it go?
robots.txt is a plain-text file that tells crawlers which parts of your site they may request. It must live at the root of your domain (https://example.com/robots.txt), and it applies only to that host and protocol, so each subdomain needs its own file. A file at a subpath (like /blog/robots.txt) is ignored. Generate the file here, then upload it to your web root or have your framework serve it at /robots.txt.
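
As a quick illustration of the location rule, this Python sketch derives the robots.txt URL that governs any given page. The helper name `robots_txt_url` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the page's scheme and host (port included)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Path, query, and fragment are discarded: the file always sits at the root.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
print(robots_txt_url("http://shop.example.com:8080/cart/"))
# http://shop.example.com:8080/robots.txt
```
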
How do I block ChatGPT, Claude, and other AI crawlers?
Add a group that disallows the AI user-agents: GPTBot (OpenAI training), ClaudeBot (Anthropic training), Google-Extended (Gemini training), CCBot (Common Crawl), and others. Choose “Block AI training crawlers” to stop training bots while keeping AI search and on-demand fetch bots, or “Block all AI crawlers” to disallow every AI user-agent. The generator writes the correct User-agent groups for you.
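
To show what such a group can look like, here is a minimal Python sketch that builds it from the AI crawler reference table. It is not the generator's own code, and its exact output may differ: `AI_CRAWLERS` and `ai_block_group` are hypothetical names, and the mode strings "training" and "all" are assumptions standing in for the two options.

```python
# User-agent -> purpose, copied from the AI crawler reference table.
AI_CRAWLERS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "assistant",
    "ClaudeBot": "training",
    "Claude-User": "assistant",
    "Claude-SearchBot": "search",
    "anthropic-ai": "training",
    "Google-Extended": "training",
    "PerplexityBot": "search",
    "Perplexity-User": "assistant",
    "CCBot": "training",
    "Meta-ExternalAgent": "training",
    "Amazonbot": "training",
    "Applebot-Extended": "training",
    "Bytespider": "training",
}


def ai_block_group(mode: str) -> str:
    """Build one group that disallows the selected AI user-agents.

    mode="training" blocks only training bots; mode="all" blocks every AI bot.
    """
    if mode not in ("training", "all"):
        raise ValueError("mode must be 'training' or 'all'")
    agents = [
        ua for ua, purpose in AI_CRAWLERS.items()
        if mode == "all" or purpose == "training"
    ]
    # Several User-agent lines may share one set of rules.
    lines = [f"User-agent: {ua}" for ua in agents] + ["Disallow: /"]
    return "\n".join(lines) + "\n"


print(ai_block_group("training"))
```

Append the returned text after the general `User-agent: *` group so that ordinary search crawlers keep their own rules.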
Does robots.txt actually stop AI from using my content?
Only for crawlers that choose to obey it. robots.txt is a voluntary standard: well-behaved bots (Googlebot, Bingbot, GPTBot, ClaudeBot) respect it, but it is not enforced. It cannot stop a crawler that ignores the rules, and it does not remove content already trained on or indexed. For guaranteed control, gate content behind authentication or block user-agents at the server or CDN.
Is robots.txt a security feature?
No. robots.txt is a crawler instruction, not access control. Listing a path under Disallow tells compliant crawlers to skip it, but the path is still publicly reachable — and the robots.txt file itself is public, so you are effectively publishing a list of the directories you consider sensitive. Protect private content with authentication or server-side rules, never with Disallow.
What should a WordPress robots.txt contain?
Disallow /wp-admin/ but explicitly Allow /wp-admin/admin-ajax.php, because themes and plugins call admin-ajax.php on the front end and blocking it can break functionality and rendering. You generally do not need to block /wp-includes/ on modern WordPress. Pick the WordPress template here to get those rules.
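
One way to sanity-check these rules locally is Python's standard-library `urllib.robotparser`. This is a sketch under two assumptions: the crawler name `ExampleBot` and the sample paths are hypothetical, and the stdlib parser applies rules in file order rather than by longest match, which is why the Allow line comes first.

```python
from urllib.robotparser import RobotFileParser

WORDPRESS_RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(WORDPRESS_RULES.splitlines())


def crawlable(path: str) -> bool:
    """Ask the parsed rules whether a hypothetical crawler may fetch the path."""
    return parser.can_fetch("ExampleBot", path)


print(crawlable("/wp-admin/admin-ajax.php"))  # True
print(crawlable("/wp-admin/options.php"))     # False
print(crawlable("/2024/hello-world/"))        # True
```
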
Why does the staging option block everything?
A staging, preview, or development site should never be indexed — duplicate content competing with production hurts SEO, and you rarely want unfinished pages public. The staging template emits User-agent: * with Disallow: / and omits the sitemap. Swap to your production rules (or remove the blanket block) before launch.
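
The staging behaviour described above, plus a guard against shipping the blanket block to production, can be sketched in Python as follows. `build_robots` and `blocks_everything` are hypothetical helpers, and the environment names "staging" and "production" are assumptions.

```python
def build_robots(environment: str, sitemap_url: str = "") -> str:
    """Staging blocks everything and omits the sitemap; production allows crawling."""
    if environment == "staging":
        return "User-agent: *\nDisallow: /\n"
    if environment == "production":
        lines = ["User-agent: *", "Allow: /"]
        if sitemap_url:
            lines += ["", f"Sitemap: {sitemap_url}"]
        return "\n".join(lines) + "\n"
    raise ValueError(f"unknown environment: {environment!r}")


def blocks_everything(robots_txt: str) -> bool:
    """Return True if any line is a blanket `Disallow: /`."""
    return any(
        line.strip().lower().replace(" ", "") == "disallow:/"
        for line in robots_txt.splitlines()
    )


staging = build_robots("staging")
production = build_robots("production", "https://example.com/sitemap.xml")

print(blocks_everything(staging))     # True
print(blocks_everything(production))  # False
```

A check like `blocks_everything` could run in a deploy pipeline to fail a production release that still carries the staging file.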
Does Crawl-delay work for Google?
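No. Googlebot ignores Crawl-delay entirely. Google retired the Search Console crawl-rate limiter in January 2024, so Googlebot now adjusts its rate automatically; if it overloads your server, temporarily returning 503 or 429 responses slows it down. Bing still honours Crawl-delay, whereas Yandex dropped support in 2018 in favour of the crawl-rate setting in Yandex Webmaster, so the directive is mainly useful for Bing. The generator adds the directive when you set a value, and the validator reminds you of the Google caveat.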
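
Because Crawl-delay belongs to a group, it only reaches the crawlers that group names. The Python sketch below reads the value back with the standard-library `urllib.robotparser`. The 10-second delay and the /search/ path are hypothetical values chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The delay is attached to the bingbot group only.
print(parser.crawl_delay("bingbot"))    # 10
# Other agents fall back to the '*' group, which sets no delay.
print(parser.crawl_delay("Googlebot"))  # None
```
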
Do I need to list my sitemap in robots.txt?
It is optional but recommended. A Sitemap: line gives every crawler an absolute URL to your sitemap without you having to submit it in each search engine's tools. You can list more than one Sitemap line. Submitting the sitemap in Search Console as well does no harm.
Does this generator send my configuration to a server?
No. The file is assembled by JavaScript in your browser — nothing is uploaded, logged, or stored. You can build the file offline once the page has loaded and download it directly.