Robots.txt Generator
By Dev Kraken · Updated
Build a valid robots.txt from a site-type
template, block AI crawlers or individual search engines, and add your
sitemap — with a live preview and a warning for every common mistake.
Everything runs in your browser; nothing is uploaded.
Robots.txt builder
Live preview of the generated robots.txt:
```
# robots.txt — generated by devkraken.com/tools/robots-txt-generator/
User-agent: *
Allow: /
Sitemap: https://devkraken.com/sitemap.xml
```
Site-type templates
Each template seeds the User-agent: *
group with sensible defaults. You can refine them with the path and
custom controls above.
| Site type | Default rules | Why |
|---|---|---|
| Blog / content site | `Allow: /` | Crawl everything; content sites want maximum indexing. |
| SaaS app | `Disallow: /dashboard/`<br>`Disallow: /settings/`<br>`Disallow: /account/`<br>`Disallow: /api/` | Keep the marketing site indexable; hide the authenticated app and API. |
| Ecommerce | `Disallow: /cart/`<br>`Disallow: /checkout/`<br>`Disallow: /account/`<br>`Disallow: /*?add-to-cart=` | Index products and categories; hide cart, checkout, and account flows. |
| Documentation | `Allow: /` | Index all docs so search and AI answer engines can cite them. |
| WordPress | `Allow: /wp-admin/admin-ajax.php`<br>`Disallow: /wp-admin/` | Block wp-admin but keep admin-ajax.php, which themes and plugins need. |
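Each template is a fixed list of rules dropped into a single `User-agent: *` group. The Python sketch below is illustrative only: it assumes a plain dictionary keyed by hypothetical site-type names (`blog`, `saas`, and so on), which is not necessarily how the tool stores its templates. The `staging` entry follows the staging behaviour described in the FAQ (block everything, omit the sitemap).

```python
# Hypothetical template table mirroring the rules in the table above.
# The keys are illustrative names, not the tool's own identifiers.
TEMPLATES = {
    "blog": ["Allow: /"],
    "saas": [
        "Disallow: /dashboard/",
        "Disallow: /settings/",
        "Disallow: /account/",
        "Disallow: /api/",
    ],
    "ecommerce": [
        "Disallow: /cart/",
        "Disallow: /checkout/",
        "Disallow: /account/",
        "Disallow: /*?add-to-cart=",
    ],
    "docs": ["Allow: /"],
    # Allow is listed first so parsers that use first-match order
    # still keep admin-ajax.php reachable.
    "wordpress": ["Allow: /wp-admin/admin-ajax.php", "Disallow: /wp-admin/"],
    # Staging blocks everything and never advertises a sitemap.
    "staging": ["Disallow: /"],
}


def build_robots_txt(site_type, sitemap_url=None):
    """Render one User-agent: * group for the chosen template."""
    lines = ["User-agent: *"]
    lines.extend(TEMPLATES[site_type])
    if sitemap_url and site_type != "staging":
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

For example, `build_robots_txt("wordpress", "https://example.com/sitemap.xml")` yields the two WordPress rules followed by the `Sitemap:` line, while the staging template ignores the sitemap argument entirely.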
AI crawler reference
The user-agents the generator can block. “Training” bots feed model training; “search” bots index for AI answer engines; “assistant” bots fetch a page on demand when a user asks.
| User-agent | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | training |
| OAI-SearchBot | OpenAI | search |
| ChatGPT-User | OpenAI | assistant |
| ClaudeBot | Anthropic | training |
| Claude-User | Anthropic | assistant |
| Claude-SearchBot | Anthropic | search |
| anthropic-ai | Anthropic (legacy token) | training |
| Google-Extended | Google (Gemini) | training |
| PerplexityBot | Perplexity | search |
| Perplexity-User | Perplexity | assistant |
| CCBot | Common Crawl | training |
| Meta-ExternalAgent | Meta | training |
| Amazonbot | Amazon | training |
| Applebot-Extended | Apple | training |
| Bytespider | ByteDance | training |
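The two AI presets amount to a filter over this table. The Python sketch below is a hedged illustration, not the generator's code: it assumes the output is one group with stacked `User-agent` lines sharing a single `Disallow: /`. RFC 9309 allows a group to list several user-agents, but the real tool might instead write one group per bot; a crawler that follows the standard should treat both layouts the same way.

```python
# Purpose of each AI user-agent, copied from the reference table above.
AI_CRAWLERS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "assistant",
    "ClaudeBot": "training",
    "Claude-User": "assistant",
    "Claude-SearchBot": "search",
    "anthropic-ai": "training",
    "Google-Extended": "training",
    "PerplexityBot": "search",
    "Perplexity-User": "assistant",
    "CCBot": "training",
    "Meta-ExternalAgent": "training",
    "Amazonbot": "training",
    "Applebot-Extended": "training",
    "Bytespider": "training",
}


def ai_block_group(mode):
    """Return one robots.txt group that disallows the selected AI bots.

    mode="training" blocks only training crawlers, leaving search and
    on-demand assistant bots alone; mode="all" blocks every agent above.
    """
    if mode == "training":
        agents = [ua for ua, purpose in AI_CRAWLERS.items() if purpose == "training"]
    elif mode == "all":
        agents = list(AI_CRAWLERS)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Several User-agent lines share the single Disallow rule below.
    lines = [f"User-agent: {ua}" for ua in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"
```

With `mode="training"` the group lists the nine training agents; OAI-SearchBot, ChatGPT-User, and the other search and assistant bots stay unblocked.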
What robots.txt does — and does not — do
It does
- Tell compliant crawlers which paths to skip.
- Reduce crawl load on sections you don't want indexed.
- Advertise your sitemap to every crawler at once.
- Set a crawl-delay for Bing and Yandex.
It does not
- Protect content — the file and the paths are public.
- Stop crawlers that choose to ignore it.
- Guarantee a page stays out of search: a blocked URL can still be indexed if other pages link to it. Use a `noindex` meta tag for that, and leave the page crawlable so search engines can actually see the tag.
- Remove content already crawled or trained on.
For anything genuinely private, require authentication or block at the
server or CDN — never rely on Disallow.
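Compliance happens entirely on the crawler's side: a well-behaved bot parses the file and checks each URL before requesting it. Python's standard-library `urllib.robotparser` shows that flow. The sample file and the `ExampleBot` agent name are made up for illustration. Note that this parser applies rules in file order (first match wins) and treats paths as plain prefixes, so it does not implement the `*`/`$` wildcards or longest-match precedence described under "Wildcards and pattern matching".

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; ExampleBot is a hypothetical crawler name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

# A compliant crawler parses the file once, then checks every URL before fetching.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed_blog = parser.can_fetch("ExampleBot", "https://example.com/blog/hello")
allowed_account = parser.can_fetch("ExampleBot", "https://example.com/account/settings")
delay = parser.crawl_delay("ExampleBot")  # seconds between requests, or None
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none

print(allowed_blog, allowed_account, delay, sitemaps)
```

Nothing in this check is enforced by the server: a crawler that skips `can_fetch` can request `/account/settings` anyway, which is why private content needs authentication.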
Wildcards and pattern matching
Two special characters are widely supported by Google and Bing, and both are part of the RFC 9309 robots.txt standard:
- `*` matches any run of characters, including none. `Disallow: /*?replytocom=` blocks every URL whose query string begins with `replytocom=`.
- `$` anchors the end of the URL. `Disallow: /*.pdf$` blocks URLs ending in `.pdf`.
When more than one rule matches, the most specific rule (the longest
pattern) wins, and an Allow beats
a Disallow of equal length.
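The matching and precedence rules fit in a few lines. This Python sketch is not the tester's implementation. It assumes the rules for a single group arrive as hypothetical `(directive, pattern)` pairs, that the string being tested is the URL path plus any query string, and that specificity is the raw pattern length in characters. Edge cases such as percent-encoding are ignored.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into a regex.

    '*' matches any run of characters (including none), a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Longest matching pattern wins; on a tie, Allow beats Disallow.

    url_path is the path plus query string, e.g. "/post/?replytocom=7".
    If no rule matches, the URL is allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" blocks nothing
        if pattern_to_regex(pattern).match(url_path) is None:
            continue  # re.match anchors at the start, giving prefix matching
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length = len(pattern)
            allowed = is_allow
    return allowed
```

With the WordPress template, `/wp-admin/admin-ajax.php` is allowed because the 24-character Allow pattern is longer than the 10-character `/wp-admin/` Disallow, regardless of the order the rules appear in.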
Confirm any rule with the
robots.txt tester.
Frequently asked
- What is a robots.txt file and where does it go?
- robots.txt is a plain-text file that tells crawlers which parts of your site they may request. It must live at the root of your host (https://example.com/robots.txt) and applies only to that host, protocol, and port, so a subdomain such as blog.example.com needs its own file. A file at a subpath (like /blog/robots.txt) is ignored. Generate the file here, then upload it to your web root or have your framework serve it at /robots.txt.
- How do I block ChatGPT, Claude, and other AI crawlers?
- Add a group that disallows the AI user-agents: GPTBot (OpenAI training), ClaudeBot (Anthropic training), Google-Extended (Gemini training), CCBot (Common Crawl), and others. Choose “Block AI training crawlers” to stop training bots while keeping AI search and on-demand fetch bots, or “Block all AI crawlers” to disallow every AI user-agent. The generator writes the correct User-agent groups for you.
- Does robots.txt actually stop AI from using my content?
- Only for crawlers that choose to obey it. robots.txt is a voluntary standard: well-behaved bots (Googlebot, Bingbot, GPTBot, ClaudeBot) respect it, but it is not enforced. It cannot stop a crawler that ignores the rules, and it does not remove content already trained on or indexed. For guaranteed control, gate content behind authentication or block user-agents at the server or CDN.
- Is robots.txt a security feature?
- No. robots.txt is a crawler instruction, not access control. Listing a path under Disallow tells compliant crawlers to skip it, but the path is still publicly reachable — and the robots.txt file itself is public, so you are effectively publishing a list of the directories you consider sensitive. Protect private content with authentication or server-side rules, never with Disallow.
- What should a WordPress robots.txt contain?
- Disallow /wp-admin/ but explicitly Allow /wp-admin/admin-ajax.php, because themes and plugins call admin-ajax.php on the front end and blocking it can break functionality and rendering. You generally do not need to block /wp-includes/ on modern WordPress. Pick the WordPress template here to get those rules.
- Why does the staging option block everything?
- A staging, preview, or development site should never be indexed — duplicate content competing with production hurts SEO, and you rarely want unfinished pages public. The staging template emits User-agent: * with Disallow: / and omits the sitemap. Swap to your production rules (or remove the blanket block) before launch.
- Does Crawl-delay work for Google?
- No. Googlebot ignores Crawl-delay entirely and adjusts its crawl rate based on how your server responds; to slow it down for a short period, return 500, 503, or 429 status codes. Bing and Yandex do honour Crawl-delay, so it is still useful for those engines. The generator adds the directive when you set a value, and the validator reminds you of the Google caveat.
- Do I need to list my sitemap in robots.txt?
- It is optional but recommended. A Sitemap: line gives every crawler an absolute URL to your sitemap without you having to submit it in each search engine's tools. You can list more than one Sitemap line. Submitting the sitemap in Search Console as well does no harm.
- Does this generator send my configuration to a server?
- No. The file is assembled by JavaScript in your browser — nothing is uploaded, logged, or stored. You can build the file offline once the page has loaded and download it directly.
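Several answers above mention mistakes the validator flags: a blanket `Disallow: /` left over from staging, a Crawl-delay that Google ignores, and a Sitemap line that is not an absolute URL. A line-by-line check for those three could look like the Python sketch below. The function name and message wording are hypothetical; the page does not document the real validator's rules.

```python
from urllib.parse import urlsplit


def lint_robots_txt(text):
    """Return warnings for a few common robots.txt mistakes (illustrative checks only)."""
    warnings = []
    group_agents = []  # user-agents of the group currently being read
    group_has_rules = False
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if group_has_rules:  # a User-agent after rules starts a new group
                group_agents, group_has_rules = [], False
            group_agents.append(value)
        elif field in ("allow", "disallow", "crawl-delay"):
            group_has_rules = True
            if field == "disallow" and value == "/" and "*" in group_agents:
                warnings.append("Disallow: / for User-agent: * blocks the whole site; "
                                "remove it before launch.")
            if field == "crawl-delay":
                warnings.append("Crawl-delay is ignored by Googlebot; only crawlers "
                                "that support it, such as Bingbot, slow down.")
        elif field == "sitemap":
            parts = urlsplit(value)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                warnings.append(f"Sitemap must be an absolute URL: {value!r}")
    return warnings
```

A staging-style file with `Disallow: /`, `Crawl-delay: 10`, and `Sitemap: /sitemap.xml` produces one warning per line, while the default preview output produces none.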
Related tools
- Robots.txt Tester: paste a robots.txt and a URL to see whether a crawler is allowed or blocked, along with the exact rule that decides it.
- Meta Tag Generator: build title, description, Open Graph, and Twitter card tags with live Google, Facebook, and X previews.
- Schema Markup Generator: generate valid JSON-LD structured data for articles, products, reviews, local businesses, FAQs, and breadcrumbs, with a live preview.