What is Robots.txt? A Blogger-Friendly Guide
You’ve probably heard the term robots.txt tossed around in SEO chats, or you’ve seen it mentioned in a plugin setting and thought, “Am I supposed to touch that?” You are, at least enough to understand what it does.
Robots.txt is a simple text file that gives search engine bots instructions about what they should and shouldn’t crawl on your site. That matters because crawling is how Google discovers pages, images, and files to consider for search. But here’s the big catch: robots.txt doesn’t hide private content. It’s not a password and it’s not a vault; it’s a set of crawl requests.
In this guide, you’ll learn what robots.txt is, where it lives, what it can (and can’t) do, the rules bloggers actually use, common mistakes, and a quick checklist you can run before you hit save.
What is robots.txt, and what does it actually do?
So, what is robots.txt in plain English? Think of it like a “sign on the door” for bots. It tells crawlers where they’re welcome to go on your site, and where you’d rather they didn’t wander.
To make sense of it, you need two simple ideas:
- Crawl means a bot visits URLs and reads what’s there (HTML, images, CSS, JavaScript, PDFs, and so on).
- Index means the search engine decides to store that page in its database and potentially show it in search results.
Robots.txt is mostly about crawling, not indexing. If you block /thank-you/ in robots.txt, you’re saying, “Hey bot, please don’t fetch this page.” You’re not guaranteeing it won’t appear in search.
Another important point: robots.txt is a request, not a lock. Good bots usually listen (Googlebot, Bingbot), but it’s not a security feature. A bad actor doesn’t have to follow it.
Here are a few quick “feel it in your bones” examples:
- If you disallow /wp-admin/, you’re asking bots not to waste time in your WordPress admin area.
- If you disallow /search/, you’re asking bots not to crawl internal search result pages that can create a ton of low-value URLs.
- If you disallow /, you’re telling bots not to crawl anything at all (the online version of locking your keys in the car).
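Put together as an actual file, those examples look like this (a sketch for a WordPress-style site; the commented-out last line is shown only so you recognize it, not so you use it):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
# Disallow: /   <- this one blocks everything; almost never what you want
```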
Used well, robots.txt helps keep crawlers focused on your real content, like posts, category pages you care about, and key images. Used poorly, it can quietly kneecap your traffic.
Where it lives on your site and how bots find it
Robots.txt usually lives at the root of your domain, like https://yoursite.com/robots.txt. Bots check this file early, often before they crawl much else, because it tells them the ground rules.
A few details that trip bloggers up:
- Each subdomain has its own robots.txt. If you have blog.yoursite.com, it needs its own file separate from yoursite.com.
- If the file is missing, bots usually crawl normally. No robots.txt does not mean “blocked,” it usually means “no special instructions.”
- The rules apply by path. Blocking /folder/ affects URLs under that folder, but not other areas of your site.
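To make the path idea concrete, here is a sketch using a hypothetical /private-notes/ folder (the folder name and URLs are invented for illustration):

```
User-agent: *
Disallow: /private-notes/

# Blocked:     https://yoursite.com/private-notes/draft-post/
# Not blocked: https://yoursite.com/recipes/notes-review/  (different path)
```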
If you ever want a quick reality check, just type /robots.txt after your domain and see what loads.
What robots.txt can and cannot protect
Robots.txt is not a way to keep secrets. If you block a URL, that page can still show up in search results if other sites link to it (Google may show the URL without crawling the content). That’s why “blocked” doesn’t always mean “invisible.”
For bloggers, common “I want this hidden” situations are:
- Draft-like pages you published by accident
- Thank-you pages behind email signups
- Admin pages or plugin pages
- Staging sites that accidentally went public
If you truly want something not to appear in search, use tools designed for that job: password protection, proper permissions, or a noindex directive on the page. If you need a clear explanation of how noindex works, this guide helps: Understanding the noindex tag for SEO.
Robots.txt is best when you’re managing crawl focus, not trying to hide content.
The few rules you need to know (with blogger friendly examples)
Most robots.txt files are short, and yours can be too. The core directives you’ll see are:
- User-agent: which bot the rule is for
- Disallow: what path you want bots to avoid
- Allow: what path is allowed (usually used to carve out exceptions)
- Sitemap: where your sitemap lives
You might also see pattern matching in some setups. At a high level:
- * is a wildcard that can match “anything”
- $ can mean “end of the URL”
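As a rough illustration (Google and Bing support these patterns, but not every crawler does, so treat this as a sketch rather than a universal rule):

```
User-agent: *
# Block any URL containing a query string like ?replytocom=
Disallow: /*?replytocom=
# Block only URLs that end in .pdf
Disallow: /*.pdf$
```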
You don’t need to memorize fancy patterns to get value from robots.txt. What you do need is caution, because one sloppy line can block parts of your blog you actually want found, like posts, category pages, images, or even the CSS/JS files that help Google render your pages correctly.
A practical mindset: robots.txt is a scalpel, not a paint roller. Block only what you mean to block, and double check the result.
User-agent, Disallow, and Allow, the basics that control crawling
User-agent is how you target rules. You can speak to all bots at once using User-agent: *, or you can write rules for specific bots (like Googlebot). Most bloggers stick with * unless there’s a clear reason not to.
A classic WordPress example is blocking the admin area while still allowing one needed file:
- Block: /wp-admin/
- Allow: /wp-admin/admin-ajax.php
That keeps crawlers out of your dashboard, but still lets them access the Ajax endpoint used by some themes and features.
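In the file itself, that classic WordPress pair looks like this. Google applies the most specific matching rule rather than reading top to bottom, but some simpler crawlers stop at the first match, so putting the Allow line first is the safer ordering:

```
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
```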
Other blogger-friendly examples you might choose:
- Block internal search results: /?s= or /search/ (depends on your setup)
- Block login pages: /wp-login.php
- Block utility pages you don’t want crawled: /cart/, /checkout/ (more common on stores)
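Assuming a typical WordPress blog with a small store attached, those choices might combine into something like this (adjust the paths to your own setup before copying anything; not every line applies to every site):

```
User-agent: *
Disallow: /?s=
Disallow: /search/
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
```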
One warning deserves its own spotlight: never disallow / unless you mean “crawl nothing.” That single character can stop search engines from crawling your entire site, and that can lead to pages dropping out of search over time.
Also, be careful blocking tag pages or category pages. Sometimes those pages help discovery and internal linking. Don’t block them just because someone said “archives are bad.” Make the choice based on your site.
Sitemap lines and why they make discovery easier
A sitemap is a file that lists the URLs you want search engines to find. Adding a Sitemap: line in robots.txt is like putting a map next to your front door.
It won’t force rankings, but it helps crawlers discover your important pages faster, especially when you publish new posts regularly.
If you want a simple breakdown of sitemaps, this is worth reading: Comprehensive sitemap guide for bloggers.
In practice, you add a line that points to your sitemap URL (many sites use /sitemap.xml or WordPress’s default /wp-sitemap.xml). The key is making sure the sitemap actually loads, and that it includes the pages you care about.
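The sitemap line itself is just one absolute URL (yoursite.com is a placeholder; use whichever sitemap path your site actually serves):

```
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yoursite.com/sitemap.xml
```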
How to create, test, and fix robots.txt without hurting traffic
Robots.txt edits are small, but the impact can be big, so use a boring, repeatable workflow:
- Find your current robots.txt by visiting /robots.txt.
- Copy it into a backup (even a simple text file on your computer is fine).
- Edit carefully (one change at a time, if possible).
- Test and monitor in your analytics and search tools after you publish.
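If you’re comfortable with a little Python, you can sanity-check a rule set before you publish it. This sketch uses the standard library’s urllib.robotparser; note that it reads rules top to bottom and stops at the first match (which is why the Allow line comes first here), while Google instead applies the most specific rule:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules you are about to publish, one line per list item.
rules = [
    "User-agent: *",
    "Allow: /wp-admin/admin-ajax.php",
    "Disallow: /wp-admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# Spot-check the URLs you care about before the file goes live.
print(parser.can_fetch("*", "https://yoursite.com/wp-admin/options.php"))    # blocked
print(parser.can_fetch("*", "https://yoursite.com/wp-admin/admin-ajax.php")) # carved out
print(parser.can_fetch("*", "https://yoursite.com/my-best-post/"))           # no rule matches
```

It won’t perfectly mirror Google’s matching in every edge case, but it catches the big mistakes, like a stray Disallow: / blocking your whole site.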
If you notice weird SEO symptoms after an edit, robots.txt is a prime suspect. Common signs include:
- Important pages stop getting crawled
- Posts slowly drop impressions and clicks
- Images stop appearing in image search
- Google reports “Blocked by robots.txt” in indexing tools
If you need to check whether Google can crawl and index a specific URL (especially after a fix), this walkthrough is handy: Submit a URL to Google Search Console.
A quick checklist before you hit save
- Confirm you’re not blocking posts or pages you want to rank.
- Confirm CSS/JS and images are crawlable (don’t block theme or uploads folders unless you have a strong reason).
- Block only what you mean to block (avoid broad rules that catch too much).
- Add a sitemap line if it’s missing.
- Keep staging sites blocked (and ideally password protected too).
- Recheck robots.txt after a theme or plugin change, because settings can overwrite it.
Conclusion
Robots.txt is your site’s way of telling crawlers, “Start here, skip that.” Used well, it helps search engines spend time on the pages that matter most. Used carelessly, it can block key content and quietly chip away at your traffic. And no matter how tempting it is, robots.txt is not a security tool.
Your next step is simple: check your current robots.txt today, confirm you aren’t blocking important blog content, and add a sitemap line if it’s missing. If you find anything confusing in there, that’s your cue to simplify, not to add more rules.
Does robots.txt stop a page from showing in Google?
No. Robots.txt tells search bots not to crawl a page, but it does not promise that the page will stay out of Google. A blocked URL can still appear in search if other pages link to it.
If you want a page kept out of search results, use a noindex tag on the page or protect it with a password. That is a better choice for thank-you pages, test pages, or content you published by mistake.
If you want to learn the difference, read this simple guide to the noindex tag. It explains why blocking crawl and blocking index are not the same thing.
What should bloggers block in robots.txt?
Bloggers should usually block low-value areas, not their main content. Good examples are /wp-admin/, login pages, internal search results, and store pages like /cart/ or /checkout/.
Be careful with category pages, tag pages, images, CSS, and JavaScript files. Those can help Google find and understand your content, so blocking them can hurt traffic.
A good robots.txt file is short and focused. It also helps to include a sitemap, and this blogger-friendly sitemap guide shows why that matters.
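A short, focused file for a typical blog might look like this (a sketch; swap in your real domain and keep only the paths that exist on your site):

```
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /search/

Sitemap: https://yoursite.com/sitemap.xml
```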
Can one bad robots.txt rule hurt my whole site?
Yes. One bad rule can block important parts of your site, and Disallow: / can stop bots from crawling everything. Over time, that can lead to fewer indexed pages and less search traffic.
Broad rules can also block images, scripts, or folders your site needs to work well in search. If Google cannot fetch those files, it may not see your pages the right way.
Before you save changes, back up your current file and test one edit at a time. If rankings or traffic drop, use this guide to check a URL in Google Search Console to confirm Google can crawl the page again.
Where is robots.txt, and how do I test it safely?
Robots.txt usually lives at the root of your domain, like yoursite.com/robots.txt. Each subdomain needs its own file, so blog.example.com and example.com do not share the same one.
Start by opening your current file in a browser and saving a copy before you edit it. Then make one small change, publish it, and watch your crawl and traffic data for the next few days.
If your file does not include a sitemap line, add one that points to the right sitemap URL. That small step can help search engines discover new posts faster, especially if you publish often.
How can RightBlogger help when I update robots.txt?
RightBlogger can help you spot SEO issues around a robots.txt change. With SEO Reports that highlight weak pages and optimization gaps, you can review important URLs faster and catch problems early.
This is useful after you unblock a folder, add a sitemap, or fix pages that were hidden from crawlers. Instead of guessing what to improve next, you get a clearer view of what needs attention.
RightBlogger is most helpful after the technical fix is done. A clean robots.txt and stronger content work together, which gives your site a better chance to be crawled, indexed, and ranked well.