WordPress is now one of the most popular CMS solutions out there; the majority of blogs and websites use it. It is a particularly SEO-friendly platform that you can use to achieve big results any time, any day. Here are quick tips on how you can easily add a better robots.txt configuration to your WP-powered website or blog.
Adding robots.txt settings to a WordPress blog / website
This is one of the most ignored areas among WP users, many of whom simply trust that WordPress is a near-perfect CMS that already handles search-engine-friendly metrics and so needs no further SEO tweaks. That was the mentality I nurtured when I started out on the WP platform, until I stumbled on the robots.txt settings of the old mashable.com, which sparked my curiosity about the perfect WordPress robots.txt settings. After searching up till now, 2013, I have realized there is no such thing as a perfect robots.txt configuration for any website at all, irrespective of the platform it is built on, since each website must be configured individually based on how you want it indexed by search engines. Still, adding a working robots.txt to any blog or website, including those on the WordPress platform, can greatly improve the site's performance in any popular search engine out there, since you will be instructing the search engines on how to index the site. If you ask my opinion, this is one of those WordPress SEO performance tweaks you wouldn't want to miss.
How to add robots.txt settings / configuration to a WordPress blog / website:
This is dead easy for anyone to implement, whether you are a newbie or a pro WP user. All you need to do is this:
- Open a blank Notepad file
- Copy and paste this simple robots.txt configuration into it, bearing in mind that each site must be configured individually based on how you want it indexed by search engines
Here is the configuration to copy for a WordPress blog:
User-agent: IRLbot
Crawl-delay: 3600
User-agent: Alexibot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: asterias
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: HTTrack 3.0
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: larbin
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: moget
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RMA
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: spanner
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: turingos
Disallow: /
User-agent: TurnitinBot/1.5
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebCapture 2.0
Disallow: /
User-agent: WebCopier v.2.2
Disallow: /
User-agent: WebCopier v3.2a
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebZIP/4.21
Disallow: /
User-agent: WebZIP/5.0
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: Wget
Disallow: /
User-agent: wget
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: Adsbot-Google
Disallow:
User-agent: Googlebot
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /feed/
Disallow: /comments/
Disallow: /author/
Disallow: /archives/
Disallow: /20*
Disallow: /trackback/
Sitemap: http://myblogurl.com/index.xml
If your mobile site URL is different from your parent blog URL, add the following after the last Disallow line:
User-agent: Googlebot-Mobile
Allow: /
changing the Allow path to match your mobile URL. Using mine as an example, it becomes:
User-agent: Googlebot-Mobile
Allow: /?mobile
You can remove the lines that read:
Disallow: /archives/
Disallow: /author/
Disallow: /feed/
Disallow: /20*
Disallow: /images*
if you want those sections indexed by search engines; they were added to prevent duplicate content from appearing in search engine results. For those who want only their post titles and contents indexed, you can add other lines to your robots.txt to prevent the rest, e.g.
Disallow: /tag/
Disallow: /category/
Disallow: /next/
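Before uploading, you can sanity-check rules like these locally. Here is a minimal sketch using Python's standard urllib.robotparser; the rules and paths below are illustrative assumptions, so swap in your own file and URLs:

```python
# Minimal sketch: test robots.txt rules locally with Python's stdlib parser.
# The rules and paths below are illustrative assumptions, not your real site.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /tag/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths blocked for all crawlers return False from can_fetch().
print(parser.can_fetch("*", "/tag/seo/"))        # a blocked tag archive
print(parser.can_fetch("*", "/my-post-title/"))  # a normal post, allowed
```

Note that the stdlib parser follows the original plain-prefix matching rules, so it is a quick check rather than an exact replica of how every search engine interprets wildcards.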
For others whose server load is suffering and who wish to reduce how frequently search engines visit their blog, you can tweak these lines:
User-agent: IRLbot
Crawl-delay: 3600
replacing 3600 with any figure of your choice, in seconds. Note that 3600 seconds equals one hour, so you can lower the value to allow more frequent visits, or raise it toward a day or more, depending on how frequently your blog is updated.
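Since Crawl-delay is expressed in seconds, the conversion is simple arithmetic. A quick sketch (the intervals below are arbitrary examples, not recommendations):

```python
# Crawl-delay values are in seconds; derive them from familiar intervals.
# These example intervals are arbitrary illustrations, not recommendations.
HOUR = 60 * 60   # 3600 seconds, the value used in the config above
DAY = 24 * HOUR  # 86400 seconds

print(f"Crawl-delay: {HOUR}")      # roughly one visit per hour
print(f"Crawl-delay: {6 * HOUR}")  # roughly one visit every six hours
print(f"Crawl-delay: {DAY}")       # roughly one visit per day
```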
Here is the configuration to copy for a WordPress CMS-powered website:
User-agent: IRLbot
Crawl-delay: 3600
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-content/cache/
Disallow: /wp-login.php
Disallow: /wp-register.php
Sitemap: http://mywebsiteurl.com/index.xml
Tweak it accordingly, following the notes above. Also note that for sites with both private images and images that should be indexed, you can upload your images into two separate folders, adding the folder that contains the private images to your Disallow list.
- Save the Notepad content as robots.txt
- Log in to your control panel, locate the File Manager, and upload robots.txt to the root directory where WordPress is installed
- Close the File Manager and enjoy your new, active robots.txt settings
Please remember to replace Sitemap: http://myblogurl.com/index.xml with your blog or website's current sitemap link.
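It is easy to forget that last step, so a tiny check before uploading can help. Here is a hypothetical helper (the function name is mine, and the placeholder string is the sample URL from the config above):

```python
# Hypothetical helper: warn if the sample placeholder sitemap URL from the
# config above was left in your robots.txt before uploading it.
PLACEHOLDER = "http://myblogurl.com/index.xml"

def placeholder_left(robots_text: str) -> bool:
    """Return True if the sample placeholder URL is still present."""
    return PLACEHOLDER in robots_text

sample = "User-agent: *\nDisallow: /wp-admin/\nSitemap: http://myblogurl.com/index.xml\n"
print(placeholder_left(sample))  # True: the sample URL was never replaced
```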
If you do not have access to your cPanel and wish to add robots.txt settings to your WordPress-powered blog or website using a plugin, you can easily do so by grabbing WP Robots Txt from the WordPress plugin repository; it gives you the privilege of adding and editing your robots.txt configuration right from your WordPress admin dashboard.
Also see:
- Why skip Installing this 7 most important wordpress Plugins? and
- How to add Numeric Pagination to Mobilepress Default Theme
robots.txt and WordPress are both well-known elements in the world of online business. Using these resources makes a web-based business easier to control and helps it get the desired results from its promotion.
Nice tutorial. I think I'm gonna tweak mine as well so I can curb all these bots visiting my site. Thumbs up, man!
wow Obasi! your knowledge is incredible! You know so much about coding! I hope to understand coding more in the future to help with my blogs:))
You really know your coding … did you go to school for this or are you self taught?
Thanks for dropping by, Cheryl, you really did find this very wowing! But this is nothing much, just a little research on malicious bots that are known to cause harm on any website, coupled with the regular settings from the official WordPress Codex. Thanks for the compliments though
Oh wow … this content is so above where I am at, will have to come back to this when it sounds less Latin-like :)
Now Kasey, your last lines got me really cracking
Wow, such great advice, thanks so much for sharing this knowledge. I'm off to update my blogs right now! I have a question: is WordPress a generally good program to use? I am using it and find it very good, but I have also heard that it is more difficult to place paid advertising on it.
Thanks for dropping by, Emily. This part [ it is more difficult to place paid advertising on it ] is the funniest piece I have read in a very long time. Frankly speaking, I will pledge to the fact that WP is the best platform for accepting and placing private advertisements. Premium plugins like ads-rotator even make that pretty easy
That's okay, Don. Hope you replaced Sitemap: http://myblogurl.com/index.xml with your own sitemap URL?
yea…. this is a great tutorial dude …. i just changed mine to this….. am loving it in here
How lovely would it be if it can work on blogger platform… Love this !!!
Thanks for dropping by bro, it can actually work on blogger with just a little more tweak. But the question is “why bother when master google is taking care of it for you” ?
Nice one bro, but I don't know most of those bots. This is great; I think I'm going to use it.
Yeah, there are many other wicked bots out there. A hacker friend of mine actually gave me some tools built on most of these bots, including "User-agent: HTTrack 3.0", which downloads your website files without your knowledge