Robots Exclusion standard

Robots Exclusion standard, a forum discussion on RagePank SEO (General SEO forum).

admin

15 Feb 2006

The robots.txt file is a simple text file placed in the root directory of your website. Search engine bots read it to determine which parts of the site to visit and which parts to ignore.
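A minimal sketch of what a well-behaved bot does, using Python's standard-library `urllib.robotparser`. It assumes the robots.txt text has already been downloaded; the site `example.com`, the bot name `ExampleBot`, and the helper names `robots_url` and `may_visit` are hypothetical, for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return where a bot looks for robots.txt for the site hosting page_url.

    The file is only ever read from the root of the host, so the page's
    path, query string and fragment are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_visit(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Check page_url against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    # Hypothetical robots.txt for example.com
    rules = "User-agent: *\nDisallow: /private/\n"
    print(robots_url("https://example.com/blog/post?id=7"))
    # -> https://example.com/robots.txt
    print(may_visit(rules, "ExampleBot", "https://example.com/blog/post"))
    # -> True
    print(may_visit(rules, "ExampleBot", "https://example.com/private/notes"))
    # -> False
```

Note that a page deep inside the site still maps to the single robots.txt at the root; there is no per-directory robots file.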

It can also shut specific bots out of the site entirely, although compliance is voluntary: the standard only works if the bot chooses to follow it.
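The sketch below uses Python's standard-library `urllib.robotparser`; `BadBot` and `FriendlyBot` are made-up bot names. It shows a robots.txt that bars one bot from the whole site while leaving every other bot unrestricted. Nothing enforces the result: the check only happens if the bot's own code chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the first group applies only to "BadBot";
# the "*" group covers every other bot. An empty Disallow allows everything.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://example.com/index.html"
    for bot in ("BadBot", "FriendlyBot"):
        # A compliant crawler asks before fetching; a rogue one simply skips this call.
        print(bot, parser.can_fetch(bot, url))
    # -> BadBot False
    # -> FriendlyBot True
```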

Most webmasters use robots.txt to keep bots out of the admin or private sections of their website. Care must be taken to get the format right, or you can end up telling the bots not to crawl your entire website. Surely this has caused some websites to rank poorly in the past.
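To see how easy it is to block a whole site by accident, this sketch compares two files that differ by a single character: `Disallow: /` matches every path on the site, while an empty `Disallow:` matches nothing. The parser is Python's standard-library `urllib.robotparser`; the URLs, the bot name `ExampleBot` and the helper `allowed_paths` are placeholders.

```python
from urllib.robotparser import RobotFileParser

BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"  # every path starts with "/"
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"    # empty value excludes nothing


def allowed_paths(robots_txt: str, paths: list[str], agent: str = "ExampleBot") -> list[str]:
    """Return the subset of paths the given agent may crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(agent, "https://example.com" + p)]


if __name__ == "__main__":
    paths = ["/", "/index.html", "/products/widget.html"]
    print(allowed_paths(BLOCK_EVERYTHING, paths))  # -> []
    print(allowed_paths(ALLOW_EVERYTHING, paths))  # -> ['/', '/index.html', '/products/widget.html']
```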

Typical usage, asking all bots (*) to stay out of the /admin/ directory:
User-agent: *
Disallow: /admin/
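Rules are matched as path prefixes, so the trailing slash in /admin/ matters. The sketch below uses Python's standard-library `urllib.robotparser` to check a few made-up paths against the rules above and against a variant written without the slash, which also catches unrelated paths such as /administration-guide.html. The paths, bot name and helper `blocked` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

WITH_SLASH = "User-agent: *\nDisallow: /admin/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /admin\n"

# Hypothetical paths on the site
PATHS = ["/admin/", "/admin/users.php", "/admin", "/administration-guide.html", "/about.html"]


def blocked(robots_txt: str) -> list[str]:
    """Return the paths that a compliant bot would skip under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in PATHS if not parser.can_fetch("ExampleBot", "https://example.com" + p)]


if __name__ == "__main__":
    print(blocked(WITH_SLASH))
    # -> ['/admin/', '/admin/users.php']
    print(blocked(WITHOUT_SLASH))
    # -> ['/admin/', '/admin/users.php', '/admin', '/administration-guide.html']
```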

The White House

Surely the most interesting robots.txt file has to be that of The White House. It is very comprehensive, excluding spiders from a range of topics including 9/11, the First Lady and the White House's news releases. Some speculate that the purpose is to stop search engines from caching the content, so that it can be changed at will without anyone noticing.

There are much better resources available on this topic; Wikipedia is a good place to start.