How To Block Those Search Engine Robots
Copyright © by Gauher Chaudhry, All Rights Reserved.

How do you stop the major search engines from crawling and indexing
pages on your site that you don't want showing up in their results?

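One easy method is to create a robots.txt file and place it in the
root directory of your web server, so that spiders can find it at
an address such as http://www.example.com/robots.txt. Keep in mind
that robots.txt is a request that well-behaved spiders honor, not a
security measure: anyone who knows a page's URL can still open it.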
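Spiders only look for the file at the root of the host, never in a subdirectory. The small Python sketch below uses only the standard library to show how a spider could work out where to look. The function name robots_txt_url and the www.example.com page address are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

def robots_txt_url(page_url: str) -> str:
    """Return the address where spiders look for this site's robots.txt.

    Joining the absolute path "/robots.txt" onto any page URL keeps the
    scheme, host and port but replaces the path, query and fragment.
    """
    return urljoin(page_url, "/robots.txt")

print(robots_txt_url("http://www.example.com/products/thankyou.html"))
# → http://www.example.com/robots.txt
```

A robots.txt placed at http://www.example.com/products/robots.txt would simply never be read.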

It is actually quite easy to do, and all you need is a text
editor.

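A robots.txt file is made up of records. Each record starts with
a User-Agent line naming a spider, followed by the Disallow lines
listing the URL paths that spider should not fetch. The basic
syntax looks like this:

User-Agent: [Spider Name]
Disallow: [URL Path]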
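A robots.txt file is a series of records, each made of a User-Agent line followed by the Disallow lines that apply to it. The deliberately simplified Python sketch below groups lines into records in that way. The function parse_records is a hypothetical helper and is not part of any library. Real spiders handle more fields and edge cases, and for actual checks Python's standard urllib.robotparser module is the better tool.

```python
def parse_records(text: str) -> list[tuple[list[str], list[str]]]:
    """Group robots.txt lines into (spider names, disallowed paths) records.

    Simplified sketch: comments and blank lines are skipped, and any field
    other than User-Agent and Disallow is ignored.
    """
    records: list[tuple[list[str], list[str]]] = []
    agents: list[str] = []
    paths: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()  # field names ignore case
        if field == "user-agent":
            if paths:  # a User-Agent after Disallow lines starts a new record
                records.append((agents, paths))
                agents, paths = [], []
            agents.append(value)
        elif field == "disallow" and agents:  # Disallow must follow a User-Agent
            paths.append(value)
    if agents:
        records.append((agents, paths))
    return records

print(parse_records("User-Agent: googlebot\nDisallow: /thankyou.html"))
# → [(['googlebot'], ['/thankyou.html'])]
```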

For example, Google's spider is named googlebot, so if you didn't
want googlebot to index your thankyou.html file, your robots.txt
file would look like this:

User-Agent: googlebot
Disallow: /thankyou.html
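You can check what these two lines mean to a spider with Python's standard urllib.robotparser module, which parses robots.txt rules and answers "may this spider fetch this URL?". In this sketch, www.example.com, index.html and the spider name "otherbot" are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt lines from the googlebot example, one per list item.
rules = [
    "User-Agent: googlebot",
    "Disallow: /thankyou.html",
]

parser = RobotFileParser()
parser.parse(rules)

# googlebot may not fetch the thank-you page, but other pages are fine.
print(parser.can_fetch("googlebot", "http://www.example.com/thankyou.html"))  # False
print(parser.can_fetch("googlebot", "http://www.example.com/index.html"))     # True
# A spider with no matching record is not restricted at all.
print(parser.can_fetch("otherbot", "http://www.example.com/thankyou.html"))   # True
```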

If you want to prevent all robots from spidering the file named
thankyou.html, use "*", the wildcard character, in the User-Agent
line:

User-Agent: *
Disallow: /thankyou.html
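The same standard-library module can confirm that the "*" record covers every spider name. In this sketch, "otherbot" and index.html are placeholders.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-Agent: *",
    "Disallow: /thankyou.html",
])

# Every spider name falls under the "*" record, so all of them are blocked.
for agent in ("googlebot", "otherbot"):
    print(agent, parser.can_fetch(agent, "/thankyou.html"))  # prints False for both
# Pages that are not listed remain open to everyone.
print(parser.can_fetch("googlebot", "/index.html"))  # True
```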

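You may also specify directories. A path ending in a slash blocks
everything inside that directory, and the Disallow line still
needs a User-Agent line above it:

User-Agent: *
Disallow: /cgi-bin/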
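A directory rule works because Disallow values are matched against the beginning of the URL path. This sketch uses Python's urllib.robotparser to show what that means for a /cgi-bin/ rule. The paths form.pl and cgi-binaries.html are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-Agent: *",
    "Disallow: /cgi-bin/",
])

# Anything under /cgi-bin/ is blocked ...
print(parser.can_fetch("googlebot", "/cgi-bin/form.pl"))    # False
print(parser.can_fetch("googlebot", "/cgi-bin/"))           # False
# ... but "/cgi-binaries.html" does not begin with "/cgi-bin/", so it is allowed.
print(parser.can_fetch("googlebot", "/cgi-binaries.html"))  # True
```

Dropping the trailing slash (Disallow: /cgi-bin) would block /cgi-binaries.html as well, since the match is a plain prefix.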

This example bans googlebot from every file on the server:

User-agent: googlebot
Disallow: /
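Because "/" is the start of every path on the site, this record shuts googlebot out completely while leaving other spiders alone. A quick check with Python's urllib.robotparser follows; "otherbot" and /any/page.html are placeholders.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: googlebot",
    "Disallow: /",
])

# "/" is a prefix of every path, so googlebot is shut out everywhere.
print(parser.can_fetch("googlebot", "/"))               # False
print(parser.can_fetch("googlebot", "/any/page.html"))  # False
# Spiders without a matching record are unaffected.
print(parser.can_fetch("otherbot", "/any/page.html"))   # True
```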

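The original robots.txt standard does not let you use the
wildcard character for a file in the "Disallow" statement: each
value is simply matched against the beginning of the URL path.
Some major spiders, including googlebot, now accept "*" and "$"
in Disallow paths as an extension (later described in RFC 9309),
but you cannot count on every spider understanding them.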
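Under the original standard, a Disallow value is a plain prefix of the URL path and "*" has no special meaning. Python's urllib.robotparser also matches by prefix and does not implement the wildcard extensions. Some spiders, googlebot among them, support two additions that RFC 9309 describes: "*" matches any run of characters, and a trailing "$" ties the rule to the end of the path. The sketch below implements both behaviors for a single rule. The helper rule_matches is hypothetical, and it ignores how a real spider weighs several matching rules against each other.

```python
import re

def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a Disallow value applies to url_path.

    Plain values match as prefixes. As an extension, "*" matches any
    sequence of characters and a trailing "$" requires the rule to
    match all the way to the end of the path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally, then rejoin the pieces with ".*" where "*" was.
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # re.match anchors only at the start (prefix match); fullmatch anchors both ends.
    matcher = re.fullmatch if anchored else re.match
    return matcher(pattern, url_path) is not None

print(rule_matches("/thankyou.html", "/thankyou.html?ref=1"))  # True (prefix)
print(rule_matches("/*.pdf$", "/docs/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/report.pdf.html"))        # False
```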

The robots.txt file is also useful if you create multiple web
pages, each tailored to a particular search engine (e.g. Google,
Lycos). If one search engine's spider indexes all of these pages,
you could be penalized.

These pages tend to be similar, and the major search engines can
detect when a site is doing this.

The searchbot might label your web site as spam and you could be
permanently banned from that search engine.

By using a robots.txt file, you can tell googlebot to avoid
indexing a web page that you created especially for Lycos.com.

When you add a User-Agent line and matching Disallow lines for
each search engine's spider, you instruct each search engine not
to spider the files meant for the other search engines.
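As an illustration, suppose google.html is the page tuned for Google and lycos.html the one tuned for Lycos. Both file names are hypothetical, and "lycosbot" is a placeholder spider name, so check each engine's documentation for the real one. Python's urllib.robotparser confirms that each spider is kept away from the other engine's page.

```python
from urllib.robotparser import RobotFileParser

# One record per spider; the blank line separates the records.
rules = """\
User-Agent: googlebot
Disallow: /lycos.html

User-Agent: lycosbot
Disallow: /google.html
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("googlebot", "/google.html"))  # True
print(parser.can_fetch("googlebot", "/lycos.html"))   # False
print(parser.can_fetch("lycosbot", "/lycos.html"))    # True
print(parser.can_fetch("lycosbot", "/google.html"))   # False
```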

For more information on robots.txt files and more complicated
examples, I suggest going to:

http://www.searchengineworld.com/robots/robots_tutorial.htm