If you have a website and you want the big search engines to notice it, a robots.txt file is a given on every site you run.
A robots.txt file tells the robots that scan or "crawl" your site where they can and cannot go. It is also the first thing a robot looks for when it gets to your site. If it does not find one, it will normally assume it may crawl everything, and you have given up your say in the matter.
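Robots look for the file in one fixed place: the root of your host, under the lowercase name robots.txt. Here is a minimal Python sketch of how a crawler might work out that address from any page on your site. The helper name `robots_url` and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address where a crawler would look for robots.txt.

    Hypothetical helper: keeps the scheme and host of the page URL and
    replaces everything after the host with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Made-up page URL on a made-up site
    print(robots_url("http://www.example.com/shop/item.html?id=7"))
```

So a file tucked away in a subdirectory will not be found; it has to sit at the top level of the site.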
There are basically two kinds of robots.txt file you can have.
You can have one that looks like this:
Code:
User-agent: *
Disallow:
This means that every robot is allowed to look in every directory on your website.
Or you can have one that looks like this:
Code:
User-agent: *
Disallow: /cgi-bin/
That kind tells robots to stay out of certain directories, which keeps the "privileged" material in them from being scanned and leaked to the Internet at large.
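If you want to check how a robot would read these two files, Python's standard `urllib.robotparser` module can parse the rules and answer "may I fetch this?" for a given URL. The sketch below is only that, a sketch: the helper `allowed`, the robot name ExampleBot and the example.com URLs are assumptions for illustration, and real crawlers may differ from this parser on edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two kinds of file described above, one rule per line
ALLOW_ALL = ["User-agent: *", "Disallow:"]
BLOCK_CGI = ["User-agent: *", "Disallow: /cgi-bin/"]

# Hypothetical URLs on a made-up site
HOME = "http://www.example.com/index.html"
SCRIPT = "http://www.example.com/cgi-bin/search.pl"


def allowed(robots_lines, user_agent, url):
    """Parse robots.txt lines and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in (("allow all", ALLOW_ALL), ("block cgi-bin", BLOCK_CGI)):
        print(name, allowed(rules, "ExampleBot", HOME), allowed(rules, "ExampleBot", SCRIPT))
```

With the first file both URLs come back as allowed. With the second, only the URL under /cgi-bin/ is refused.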
There is a third, less common type, which blocks specific crawlers from your site altogether. An example would be:
Code:
User-agent: googlebot
Disallow: /
That simply means you do not want Google's robot to crawl your site at all. This is rarely used anymore, as the robots that were a problem have long since been discontinued.
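A rule aimed at one robot leaves every other robot alone. Here is a small Python sketch with the standard `urllib.robotparser` module, assuming the rule is written as `Disallow: /`, the usual way to cover the whole site. The helper `may_crawl`, the robot name ExampleBot and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Blocks only the robot named googlebot; there are no rules for anyone else.
BLOCK_GOOGLEBOT = ["User-agent: googlebot", "Disallow: /"]


def may_crawl(user_agent: str, url: str) -> bool:
    """Report whether user_agent may fetch url under BLOCK_GOOGLEBOT."""
    parser = RobotFileParser()
    parser.parse(BLOCK_GOOGLEBOT)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.example.com/"  # hypothetical site
    print("Googlebot:", may_crawl("Googlebot", site))
    print("ExampleBot:", may_crawl("ExampleBot", site))
```

The robot name is matched without regard to case, so Googlebot is refused while ExampleBot, which the file never mentions, is let through.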
For more information, you can go to:
http://www.searchengineworld.com/robots/robots_tutorial.htm - For the basics
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi - To validate the file.
I hope you have enjoyed this and found it informative.
