

Robots.txt





sameerseo
Do you know anything about robots.txt?
Peterssidan
Yes.

Is that all you want to know?
busaboss
A robots.txt file usually sits in the root of your website. It tells crawlers which parts of the site they may access and which they should stay out of. It also has other functions, which you can learn about in this Wikipedia article: http://en.wikipedia.org/wiki/Robots_exclusion_standard
cfvergara
Yes indeed. What is it you need? It's basically a text file that spiders who obey it can use as guidance on whether or not to index different parts of your website.
netcommlabs
Robots.txt is a file where you can restrict web pages, directories, and images from being indexed by search crawlers.
sysna
Actually, I found that robots.txt is very useful when you don't want certain pages to be indexed in Google. Before using it, I kept running into unrelated stuff when searching for a certain topic within my own website's domain.
seoprovideproxy
Hi

Robots.txt is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site as a whole.
For further reading: http://www.javascriptkit.com/howto/robots.shtml
roseberryjai
Hi

Robots.txt is one on-page SEO technique.
Web robots (also known as web wanderers, crawlers, or spiders) are programs that traverse the Web automatically. Search engines such as Google use them to index web content.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
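To see that check in action, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the helper name is_allowed are made up for illustration, and the rules are passed in as a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name.
WELCOME_ALLOWED = is_allowed(
    ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"
)
```

With these rules WELCOME_ALLOWED comes out False, because "Disallow: /" matches every path on the site.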
evilryu530
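It's a text file that you put with your web files to ask search bots not to crawl certain pages. It doesn't reliably "hide" a page from Google, though: a disallowed URL can still show up in results if other sites link to it. If you need to keep a page out of the index, use a noindex robots meta tag on the page instead, and leave the page crawlable so the bot can actually see the tag.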
imagefree
Pretty much explained, but I want to add a few more words.

robots.txt is a file placed at the root of your web server. For example, if your site is example.com, then robots.txt must be located at http://example.com/robots.txt
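As a small illustration of the "root of the site" rule, this sketch works out where a crawler would look for robots.txt given any page URL. The function name robots_txt_url is just an illustrative choice.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For example, robots_txt_url("http://example.com/blog/post.html?page=2") gives "http://example.com/robots.txt", no matter how deep the page is.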

robots.txt is the first file a well-behaved web crawler requests (the Google search crawler, the Google AdSense crawler, and thousands more). It tells crawlers which content on your site they may access, and good crawlers honor it.

robots.txt does not technically limit what a crawler can do; it tells the crawler what limits it is expected to respect. For example, a specific crawler, or all crawlers, can be told not to visit a specific page or to avoid everything inside a particular directory.
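Here is a rough sketch of per-crawler rules, checked with Python's standard urllib.robotparser. The crawler names "ExampleImageBot" and "OtherBot" and all the paths are invented for the example; real crawlers publish their own user-agent names.

```python
from urllib.robotparser import RobotFileParser

# One crawler is shut out completely; everyone else only has to
# avoid one directory and one specific page. All names are hypothetical.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /drafts/old-page.html
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = make_parser(ROBOTS_TXT)

# The named crawler matches its own section and is blocked everywhere.
IMAGE_BOT_HOME = PARSER.can_fetch("ExampleImageBot", "/index.html")
# Any other crawler falls through to the "*" section.
OTHER_BOT_HOME = PARSER.can_fetch("OtherBot", "/index.html")
OTHER_BOT_PRIVATE = PARSER.can_fetch("OtherBot", "/private/report.html")
OTHER_BOT_DRAFT = PARSER.can_fetch("OtherBot", "/drafts/old-page.html")
```

Only OTHER_BOT_HOME is True here. Keep in mind this is the answer a polite crawler computes for itself; nothing in the file stops a crawler that never asks.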


robots.txt should not be used to keep crawlers away from secret information. Apart from some well-known crawlers, most crawlers do not obey robots.txt, so other means should be used to restrict access to sensitive resources.



robots.txt can reveal secret information to malicious users. For example, you may tell crawlers not to visit /dir/secret/, but that also tells everyone there is something secret in it. Such entries in robots.txt can catch the attention of hackers.
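To make the point concrete, this sketch shows how little effort it takes to pull every disallowed path out of a robots.txt file, which anyone can download. The helper name disallowed_paths is an illustrative choice, and the parsing is deliberately simplified (it ignores which User-agent section a rule belongs to).

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect every non-empty Disallow path found in robots.txt content."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE = """\
User-agent: *
Disallow: /dir/secret/   # meant to keep crawlers out
Disallow:
"""

EXPOSED = disallowed_paths(SAMPLE)
```

EXPOSED ends up as ["/dir/secret/"], which is exactly the list a curious visitor would start poking at.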


Ideally, robots.txt should only be used to guide the behaviour of web crawlers towards public content: content that anyone can see but that is of little use to crawlers, so there is no point in them fetching it. For example, images are not meaningful to the AdSense crawler, so letting it read image files again and again would just be a waste of bandwidth.

Hope this helps you.
imagefree
This document details how Google handles the robots.txt file that allows you to control how Google's website crawlers crawl and index publicly accessible websites.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
darthrevan
sysna wrote:
Actually, I found that robots.txt is very useful when you don't want certain pages to be indexed in Google. Before using it, I kept running into unrelated stuff when searching for a certain topic within my own website's domain.


Also, if you don't want a person to know that a file or directory exists, it would be best not to list it in your robots.txt file at all. To protect it from the search engine's 'eye', it would be best to password-protect it or possibly encrypt it.
Dialogist
It's best not to have one, in my experience (your mileage may differ). If one is not found, the desired result is still achieved (only quicker), and the conditions you faithfully define to stop the bandwidth-devouring Asian and Russian spiders don't seem to have any verifiable effect. Most of them have been designed to be malicious, so expecting them to obey your robots.txt was perhaps a little over-confident before you even went to the trouble. I happen to know some of those Baidu bots blatantly disregard your conditions. You can (and eventually have to) get them with .htaccess, so that second step renders robots.txt pretty much redundant. The good bots don't require it and the bad ones laugh in its face.
apolice9
sameerseo wrote:
Do you know anything about robots.txt?



Robots.txt is a text (not HTML) file you put in your root directory to tell search robots which files to ignore or, alternatively, which files to crawl. It also helps search engines locate the website's sitemap and hence crawl the entire website in depth, which helps your rankings and traffic. The correct format will depend on what you want to hide from the bots.
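As a quick sketch of the sitemap part, Python's standard urllib.robotparser can report the Sitemap lines it finds (the site_maps() method needs Python 3.8 or newer). The sitemap URL and the /tmp/ path below are placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides one directory and advertises a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml
"""


def find_sitemaps(robots_txt: str):
    """Return the list of Sitemap URLs in robots_txt, or None if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()


SITEMAPS = find_sitemaps(ROBOTS_TXT)
```

SITEMAPS is ["http://www.example.com/sitemap.xml"] here. The Sitemap line is independent of the User-agent sections, so it can sit anywhere in the file.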
rjraaz
So, how is it different from .htaccess? Any idea?
Peterssidan
rjraaz wrote:
So, how is it different from .htaccess? Any idea?

They are not at all similar. robots.txt is just a text file that bots are supposed to read and respect. .htaccess is a server side configuration file that is handled by the server. The visitors will never be able to see the content of .htaccess.
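To illustrate the difference, here is a sketch of what "handled by the server" means. This is not .htaccess syntax; it is a tiny Python WSGI app standing in for the server, and the blocked name "badbot" is made up. The point is that the server makes the decision itself, whether or not the bot cooperates.

```python
# Hypothetical substrings of User-Agent headers the server refuses to serve.
BLOCKED_AGENT_SUBSTRINGS = ("badbot",)


def app(environ, start_response):
    """Minimal WSGI app that rejects blocked user agents with 403."""
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(name in user_agent for name in BLOCKED_AGENT_SUBSTRINGS):
        # The server decides; the bot's opinion of robots.txt is irrelevant.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, visitor\n"]
```

To try it locally you could serve app with wsgiref.simple_server.make_server from the standard library. A bot can still lie about its User-Agent, so in practice people also block by IP address.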
martinsherman
The robots.txt file is a text file that tells search engine crawlers which portions of your website they should NOT index. If you don't want to restrict search engine crawlers, you should simply create an empty robots.txt file (e.g., touch robots.txt) or one that looks like this:

User-agent: *
Disallow:

Once you have created a robots.txt file, you store it in the root directory of your Web server.
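If you want to double-check that both variants really allow everything, here is a small sketch with Python's standard urllib.robotparser. "ExampleBot" and the test path are invented names.

```python
from urllib.robotparser import RobotFileParser

EMPTY_FILE = ""  # what "touch robots.txt" produces
EMPTY_DISALLOW = """\
User-agent: *
Disallow:
"""


def allows_everything(robots_txt: str, path: str = "/any/page.html") -> bool:
    """Return True if a hypothetical crawler may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", path)


EMPTY_FILE_OK = allows_everything(EMPTY_FILE)
EMPTY_DISALLOW_OK = allows_everything(EMPTY_DISALLOW)
```

Both values come out True: an empty file has no rules at all, and an empty Disallow value is treated as "nothing is disallowed".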

Hope this helps you!!
aizelcaroline
Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit.
iliyasshaikh
A robots.txt file usually sits on your website. It controls which crawlers are blocked and which are allowed.

Robots.txt is one of the best tools for telling search engines what they may and may not crawl, and it helps reduce crawl errors.

For example, on Blogspot:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://www.shayrikiduniya.com/feeds/posts/default?orderby=UPDATED


So "Disallow: /search" keeps crawlers out of the search pages, which reduces the crawl errors the search engine reports.

"Allow: /" and the Sitemap line let all the posts be crawled and indexed by Google.
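Here is a rough check of those Blogspot rules using Python's standard urllib.robotparser. The crawler name "ExampleBot" and the two test paths are made up; only the rules themselves come from the post. The Sitemap line is left out because it does not affect which paths are allowed.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the Blogspot example above.
BLOGSPOT_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

PARSER = RobotFileParser()
PARSER.parse(BLOGSPOT_RULES.splitlines())

# Hypothetical paths: a label/search page and an ordinary post.
SEARCH_PAGE = "/search/label/poetry"
POST_PAGE = "/2013/01/some-post.html"

# The AdSense crawler has its own section with nothing disallowed.
ADSENSE_SEARCH = PARSER.can_fetch("Mediapartners-Google", SEARCH_PAGE)
# Every other crawler is kept out of /search but may read the posts.
OTHER_SEARCH = PARSER.can_fetch("ExampleBot", SEARCH_PAGE)
OTHER_POST = PARSER.can_fetch("ExampleBot", POST_PAGE)
```

ADSENSE_SEARCH and OTHER_POST are True and OTHER_SEARCH is False. Python's parser applies the first matching rule in file order; other crawlers may resolve Allow/Disallow conflicts differently, for instance by the longest matching path, but the result is the same for this file.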
dolfinchris
Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit.
tamilparks
I also learned a lot from the above, thanks.
zimmer
busaboss wrote:
A robots.txt file usually sits in the root of your website. It tells crawlers which parts of the site they may access and which they should stay out of. It also has other functions, which you can learn about in this Wikipedia article: http://en.wikipedia.org/wiki/Robots_exclusion_standard


Yes, I agree, that is correct. One thing to add: robots.txt is not tied to PHP or any other server-side language, and browsers never use it to display your site; only crawlers read it.
Arrogant
Cool stuff robots.txt
How does that work?
tamilparks
Great, I have also learned about robots.txt. Thanks for sharing.
© 2005-2011 Frihost, forums powered by phpBB.