Robots.txt.... how important is it?





Phil
I've been trying to do some SEO research and the terms "robots.txt" and "indexing" keep coming up.
How important is the robots.txt file when it comes to search engines? Is there something else that should be applied to get search engines to index a site?
Also, do search engines see the whole site, or just the index.htm page?
cambridge
When a search engine gets your domain, it starts from the default page (index), then tries to crawl the internal links it finds there. Google says its Sitemaps feature helps its robot index your site properly. A Google sitemap is an XML file generated with a free Google tool, which you can find at: https://www.google.com/webmasters/sitemaps/docs/en/about.html
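
To picture the crawling part: a robot fetches the index page, pulls out the links, and queues up the ones that point back at the same site. Here is a minimal Python sketch of just that link-discovery step, not how any real search engine does it. The `LinkCollector` class, the `internal_links` helper and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links (about.htm, /news/...) into full URLs.
                self.links.append(urljoin(self.base_url, value))


def internal_links(base_url, html):
    """Return only the links that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [url for url in collector.links if urlparse(url).netloc == host]


# Hypothetical index page content, for illustration only.
SAMPLE_INDEX = (
    '<a href="about.htm">About</a> '
    '<a href="http://other.example/">Elsewhere</a> '
    '<a href="/news/today.htm">News</a>'
)

found = internal_links("http://www.example.com/index.htm", SAMPLE_INDEX)
```

Here `found` holds the two same-site pages (about.htm and /news/today.htm) and skips the external link. A crawler would then repeat the same step on each of those pages, which is why pages nobody links to tend not to get found.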

More info about Google's robot: http://www.google.pl/webmasters/bot.html

robots.txt lets you tell all robots, or specific ones by name (different ones can work differently), to index only specific parts of your site. You can exclude folders and/or files from indexing.
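
As a rough illustration of per-robot rules, here is a sketch that uses Python's standard `urllib.robotparser` to read a small robots.txt with one group for Googlebot and one for everyone else. The `/drafts/` and `/images/` folders and the "OtherBot" name are hypothetical, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may crawl images, every other robot may not.
RULES = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /images/
"""


def allowed(agent, path):
    """Return True if the given robot may fetch the path under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


googlebot_images = allowed("Googlebot", "/images/logo.png")  # True
otherbot_images = allowed("OtherBot", "/images/logo.png")    # False
googlebot_drafts = allowed("Googlebot", "/drafts/post.htm")  # False
```

A robot that has its own named group uses that group; everyone else falls back to the `*` group.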
shabda
Robots.txt is used to stop spiders from crawling parts of your site. So if you do not have anything that should stay uncrawled, then robots.txt is pretty useless.
lockwolf
robots.txt really is needed if your site has some sort of login process or an administrative section.

I'm going to take my site as an example. I'm using PHP-Nuke, and the main directory has the following folders and files (not listing all):

Folders
admin
blocks
db
images
includes
(and a few others)

Files
admin.php
config.php
index.php
modules.php
mainfile.php

I obviously don't want the search engines indexing the folders admin, db and includes or the files admin.php, config.php and mainfile.php, because these are key files.

I would start by putting this at the top of the file so that the rules that follow apply to all robots/spiders.

Code:
User-agent: *


Then I would add one line like this for each area that should not be crawled. The path starts with a / and is relative to the site root.

Code:
Disallow: /(file or folder to not scan)


So my robots.txt would look something like this



Code:
User-agent: *
Disallow: /admin/
Disallow: /db/
Disallow: /includes/
Disallow: /admin.php
Disallow: /config.php
Disallow: /mainfile.php
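
If you want to sanity-check a file like this before uploading it, one option is Python's standard `urllib.robotparser`. This is only a sketch: it assumes every Disallow path is written with a leading slash and that the Disallow lines follow the User-agent line with no blank line in between. The "ExampleBot" name and the `is_allowed` helper are made up, and real search engine robots may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The PHP-Nuke example, written with leading slashes on every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /db/
Disallow: /includes/
Disallow: /admin.php
Disallow: /config.php
Disallow: /mainfile.php
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


blocked = [p for p in ("/admin/", "/db/", "/admin.php", "/config.php")
           if not is_allowed(p)]
open_pages = [p for p in ("/index.php", "/modules.php", "/images/logo.gif")
              if is_allowed(p)]
```

With these rules, everything in `blocked` is refused and everything in `open_pages` is still crawlable, so index.php and modules.php stay visible to the search engines.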


That's all I really have to say about robots.txt.