Googlebot problems on server 2?





Peterssidan
I think Googlebot may have been banned from server 2. If so, the ban cannot cover all the IPs Googlebot uses, because it has crawled some pages since then.

In Google Webmaster Tools, many pages are reported under “Unreachable URLs”, most of them with the detail “robots.txt unreachable”. Some pages have fallen out of the index, and they are the pages with the most links. New pages have not been crawled.
I started a thread about this earlier in Web Hosting Support: http://www.frihost.com/forums/vt-83140.html (read it for more details).
Bondings doesn't think that is the problem, and I could be wrong.

So, have any of you noticed Googlebot having trouble crawling your sites on server 2? To me it looks like it started after the 9th of October.

If you think something else is causing the problems, please tell me.
takashiro
Really? Googlebot visited my website just several days ago, and my site is on server 2 too. Check your robots.txt to make sure the pages are allowed to be crawled and cached.
Peterssidan
I have removed the robots.txt file to see if it helps. I don't know yet whether it does.
Googlebot crawls from several IPs, so it may be that only some of them got banned. From what I have seen in my log files, Googlebot gets the pages fine with a 200 status, and I don't know whether the log would show something different if it were banned.

There is an option in Google Webmaster Tools to set the crawl rate. My thought is that if people set the crawl rate too fast, Googlebot might crawl so fast that it looks like an attack or something bad.

I think I have to wait and see what happens.
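
For anyone who wants to check their own logs the same way, here is a rough Python sketch that tallies Googlebot requests per IP and status code. It assumes the Apache "combined" log format, and the sample lines and IP addresses are made up for illustration. A request that times out before the server answers may never show up in the access log, so a log full of 200s does not rule out timeouts.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count (ip, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[(match.group("ip"), int(match.group("status")))] += 1
    return counts


# Made-up sample lines, only to show the expected input shape.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    '66.249.66.1 - - [05/Nov/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 25 "-" "' + GOOGLEBOT_UA + '"',
    '66.249.66.1 - - [05/Nov/2007:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "' + GOOGLEBOT_UA + '"',
    '66.249.66.2 - - [05/Nov/2007:11:30:00 +0000] "GET /robots.txt HTTP/1.1" 503 0 "-" "' + GOOGLEBOT_UA + '"',
    '203.0.113.9 - - [05/Nov/2007:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, status), hits in sorted(googlebot_hits(SAMPLE_LOG).items()):
        print(ip, status, hits)
```

With a real log you would pass the open file object instead of `SAMPLE_LOG`, since the function only needs an iterable of lines.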
GSIS
Peterssidan wrote:
I have removed the robots.txt file to see if it helps. I don't know yet whether it does.
Googlebot crawls from several IPs, so it may be that only some of them got banned. From what I have seen in my log files, Googlebot gets the pages fine with a 200 status, and I don't know whether the log would show something different if it were banned.

There is an option in Google Webmaster Tools to set the crawl rate. My thought is that if people set the crawl rate too fast, Googlebot might crawl so fast that it looks like an attack or something bad.

I think I have to wait and see what happens.


I'm quite sure the Frihost people know how to recognize Googlebot and wouldn't ban it. It is possible that Google might have taken a dim view of Frihost, for whatever reason, and decided not to crawl frih.net sites either very often or at all.

Check out iwebtool.com

There's a tool for checking most, if not all, of the Google datacentres (Google Datacentre Search). If you enter site:whateveryoursiteis.com it'll show you what pages each of the datacentres has indexed. It's surprising how it can differ from one centre to the others.
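
On recognizing Googlebot: the check Google itself describes is a reverse DNS lookup on the requesting IP, followed by a forward lookup on the returned host name. Below is a minimal Python sketch of that idea. It is not anything Frihost is known to run; the function name and the `reverse`/`forward` parameters are assumptions made for illustration, added so the logic can be tried without network access.

```python
import socket

# Host name suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, check the host name, then confirm it resolves back to `ip`."""
    try:
        host = reverse(ip)[0]            # (hostname, aliases, addresses)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)[2]    # forward lookup must include the same IP
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

A firewall rule that bans by request rate alone, without a check like this, could in principle block some of Googlebot's IPs while leaving others alone.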
the-guide
Hi Peterssidan

I think you should not worry too much about the Google Webmaster Tools reports. They always report every problem (even a small one) found while crawling. As GSIS said, you can enter site:whateveryoursiteis.com and it will show your indexed pages.
Peterssidan
the-guide wrote:
you can enter site:whateveryoursiteis.com and it will show your indexed pages.

I know how to use the "site:" command in Google. It still shows most of the pages, but some of them have disappeared.

Today my verification status is NOT VERIFIED on one site. I know the file is there, and when I try to verify, it says:
Quote:
Verification status: NOT VERIFIED
Last attempt Nov 5, 2007: We were unable to verify your file due to a server timeout.


I can't see any cause other than the server, but if no one else has the problem it can't be the server. I guess everyone will eventually realize that they have the problem too, just not yet.
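
To see whether the server really times out on small files, something like the following Python sketch could be run from outside the server a few times a day. The function name `probe`, the 10-second default, and the `opener` parameter (there so the function can be exercised without a network) are assumptions, and this only approximates what Google's verifier does.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return a short label describing how a single fetch of `url` went."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"status {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 500.
        return f"status {err.code}"
    except urllib.error.URLError as err:
        # Connection-level failure; a timeout while connecting is wrapped here.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable ({err.reason})"
    except (socket.timeout, TimeoutError):
        # A timeout while reading the response is raised directly.
        return "timeout"
```

Calling `probe("http://www.example.com/robots.txt")`, with your own domain in place of the placeholder, gives either a status label or "timeout", which is the distinction that matters here.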
SlowWalkere
I've noticed a similar problem.

According to Google Webmaster Tools, my index hasn't been updated since Oct. 9. I've got an "Unreachable URLs" error for "robots.txt."

I've also determined that the site hasn't been crawled/cached at all since that date by Google. If I search for a recent article in Google, I only find secondary references to it - like at blogcatalog and zimbio. The actual articles on my site haven't been indexed.

I figured Google got mad because I stopped updating for a couple weeks. Either that or Google had some bad luck, tried to access the site during one of the recent outages on server 2, and just left.

The latter makes sense to me. Googlebot would try to access robots.txt first. If that is inaccessible, I'd think the bot would just leave. So if the server is down when the bot tries to show up, it'll try to access robots.txt and leave.

Just some thoughts...
- Walkere
GSIS
SlowWalkere wrote:
I've noticed a similar problem.

The latter makes sense to me. Googlebot would try to access robots.txt first. If that is inaccessible, I'd think the bot would just leave. So if the server is down when the bot tries to show up, it'll try to access robots.txt and leave.

- Walkere


If there is no robots.txt Googlebot continues crawling the site.

I suspect what is most likely is the simplest explanation. Frequent performance problems cause Google to take sites less seriously and try to index them less frequently. Eventually Googlebot stops trying to crawl unreliable sites altogether. Google aims to provide relevant content and does not want to tell its users about unreliable sites. Also, Google does not want its bots and servers wasting time and money trying to crawl pages that are very likely to never get served.

I moved my site from Server 2 (with a frih.net domain name) to another host (with a co.uk domain name) and very few other changes. I see Googlebot crawling the site almost continuously. Neither version of the site had/has a robots.txt file.
the-guide
Peterssidan wrote:

I can't see any cause other than the server, but if no one else has the problem it can't be the server. I guess everyone will eventually realize that they have the problem too, just not yet.


Ummm... I hadn't suspected anything like that before. Maybe I need to check my site as you have done. Thanks a lot for the information, Peterssidan.
Peterssidan
SlowWalkere wrote:
I've noticed a similar problem.

According to Google Webmaster Tools, my index hasn't been updated since Oct. 9. I've got an "Unreachable URLs" error for "robots.txt."

I've also determined that the site hasn't been crawled/cached at all since that date by Google. If I search for a recent article in Google, I only find secondary references to it - like at blogcatalog and zimbio. The actual articles on my site haven't been indexed.

Finally, I am not the only one with this problem. The date matches what I have seen, and it also matches the cache dates on many of the frih.net sites.

GSIS wrote:
If there is no robots.txt Googlebot continues crawling the site.

But it looks like Googlebot only got a timeout error, not a 404 error, so it can't know whether the file exists or not.

GSIS wrote:
I suspect what is most likely is the simplest explanation. Frequent performance problems cause Google to take sites less seriously and try to index them less frequently. Eventually Googlebot stops trying to crawl unreliable sites altogether. Google aims to provide relevant content and does not want to tell its users about unreliable sites. Also, Google does not want its bots and servers wasting time and money trying to crawl pages that are very likely to never get served.

But why has Googlebot tried to reach the robots.txt file several times if it doesn't want to crawl the site?

For me it is proven that Googlebot has problems crawling server 2. It doesn't look like Google has banned the server. It looks like the server has something against Googlebot.
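
The difference between a missing robots.txt and an unreachable one can be sketched like this. It is a simplified Python model of how Google describes its crawler's handling of robots.txt responses, not Google's actual code; the function name and the returned labels are assumptions for illustration.

```python
def robots_fetch_outcome(status=None, timed_out=False):
    """Map a robots.txt fetch result to a likely crawler decision (simplified)."""
    if timed_out or status is None:
        # No answer at all: the crawler cannot tell whether rules exist.
        return "unreachable: postpone crawl"
    if 200 <= status < 300:
        return "parse rules"
    if 300 <= status < 400:
        return "follow redirect"
    if 400 <= status < 500:
        # A 404 means "no robots.txt", so everything may be crawled.
        return "no robots.txt: crawl allowed"
    # 5xx: the server is in trouble, so the crawler backs off.
    return "unreachable: postpone crawl"


if __name__ == "__main__":
    print(robots_fetch_outcome(status=404))
    print(robots_fetch_outcome(timed_out=True))
```

Under this model, deleting robots.txt only helps if the server answers with a 404. If the request times out, the outcome is the same as before the file was removed.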
Peterssidan
Some time has now passed, so it's time for a fresh update.
The problem is not solved yet. Removing the robots.txt file had no effect at all.

I moved one of the sites to another host, and now everything works very well for that site. There are no new errors in Google Webmaster Tools and I can see that the cache is new. It's back to normal, as it was before this problem started.

The other site is still here on Frihost (server 2) and I have seen no improvement. There are still new errors in Webmaster Tools and no new pages have been crawled.

In Webmaster Tools you can see the "Googlebot activity in the last 90 days" under "Set crawl rate". I know it is not a very accurate tool, but it gives a good overview.
Here are the statistics for the site that is still on server 2:

As you can see, there is almost no crawling at the end.

Here are the statistics for the site that was moved to another server:

As you can see, there is more activity at the end. I think that is because I moved it to another host.

What more can I say? I just want the problem solved, that's all.
the-guide
Hi Peterssidan,

After rechecking my site's stats carefully, I found the same problem as you. It seems this trouble started in October, as you showed. It is not only Google: the same problem has appeared on Technorati too. All my other blogs have been updated there, but not the one hosted on frih.net. Has anyone else here found the same trouble?
asim
Hello buddy,

I'm doing fine with Googlebot and the Yahoo bot. Maybe posting the problem in the server 2 section would help.
Peterssidan
asim wrote:
I'm doing fine with Googlebot and the Yahoo bot. Maybe posting the problem in the server 2 section would help.

I don't think there is a server 2 section. I have posted a thread about this in Web Hosting Support, which is more for the big guys. The reason you have no problems is that you are not on server 2.
caio
Hello, please excuse the translation.

Yes, obviously there is a problem on server 2. It is not a robots.txt problem; I think it is the server's .htaccess. Maybe it is an unintended mistake, so I ask that it be fixed.
I am not going to post images of Google's low crawl frequency, since those have already been posted. I will just give you an example:

This is what the Amaya editor gives me when I try to view a site on server 2:

Quote:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster@puntoargento.com.ar and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.


This does not open either:
http://www.frih.net/

But server 1, i.e. Frihost, opens perfectly:
http://www.frihost.com/

Obviously we have a problem, and I hope it gets resolved before we end up with no pages in Google's index.
Peterssidan
Googlebot has started to crawl the site again. The problem seems to be over.

The problem caio is talking about must be the recent problems with the server.
caio
Peterssidan wrote:
Googlebot has started to crawl the site again. The problem seems to be over.

The problem caio is talking about must be the recent problems with the server.


Yes, the move to PHP5 has brought its problems, but I am very worried that I still cannot view the site with Amaya. As I said, it only throws an error with server 2; the one that is Frihost opens perfectly. All we can do is wait.
the-guide
That is good news for me. All the pages of my site have come back after being de-indexed for two months.

Thanks.