
GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains

GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains WebTravelLog 4/13/10 6:36 AM
I have read the FAQs and checked for similar issues: YES
My site's URL is: *.traveljournal.net & *.webtravellog.com

Description:

Googlebot accesses /sitemap.xml an excessive number of times:

+--------------+-------------------+-----------------+-------------+
| Date         | Domain            | Number of times | Total Bytes |
|              |                   | requested       |             |
+--------------+-------------------+-----------------+-------------+
| Apr 08, 2010 | WebTravellog.com  |          13,393 | 110,976,171 |
| Apr 09, 2010 | WebTravellog.com  |          10,441 |  87,420,998 |
| Apr 10, 2010 | WebTravellog.com  |           1,247 | 111,592,770 |
| Apr 11, 2010 | WebTravellog.com  |          13,396 | 112,310,984 |
| Apr 12, 2010 | WebTravellog.com  |          13,340 | 111,391,084 |
+--------------+-------------------+-----------------+-------------+
| Apr 08, 2010 | TravelJournal.net |          14,686 | 127,971,995 |
| Apr 09, 2010 | TravelJournal.net |          11,574 |  98,508,365 |
| Apr 10, 2010 | TravelJournal.net |           1,219 | 129,021,700 |
| Apr 11, 2010 | TravelJournal.net |           1,212 | 127,179,098 |
| Apr 12, 2010 | TravelJournal.net |          14,844 | 129,012,441 |
+--------------+-------------------+-----------------+-------------+


The sitemap is dynamically generated for each public sub-domain (roughly 375 sub-domains in total).
Typical response headers:

HTTP/1.1 200 OK
Date: Tue, 13 Apr 2010 13:32:26 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.2.6
Content-Length: 1331
Keep-Alive: timeout=3, max=100
Connection: Keep-Alive
Content-Type: application/xml

Could somebody please look into this?
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains luzie 4/13/10 7:26 AM
Hi WebTravelLog,

hmm, looks weird ... ^^

1. Are you sure it's really Googlebot who's doing this? (Do a reverse DNS lookup to find out - see the sketch below.)

2. When you say, "sitemaps are dynamically generated" - how often does this happen? Do you (or does your software) send a ping to the search engines as a sitemap submission every time the sitemap is generated anew?
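
One way to run that check (a sketch in PHP, which the site appears to run per the headers above; the IP is only an example, and since a reverse lookup alone can be spoofed, the forward lookup confirms it):

<?php
// Sketch: verify that an IP claiming to be Googlebot really is Googlebot.
// The reverse DNS name must end in googlebot.com, and the forward lookup
// of that name must map back to the same IP.
$ip   = '66.249.66.1';        // example address only
$host = gethostbyaddr($ip);   // e.g. crawl-66-249-66-1.googlebot.com
$ok   = preg_match('/\.googlebot\.com$/', $host)
        && gethostbyname($host) === $ip;
echo $ok ? "Googlebot verified\n" : "not Googlebot\n";
?>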

-luzie-
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains WebTravelLog 4/13/10 8:07 AM
The requests were from (543 IPs in total):
66.249.65.*
66.249.66.*
66.249.67.*
66.249.68.*
66.249.71.*
Those are all within the Google netblock: 66.249.64.0 - 66.249.95.255.

The sitemaps are generated on request; all information (including timestamps) is from database entries.
We do not ping any search engines (or any other service) with update notifications (or anything else, for that matter). Instead, the sitemap URL is advertised at the very bottom of each sub-domain's /robots.txt file.
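
The tail of each robots.txt looks roughly like this (the subdomain name here is invented):

# ... per-subdomain rules above ...
Sitemap: http://somejournal.traveljournal.net/sitemap.xml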

Other search engines do request the sitemap file, but only a modest number of times. Only GoogleBot makes this excessive number of requests.
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains luzie 4/13/10 8:20 AM
Ok, I think we have all necessary information now to ask a Googler what's wrong here.

-luzie-
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains JohnMu 4/13/10 9:16 AM
Hi WebTravelLog
Thanks for posting -- I'll check with the team and get back to you as soon as I know more :-)

Cheers
John
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains WebTravelLog 4/19/10 11:07 PM
bump

The problem persists.
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains JohnMu 4/20/10 4:53 AM
Hi WebTravelLog
Google's Sitemaps crawler usually reacts to the update frequency of your Sitemap files. If we find new content there every time we crawl, we may choose to crawl more frequently. If you can limit updates of the Sitemap files to daily (or whatever suits your sites best), that may help.

Similarly, if you create a shared Sitemap file for these subdomains, that could help by limiting the number of requests we have to make for each subdomain. You could let us know about the Sitemap file by mentioning it in your robots.txt file using a "Sitemap:" directive (the Sitemap file does not have to be on the same host or domain as the site itself - see the example below).

If we're generally crawling your sites too frequently, you can also set the crawl rate in Webmaster Tools for those sites.
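
To illustrate the shared-Sitemap idea (the hostnames and file name here are only illustrative), every subdomain's robots.txt could point at one shared file:

# in http://any-subdomain.traveljournal.net/robots.txt
Sitemap: http://www.traveljournal.net/sitemap-all-subdomains.xml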

Hope it helps!
John
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains WebTravelLog 4/20/10 9:53 AM
No, this does not help.

The sitemap is advertised in the robots.txt, and it is not a problem that the sitemap is fetched separately for each subdomain.
My _site_ is not crawled too frequently; only our sitemap is.

The problem is that /sitemap.xml (less than 9 kB) is requested so many times that Googlebot pulls over 100 MB per day for this one resource on each domain - roughly 3 GB a month (per the table above: ~13,400 requests × ~8.3 kB ≈ 110 MB/day).

Let me repeat: other services consuming our sitemaps exhibit no problems. Only GoogleBot makes this excessive number of requests.
Re: GoogleBot requests /sitemap.xml up to 14,800 times a day, across 375 subdomains ColinMcDermott 4/21/10 9:24 AM
That does seem very excessive if true...

http://www.webtravellog.com/Sitemap.xml

The lastmod date is the same on every entry in the sitemap, and - if I am correct - it is displaying ... now?

So every time Google visits the sitemap, it is told that every page on the site has just changed?
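
If that is what is happening, one fix on the generator side would be to emit the stored per-page timestamps instead of the current time. A minimal sketch in PHP (connection details, table, and column names here are all invented; the real schema is unknown):

<?php
// Sketch only: build /sitemap.xml with <lastmod> taken from stored
// timestamps, never from time() - otherwise every fetch looks like
// fresh content and invites constant re-crawling.
mysql_connect('localhost', 'user', 'pass');   // invented credentials
mysql_select_db('travel');                    // invented database name

header('Content-Type: application/xml');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

$result = mysql_query("SELECT url, updated_at FROM entries WHERE is_public = 1");
while ($row = mysql_fetch_assoc($result)) {
    echo "  <url>\n";
    echo '    <loc>' . htmlspecialchars($row['url']) . "</loc>\n";
    // W3C datetime from the database value, not from "now"
    echo '    <lastmod>' . date('c', strtotime($row['updated_at'])) . "</lastmod>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
?>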