Categories: Crawling, indexing & ranking

Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/

Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/ dashaver 12/25/12 6:08 AM
I've read the FAQs and searched the help center. 
My URL is: http://www.usracecalendar.com/

Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/


It has taken some time to track this down, but I believe the message above refers to what Google calls "Duplicate title tags". These are the same page, with the same page creation time. Google calls them duplicates because it assigns them multiple URLs. This problem is caused because Googlebot only crawls old (worthless) content, not new and usable content, and it does not use the clean URLs assigned to each page.


See the example below:

oxford-athletic-club-freaky-5k-wyndham-grand-pittsburgh-downtown-terrifying-10k-costume-run

/oxford-athletic-club-freaky-5k-wyndham-grand-pittsburgh-downtown-terrifying-10k-costume-run?page=622


These are both the same page, with the same creation time and everything. My question: how do I get Googlebot to stop using the question-marked URLs and then claiming duplicate content? To put it another way, how do I get Googlebot to use only new content and the actual clean URLs that are assigned to each page at creation time?


If I could just get Googlebot to stop visiting the question-marked URLs, it would help greatly with what Webmaster Tools calls a high number of URLs.



Re: Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/ dashaver 12/25/12 6:57 AM
I also wrote an article which might explain the problem better. http://dashaver.net/googlebot-does-not-recognize-clean-urls
Re: Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/ JohnMu 12/25/12 7:33 AM
Hi dashaver

This message means that we discovered a high number of new URLs while crawling your site. If you're seeing URLs with bogus URL parameters like that, then that would generally be because we found links to those URLs while crawling your site. The best way to resolve that is to make sure that these links are no longer on your site (it looks like at least some of the high "page=" parameters are older, and no longer linked on your site). Once you've done that, you can use the usual canonicalization methods to make sure that we can focus on your preferred URLs: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066 (alternately, if we're crawling invalid URLs, it's a good practice to return 404 to let us know). 

Hope it helps!
John
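As a rough illustration of the canonicalization step John mentions, here is a minimal Python sketch that strips a bogus query parameter to recover the preferred URL and builds a rel="canonical" tag from it. The function names and the choice of "page" as the parameter to drop are assumptions made for this example; they are not taken from the site's actual code.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, drop_params=("page",)):
    """Return url with the listed query parameters (assumed bogus) removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop_params
    ]
    # Rebuild the URL without the dropped parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element to place in the page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

Both the clean URL and its "?page=622" variant would then carry the same canonical tag, pointing at the clean URL.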
Re: Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/ Robbo 12/25/12 7:35 AM



Google requests URLs from your domain. Where it found them or why it is asking for them is not very important. The response to those requests is in your hands: you decide what you want to happen, and then action that. Simple as that.



>>> If I could get Googlebot just to stop visiting the question marked urls it would help greatly in what webmaster tools calls a high number of urls


So, why are you publishing URLs that you don't want crawling? If you really don't want them crawling, why not use the industry-standard robots.txt disallows?

And if there are certain URL parameters that you want ignoring by Google, why not use the Webmaster Tools option where you tell Google to ignore certain parameters?

And have you used [ rel="canonical" ] to indicate the preferred URL for indexing when several actual URLs produce the same content?


>>> This problem is caused because Googlebot only crawls old (worthless) not new and usable content..

How have you indicated that those old URLs are "worthless"?  Are they still accessible on your site? 


Why don't you use industry-standard 301 permanent redirection so that requests for any of the old/dead URLs are redirected to the closest equivalent new/current URL?


Robbo
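To make the "you decide what you want to happen" point concrete, here is a minimal Python sketch of one possible policy: return 404 for paths the site does not know, 301-redirect requests carrying a bogus parameter to the clean URL, and serve everything else normally. The function name, the known_paths set and the treatment of "page" as bogus are all assumptions for illustration; a real site would wire this logic into its CMS or web server configuration.

```python
from urllib.parse import parse_qs, urlsplit


def decide_response(url, known_paths, bogus_params=("page",)):
    """Return (status_code, redirect_target) for a requested URL.

    known_paths is a hypothetical set of paths that currently exist on the site.
    """
    parts = urlsplit(url)
    if parts.path not in known_paths:
        # Invalid or dead URL: say so explicitly.
        return 404, None
    query = parse_qs(parts.query, keep_blank_values=True)
    if any(name in query for name in bogus_params):
        # Same content as the clean URL: redirect permanently to it.
        return 301, parts.path
    return 200, None
```

For example, with known_paths = {"/some-race"}, a request for "/some-race?page=622" would yield (301, "/some-race"), while "/gone" would yield (404, None).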

Re: Googlebot found an extremely high number of URLs on your site: http://www.usracecalendar.com/ dashaver 12/25/12 5:02 PM
To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
User-agent: Googlebot
Disallow: /*?


http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
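Before deploying a wildcard rule like the one above, it can help to check which URLs it would catch. The following is a minimal Python sketch of the matching behaviour described in the quoted help text, where "*" stands for any string and a trailing "$" anchors the end of the URL. The helper names are hypothetical, and this is only an approximation of how Googlebot evaluates rules, not its actual implementation.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule with '*' and optional trailing '$' to a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path_and_query, rule="/*?"):
    """True if the rule matches from the start of the path (including any query string)."""
    return rule_to_regex(rule).match(path_and_query) is not None
```

Under these assumptions, is_disallowed("/some-race?page=622") is true and is_disallowed("/some-race") is false, which is the behaviour the rule is meant to give.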