|Product crawl issues: All URLs blocked by robots.txt - why?||pduddy52||7/20/11 6:52 AM|
Issue: Submitted our first Google Merchant feed yesterday, about 4,500 products. Data quality report today said all URLs crawled (~4000) are blocked by robots.txt.
Example product URL: http://www.peedeetoys.com.au/The-World-s-Most-Beautiful-Jigsaw-Puzzles-The-Pi-p/CRN-839581.htm
1. Searched for "peedee toys CRN-839581" in Google - the page is returned.
2. Used Google Webmaster Tools > Crawler access to verify access to that page - success.
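We also tried reproducing the robots.txt check offline with Python's standard-library parser. Note the robots.txt content below is a hypothetical stand-in to illustrate the check (a site-wide Disallow for other bots plus an allow-all Googlebot group) - substitute the live file from our site when testing for real:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only -- replace with
# the contents of http://www.peedeetoys.com.au/robots.txt to test.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/

User-agent: Googlebot
Disallow:
"""

# Example product URL from the feed (taken from the post above).
PRODUCT_URL = ("http://www.peedeetoys.com.au/"
               "The-World-s-Most-Beautiful-Jigsaw-Puzzles-The-Pi-p/"
               "CRN-839581.htm")

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group ("Disallow:" with no path = allow all),
# so the product page is permitted for it under these rules.
print(rp.can_fetch("Googlebot", PRODUCT_URL))  # -> True

# Other bots fall back to the "*" group and are blocked from /cart/.
print(rp.can_fetch("SomeOtherBot",
                   "http://www.peedeetoys.com.au/cart/view"))  # -> False
```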
What could be preventing a successful crawl? What else can I check?
Any suggestions appreciated.
Peter & Allison,
|Re: Product crawl issues: All URLs blocked by robots.txt - why?||MisterEd43||7/20/11 6:59 AM|
|Re: Product crawl issues: All URLs blocked by robots.txt - why?||pduddy52||7/20/11 8:01 PM|
Thanks for the reply, MisterEd.
Not sure whether the pointer you've provided is relevant, though. The title of that help topic is "Our attempt to download your data feed was denied by your site's robots.txt file".
Google is not downloading our data feed - we're submitting it via FTP. It is being processed successfully and our products are listed in our Merchant Products list.
The problem is when Google attempts to crawl the URLs we've provided in the feed - all of them are returning "blocked by robots.txt". Our original post gives details of a supposedly blocked product URL that we've verified is accessible.
Any suggestions as to why our Merchant feed URL crawls are failing?
|Re: Product crawl issues: All URLs blocked by robots.txt - why?||Celebird||7/20/11 9:16 PM|
google-products regularly crawls websites -- product and other pages -- for quality issues and to verify that the site meets all of google's policies. google must be able to crawl your site, and the site must respond in a timely and proper manner at all times.

what's seen in webmaster tools or a web-browser at any particular moment may not reflect how, what, or when google is trying to crawl, and may not indicate the response-time or access google requires.

details should be here:

a robots.txt googlebot entry should be similar for both a crawl and a scheduled-fetch, and should be indicated here:

otherwise, please contact google directly here:
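as a rough sanity-check of the response-time point above, you can fetch a page yourself with googlebot's published user-agent string and time it. this is only a local probe -- check_url is a helper name made up here, and it does not show how or when google actually crawls:

```python
import time
import urllib.request

# Googlebot's published user-agent token.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def check_url(url, timeout=5.0):
    """Return (http_status, elapsed_seconds) for a GET sent with
    Googlebot's user-agent header. Raises on timeout or HTTP error,
    which would itself point at a crawl problem."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # pull the full body, as a crawler would
        return resp.status, time.monotonic() - start

# Example against the product URL from the first post
# (uncomment to run against the live site):
# status, elapsed = check_url(
#     "http://www.peedeetoys.com.au/"
#     "The-World-s-Most-Beautiful-Jigsaw-Puzzles-The-Pi-p/CRN-839581.htm")
# print(status, round(elapsed, 2))
```

a slow or non-200 answer here (especially one that varies between requests) could explain why google's crawler reports a block even though a browser check succeeds.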