Googlebot trying to crawl Javascript strings as URLs

Googlebot trying to crawl Javascript strings as URLs BugSlayer 5/20/10 11:50 AM
I have read the FAQs and checked for similar issues: YES
My site's URL is: priacta.com/trog
Description:

We had a strange 404 error in our logs today. The user agent was googlebot, and the IP checks out:
  Type          : 404
  Page          : www.priacta.com/downloads/download.com
  Referred from : Unknown
  Time          : 20/05/2010 03:34:57
  From IP       : 66.249.65.175
  User Agent    : Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Some of our pages contain the following (in a script tag), which I think is how Googlebot got confused:

$('div.downloaddotcom a').click(function(){
    urchinTracker ('/downloads/download.com');
});

Googlebot obviously decided that the string looked like a URL, and tried to crawl it. There are two problems with this. First, it isn't a URL, and crawling it pollutes our logs. If you start crawling arbitrary JavaScript strings, where will the madness stop?

Second, it is common practice for webmasters to use JavaScript to create links that work for users but aren't observed by spiders, and Googlebot is obviously trying aggressively to work around this. I think I have a problem with that.
Re: Googlebot trying to crawl Javascript strings as URLs Ricky Roma 5/20/10 12:04 PM
If you're unhappy with Google crawling your site/pages - I'm sure you are aware of methods to prevent the bot from crawling.
Then you won't have a problem.
Re: Googlebot trying to crawl Javascript strings as URLs luzie 5/20/10 12:13 PM
Hi BugSlayer,

well, I admit this is a bit funny ... :D ... yet it doesn't really do any harm - and isn't wrong on Google's side either. I mean "/downloads/download.com" definitely is a URL (a relative address), and as long as it's there in the script for some reason, Google will try to crawl it. If you don't like your logs being polluted with entries of this kind, either filter the log or exclude any such (pseudo-)address from crawling in your robots.txt file.
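
For the log-filtering route, something along these lines might do as a rough sketch. It assumes the log entries have already been parsed into objects with `type`, `page` and `userAgent` fields (mirroring the report in the first post); the names `PSEUDO_PATHS` and `isCrawlerNoise` are made up for illustration, and the second sample entry is invented.

```javascript
// Hypothetical list of script strings that a crawler may mistake for paths.
const PSEUDO_PATHS = ['/downloads/download.com'];

// True when a parsed log entry is a Googlebot 404 on one of the known pseudo-paths.
function isCrawlerNoise(entry) {
    const fromGooglebot = /Googlebot/i.test(entry.userAgent);
    const onPseudoPath = PSEUDO_PATHS.some(function (path) {
        return entry.page.endsWith(path);
    });
    return entry.type === 404 && fromGooglebot && onPseudoPath;
}

const GOOGLEBOT_UA =
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

const entries = [
    // The entry reported in the first post.
    { type: 404, page: 'www.priacta.com/downloads/download.com', userAgent: GOOGLEBOT_UA },
    // Invented entry: a 404 worth keeping, since it is not a known pseudo-path.
    { type: 404, page: 'www.priacta.com/missing.html', userAgent: GOOGLEBOT_UA }
];

// Keep everything except the known crawler noise.
const kept = entries.filter(function (entry) {
    return !isCrawlerNoise(entry);
});
```

Matching only on a fixed list of paths keeps real broken links visible in the log, even when the request comes from Googlebot.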

-luzie-
Re: Googlebot trying to crawl Javascript strings as URLs webado 5/20/10 3:09 PM
Or fix the JavaScript code to prevent robots from crawling it: either move it to an external JS file, or add CDATA and/or HTML comment tags.
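
A different kind of code fix, not one of those listed above, would be to keep the path from appearing as a single string literal in the page source by assembling it at run time. This is only a sketch: `virtualPath` is a hypothetical helper, and nothing in this thread confirms that Googlebot would leave the assembled string alone.

```javascript
// Hypothetical helper: join path segments at run time so the literal
// "/downloads/download.com" never appears verbatim in the page source.
function virtualPath(segments) {
    return '/' + segments.join('/');
}

// Browser-only wiring, skipped when jQuery or urchinTracker is not loaded.
if (typeof $ === 'function' && typeof urchinTracker === 'function') {
    $('div.downloaddotcom a').click(function () {
        urchinTracker(virtualPath(['downloads', 'download.com']));
    });
}
```

The tracker still receives the same string as before, so the analytics reports should not change.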