Category: Crawling, indexing & ranking

Riddle me this, weird GET's in my logs

Riddle me this, weird GET's in my logs Alexander Maassen 7/20/12 3:25 PM
Today I was checking some things and found this in the access logs of www.scarynet.org. My question: why on earth does the bot even think these URLs exist?

66.249.66.162 - - [21/Jul/2012:00:10:39 +0200] "GET /krdiyknd.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.106 - - [21/Jul/2012:00:10:59 +0200] "GET /uinvsnldnayynoz.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.162 - - [21/Jul/2012:00:11:19 +0200] "GET /bavxyytg.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.106 - - [21/Jul/2012:00:11:28 +0200] "GET /skesfiaol.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.106 - - [21/Jul/2012:00:12:36 +0200] "GET /enqyhrefzgdtcja.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.162 - - [21/Jul/2012:00:13:25 +0200] "GET /fvybrjbmptvz.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.162 - - [21/Jul/2012:00:13:44 +0200] "GET /idoytkusq.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.106 - - [21/Jul/2012:00:14:33 +0200] "GET /yoylqwbrnjf.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:15:02 +0200] "GET /agjrozkgmx.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.162 - - [21/Jul/2012:00:15:12 +0200] "GET /xeqkbsehzmup.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.162 - - [21/Jul/2012:00:15:31 +0200] "GET /fwdxrvsygxisquz.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:16:01 +0200] "GET /kraimcsjsaosfnss.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:16:10 +0200] "GET /ltllttgmvfxa.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:16:30 +0200] "GET /npofaopeiorknfew.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.105 - - [21/Jul/2012:00:16:49 +0200] "GET /wpnylfoxo.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.105 - - [21/Jul/2012:00:17:18 +0200] "GET /aryquvxxhvaok.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:17:28 +0200] "GET /pqzueqzlalfrqu.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.105 - - [21/Jul/2012:00:17:38 +0200] "GET /dxqdbpha.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:18:36 +0200] "GET /egifoyrsjoqhvkgu.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:18:46 +0200] "GET /dygevmhqpkqkn.html HTTP/1.1" 404 5353 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.105 - - [21/Jul/2012:00:19:26 +0200] "GET /umlmfhmeahzh.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:19:35 +0200] "GET /zujiefsobhaqurz.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.70 - - [21/Jul/2012:00:19:45 +0200] "GET /hfnvbjku.html HTTP/1.1" 404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
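For anyone who wants to pull entries like these out of their own logs, here is a minimal Python sketch. It assumes the log uses Apache's "combined" format (which the lines above appear to follow); the regex, `Hit`, and `googlebot_404s` are made-up names for illustration. Matching "Googlebot" in the User-Agent only shows what the client *claimed* to be, since that header is trivial to fake.

```python
import re
from typing import NamedTuple

# One line of Apache's "combined" format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    ip: str
    time: str
    path: str
    status: int
    referer: str


def googlebot_404s(lines):
    """Yield 404 hits whose User-Agent *claims* to be Googlebot."""
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # not a combined-format line; skip it
        if m["status"] == "404" and "Googlebot" in m["agent"]:
            yield Hit(m["ip"], m["time"], m["path"], int(m["status"]), m["referer"])


sample = (
    '66.249.66.162 - - [21/Jul/2012:00:10:39 +0200] "GET /krdiyknd.html HTTP/1.1" '
    '404 5352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
for hit in googlebot_404s([sample]):
    print(hit.ip, hit.path, hit.referer)  # 66.249.66.162 /krdiyknd.html -
```

To run it over a real log, pass an open file instead of the sample list, e.g. `googlebot_404s(open("access.log"))`, where `access.log` is a placeholder for your own log path.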


Re: Riddle me this, weird GET's in my logs Panda_Effects 7/20/12 3:29 PM
A possible Googlebot glitch, or maybe a spoofed IP made to look like Googlebot?
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/20/12 3:34 PM
Also, see this thread

https://productforums.google.com/forum/?hl=en&fromgroups#!category-topic/webmasters/crawling-indexing--ranking/8ceSHdelBj8 
Re: Riddle me this, weird GET's in my logs Lysis 7/20/12 3:38 PM
ScaryNet, I escalated this, if only because it's a weird coincidence. I'm not sure if a Googler will comment, but maybe they will look into it.

It could be a new hack or a scraper sniffing people's sites, and if it is, I'm sure Google can give us a heads-up. Otherwise, it's normal noise. But we'll see. :)
Re: Riddle me this, weird GET's in my logs JohnMu 7/20/12 5:29 PM
Thanks for forwarding & including the link to the other thread -- I'll take a look at what's happening here with the team and post once I know more.

Cheers
John
Re: Riddle me this, weird GET's in my logs 50BMG 7/20/12 8:13 PM
Thanks to Panda_Effects and ScaryNet for linking my thread about this issue to this one.


On Friday, July 20, 2012 6:38:15 PM UTC-4, Lysis wrote:
> ScaryNet, I escalated this, if only because it's a weird coincidence. I'm not sure if a Googler will comment, but maybe they will look into it.
>
> It could be a new hack or a scraper sniffing people's sites, and if it is, I'm sure Google can give us a heads-up. Otherwise, it's normal noise. But we'll see. :)

Thanks for the escalation. I could speculate about its cause, but won't for the time being.

I would note, however, that the requests to my site don't appear to match any of yours here (though I do see that yours are repeated, at least in part, on multiple sites).

Thinking the bot might legitimately be looking for signs of malware, I scanned all drives on the server and surrounding systems for those file names. None turned up. Did any of you do this, and did you find such a file?
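A scan like this can be sketched in a few lines of Python. This is a hypothetical helper, not a tool anyone in the thread used; the root directory and the list of names are assumptions you supply (for example, the `.html` names from your own 404 entries).

```python
import os
from pathlib import Path


def find_files(root, names):
    """Return every path under `root` whose file name is in `names`."""
    wanted = set(names)
    found = []
    # os.walk silently skips directories it cannot read unless an onerror
    # callback is supplied, so run this with enough privileges to see everything.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in wanted:
                found.append(Path(dirpath) / filename)
    return found
```

For example, `find_files("/var/www", ["krdiyknd.html", "bavxyytg.html"])` would search a document root assumed to live at `/var/www`; an empty list means none of the names exist there.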




Re: Riddle me this, weird GET's in my logs Panda_Effects 7/20/12 8:42 PM
"could speculate about its cause"

Speculations are just that, so I would be interested in hearing your thoughts.
Re: Riddle me this, weird GET's in my logs webseos 7/20/12 8:53 PM
I think it is a hack case; check your folder's permission level.

Thanks
Bikram
http://www.rankingoogle.com
Re: Riddle me this, weird GET's in my logs 50BMG 7/21/12 12:07 AM
Ok... what the heck.

Q) Are they from Google or not?

A) Seems to me they are. In my case, if they were spoofed, the senders would have to know which Google IP address normally crawled my site and duplicate it. And if they did, they were clever enough to do it without interfering with Google's legitimate crawling of the site, which was simultaneous (within minutes) and whose activity I can validate in Google's Webmaster Tools.

But there is more: it's multiple sites, again at nearly the same time. I take it that your requests, too, came from the customary crawler addresses. What's more, these are different IPs from the one crawling my site, meaning the spoofs would involve multiple sources and multiple targets, again at the same time. That's too much sophistication for me; I think the requests came from Google's network.
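Rather than reasoning about spoofing, "is this really Googlebot?" can be checked directly with the reverse-then-forward DNS lookup Google recommends for verifying its crawler: the IP's PTR name should end in `googlebot.com` or `google.com`, and that name should resolve back to the same IP. Below is a minimal sketch using only Python's `socket` module; it handles IPv4 only, and the function names are illustrative.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """True if a PTR hostname falls under one of Google's crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip):
    """Reverse-resolve `ip`, check the domain, then forward-confirm the name."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record: the claim cannot be verified
    if not hostname_is_google(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # The forward lookup must map back to the same address; otherwise the PTR
    # record could simply be forged by whoever controls the reverse DNS zone.
    return ip in forward_ips
```

Calling `is_verified_googlebot("66.249.66.162")` needs live DNS, and the answer depends on Google's records at the time you run it. The suffix check uses a leading dot so that a name like `fakegooglebot.com` is not accepted.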

Q) Are they Legitimate crawling requests?

A) For two simple reasons, I say "No":
  • First, there is no referrer given in the requests. This is highly unusual for Google (maybe a first).
  • Second, in my case the response to these requests is 404. If they were legitimate, I would expect these 404s to be reported in Google's Webmaster Tools for my site as crawl errors (404). They are not listed.

So... they came from Google, and aren't normal crawling requests. What could they be?

1) I've never seen it, but Google could have chosen today to scan our sites for signs of malware. But where would they get these names? If it is such a probe, it could only be based on knowledge of the pest specific enough to predict the algorithm it uses to create the names. Would Google take it upon themselves to do this? I don't know.

2) It could be that the crawlers themselves, or (less likely) the crawler network, have been hacked. In that case, I would expect the requested names to have been generated by a pest trying to find signs of its "brethren" on the web. It would use Google's web identity knowing that Google's crawlers would be permitted and that, because of the volume of Google traffic, its requests would be the least likely to be noticed.

3) Google knowingly issued the crawls, perhaps in response to a special request by a government agency. A lot happened in the 24 hours preceding this event, and I would not think it unreasonable for Homeland Security, the FBI, or another agency to have made such a request in that time. Why the government would not perform this scan with its own resources, I do not know, which is perhaps why I don't count this as a likely scenario. However, if it is correct, well, you might not have long to read this post.

How's that for speculation?

Re: Riddle me this, weird GET's in my logs webado 7/21/12 12:49 AM
Robots do not pass referrer information. Your server will never see a referrer from an access by a robot, whether it's Googlebot or any other robot.

You have to wait a few days until those 404s are reported in Webmaster Tools to find out what some of the actual referrers might have been (part of the data collection and reporting for Webmaster Tools).
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 12:55 AM
"I think it is a hack case; check your folder's permission level."

What about what bikramchoudhury suggested?  What if a good number of sites got "infected" quickly?

Or it could be a simple glitch, as I mentioned earlier. It still seems feasible that it could be a spoofed IP.
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 12:57 AM
But I also believe it could be a test by Google, as you suggested too.
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 1:11 AM
Some others have seen this issue too

http://www.webmasterworld.com/search_engine_spiders/4477593.htm 
Re: Riddle me this, weird GET's in my logs 50BMG 7/21/12 1:19 AM

On Saturday, July 21, 2012 3:49:31 AM UTC-4, webado wrote:
> Robots do not pass referrer information. Your server will never see a referrer from an access by a robot, whether it's Googlebot or any other robot.
>
> You have to wait a few days until those 404s are reported in Webmaster Tools to find out what some of the actual referrers might have been (part of the data collection and reporting for Webmaster Tools).

Rechecking the logs, I see your point about the referrer. As for Webmaster Tools, I saw some items listed from today, but none from after the time of the anomalies. So I will indeed have to wait and see. Thank you. I'll have to reconsider my speculations.


On Saturday, July 21, 2012 3:55:53 AM UTC-4, Panda_Effects wrote:
> "I think it is a hack case; check your folder's permission level."
>
> What about what bikramchoudhury suggested?  What if a good number of sites got "infected" quickly?


I guess this one is going to have to be explained to me. What folder permissions are we talking about?
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 1:26 AM
"What folder permissions are we talking about?"

I do not know. But since hackers can change people's .htaccess files, it seems reasonable as a possibility.

But I am inclined to believe, like you, that it is a test by Google. However, it seems a very strange test: what could possibly be tested by requesting garbage HTML files?

Puzzling.

But in the thread I linked above, one person said "changed a batch of individual 404s or 410s to page-specific redirects. So it made sense for Google to want to confirm that I hadn't stepped into Soft 404 territory by redirecting everything". Is that something else to consider, possibly a quick test check?
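Whatever Googlebot was actually doing here, the soft-404 idea is easy to test on your own site: request a random page that certainly does not exist and look at the status code. A real 404 (or 410) is what you want; a 200 suggests a soft 404, and a 301/302 means missing pages are being redirected. This is a sketch using only Python's standard library; the base URL is yours to supply, and the random-name pattern merely imitates the log entries in this thread.

```python
import random
import string
import urllib.error
import urllib.request


def random_html_path(rng, min_len=8, max_len=16):
    """Build a nonsense path shaped like the logged ones, e.g. /abcdefgh.html."""
    length = rng.randint(min_len, max_len)
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"/{name}.html"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so their 3xx status surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def status_for_missing_page(base_url, rng=random):
    """Return the HTTP status the site gives for a page that should not exist."""
    opener = urllib.request.build_opener(NoRedirect)
    url = base_url.rstrip("/") + random_html_path(rng)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status  # 200 here is a soft-404 candidate
    except urllib.error.HTTPError as err:
        return err.code  # 404/410 expected; 301/302 means a redirect
```

For example, `status_for_missing_page("http://www.scarynet.org")` makes one live request; only a 404 or 410 indicates that missing pages are reported correctly.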
Re: Riddle me this, weird GET's in my logs 50BMG 7/21/12 1:34 AM
If Googlebot had found one of the files, I could see the conclusion that I'd been hacked. But it's hard to believe that all of you were.

So, given that I haven't been hacked, what use is it to check my system for why the bot asked for these?

I'm missing this line of thought entirely.


Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 1:43 AM
Since you know you were not hacked, I agree. I just think many possibilities should at least be considered.

Did you recently change anything in .htaccess or httpd.conf that could have triggered Google to start checking for something, as that person on WebmasterWorld described? When I have made changes in .htaccess that were not complete enough, I have seen different errors in WMT. I had one today, and I thought I had pretty much everything cleaned up, or very close.

It is also possible that Google did not see anything recent but is testing random websites to check for issues. If they do that for a number of sites in batches, that could explain why a few of you saw the same thing today. There could be many more who never post in groups/forums, or who do not watch their logs as actively as some of you do.

Re: Riddle me this, weird GET's in my logs 50BMG 7/21/12 2:25 AM


On Saturday, July 21, 2012 4:43:33 AM UTC-4, Panda_Effects wrote:
> Since you know you were not hacked, I agree. I just think many possibilities should at least be considered.
>
> Did you recently change anything in .htaccess or httpd.conf that could have triggered Google to start checking for something, as that person on WebmasterWorld described? When I have made changes in .htaccess that were not complete enough, I have seen different errors in WMT. I had one today, and I thought I had pretty much everything cleaned up, or very close.


No recent changes. File contents of .htaccess and httpd.conf are unaltered, and the timestamps agree with the last time I edited them. I am less inclined to think I provoked this somehow, because I see other admins with the same issue in the same time period on their sites.
 
> It is also possible that Google did not see anything recent but is testing random websites to check for issues. If they do that for a number of sites in batches, that could explain why a few of you saw the same thing today. There could be many more who never post in groups/forums, or who do not watch their logs as actively as some of you do.


I agree that many admins may not have looked for these in their logs (yet).

Incidentally... no further probes of this kind have occurred as of this hour, the last being Jul 20 14:36:52 EDT.


 
Re: Riddle me this, weird GET's in my logs Panda_Effects 7/21/12 2:37 AM
It does seem likely that Google is running a test on a number of sites. But did they see something, or are they just doing it randomly as a test on all sites?

With .htaccess or httpd.conf, it would not have to be a recent change. Since Panda, I have seen that they check things much more, and seemingly differently, and they have caught a number of things I had no idea I was not doing the best way. So to me it is good if they do some testing and show the results in WMT, but it might be helpful if they explained some of the tests; that could mean less confusion. But maybe they do not want to say exactly? I recall seeing strange URLs in WMT quite some time back, and I have seen strange GETs in the logs as well. Some have said that Google may use other IP addresses to check whether different content is served to search engines and to visitors, and that makes sense.
Re: Riddle me this, weird GET's in my logs cristina 7/21/12 3:38 AM
Do you have something like this in web crawl errors in Google Webmaster Tools? Look in GWT at both site URLs, with www and without www.

Re: Riddle me this, weird GET's in my logs 50BMG 7/21/12 5:53 AM


On Saturday, July 21, 2012 6:38:56 AM UTC-4, cristina wrote:
> Do you have something like this in web crawl errors in Google Webmaster Tools? Look in GWT at both site URLs, with www and without www.

"No." They are still not there as of this hour. The last item in that (short) list is from June and is a normally formed request.
Re: Riddle me this, weird GET's in my logs webado 7/21/12 6:05 AM
You have to check your site by www, non-www, and IP (79.99.133.85), since it responds the same way to all of those URLs, and everything gets logged in the same log with no indication of which was used.

After you've checked it all, implement this 301 redirect in the .htaccess file to fix the canonicalization problem, sending every request that was not for www.scarynet.org there:

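# Redirect every host name other than www.scarynet.org (bare domain, IP) to it
RewriteEngine on
# Skip requests with no Host header, which would otherwise redirect in a loop
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^www\.scarynet\.org$ [NC]
RewriteRule ^(.*)$ http://www.scarynet.org/$1 [R=301,L]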
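To sanity-check what a host-canonicalization rule like this is supposed to do before deploying it, the intended logic can be mirrored in a few lines of Python. This is a hypothetical test harness, not part of Apache: it escapes the dots in the host pattern (an unescaped `.` in a regex matches any character), matches case-insensitively like `[NC]`, and, as an added assumption, leaves requests with no Host header alone so that old HTTP/1.0 clients are not redirected in a loop.

```python
import re

CANONICAL_HOST = "www.scarynet.org"
# Anchored, dots escaped, case-insensitive: the same test as the RewriteCond.
CANONICAL_RE = re.compile(r"^www\.scarynet\.org$", re.IGNORECASE)


def canonical_redirect(host, path):
    """Return the 301 target for a non-canonical host, or None to serve normally.

    `path` is the request path including its leading slash (plus any query).
    """
    if not host or CANONICAL_RE.match(host):
        return None  # already canonical, or no Host header at all
    return f"http://{CANONICAL_HOST}{path}"
```

With it, `canonical_redirect("79.99.133.85", "/krdiyknd.html")` returns `"http://www.scarynet.org/krdiyknd.html"`, while the canonical host returns `None`.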





Re: Riddle me this, weird GET's in my logs cristina 7/21/12 7:26 AM
Can you check your server access logs, including earlier dates: were URLs like these requested by anything other than Googlebot?

It is not a solution, but until there is an explanation of where or how Googlebot found those URLs, you could stop Googlebot from requesting them for now (just to unclutter your server access logs a bit). I see that all the URLs end in .html. If you do not have any good URLs on your site ending in .html, block them for Googlebot in robots.txt with:
User-agent: Googlebot
Disallow: /*.html$
I repeat: do this only if you do not have good URLs ending in .html.
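Because a pattern like `Disallow: /*.html$` relies on Google's wildcard extensions to robots.txt (`*` for any run of characters, a trailing `$` for the end of the URL), it is worth checking what it would catch before adding it. Here is a minimal Python sketch of that matching, written for illustration; it ignores `Allow` lines and the precedence Google applies between competing rules.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the end of
    the URL. Everything else is literal and matched from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, pattern):
    """True if `path` (path plus any query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None
```

Note that `/page.html?x=1` is *not* blocked by `/*.html$`, because the `$` requires the URL to end in `.html`; nested paths such as `/german/gdehilbxalgwda.html` are blocked.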




Re: Riddle me this, weird GET's in my logs webseos 7/21/12 7:34 AM
I think Cristina is correct: where did Google find these URLs? Googlebot doesn't generate URLs.
Re: Riddle me this, weird GET's in my logs 50BMG 7/22/12 1:45 AM
On Friday, July 20, 2012 6:34:13 PM UTC-4, Panda_Effects wrote:
> Also, see this thread
>
> https://productforums.google.com/forum/?hl=en&fromgroups#!category-topic/webmasters/crawling-indexing--ranking/8ceSHdelBj8

In that thread is a supposition I did not consider when speculating earlier:



On Sunday, July 22, 2012 12:41:40 AM UTC-4, harry201207 wrote:
> I found the same strange log entries (it's not the first time). Look at this:
>
> 66.249.71.123 - - [20/Jul/2012:11:48:48 -0400] "GET /german/gdehilbxalgwda.html HTTP/1.1" 404 636 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:13:59:50 -0400] "GET /zysmervglu.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:00:06 -0400] "GET /sokyetaeqxy.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:09:26 -0400] "GET /uzqoyuaw.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:09:29 -0400] "GET /meqctuorh.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:14:01 -0400] "GET /uhavsnbn.html HTTP/1.1" 404 636 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:17:16 -0400] "GET /mqvksbgdf.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:19:31 -0400] "GET /sitomqlunfpjdrjg.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:20:20 -0400] "GET /ahadvudcjbgyry.html HTTP/1.1" 404 636 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:21:04 -0400] "GET /vhymnqhwnanvwa.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:21:59 -0400] "GET /iotgodsauaawkxtx.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:22:50 -0400] "GET /oskdlytqnhsfba.html HTTP/1.1" 404 637 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> 66.249.71.123 - - [20/Jul/2012:14:23:40 -0400] "GET /yqyjexhcefppyclx.html HTTP/1.1"
> ...
Re: Riddle me this, weird GET's in my logs 50BMG 7/22/12 2:07 AM
Perhaps this is the real answer:


On Saturday, July 21, 2012 7:16:51 PM UTC-4, Lysis wrote:
> I retracted my escalation for ScaryNet, but the issue was sent to Google, D_Ready. They are looking into it, but remember, it's the weekend.
>
> At the end of the day, it really doesn't matter anyway. It's not hurting anything. If it's a bug, they will fix it. If it's a hack, maybe Google will share some insight.
>
> ETA: Google will crawl anything it thinks is a URL. We have truncated URLs in our logs, and they come from a scraper scraping Google's content. The truncated URL you see in the SERPs was scraped by some Indian scraper, so now Google crawls those links on our site. Interestingly, those links are plain text on the site, not hyperlinks, so again, if Google thinks it's a URL, it will try to crawl it.
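Lysis's point that text which merely looks like a URL can end up being crawled is a hypothesis, and Google's own reply in this thread attributes these particular requests to an accident. Still, a rough Python sketch shows how easily a plain-text scan turns a truncated, scraped URL into a fetchable-looking path. The regex is a deliberately crude assumption, and nothing here describes how Googlebot actually discovers URLs.

```python
import re

# Crude pattern for URL-like strings in plain text (not hyperlinks):
# an http(s):// or www. prefix followed by non-space characters.
URL_LIKE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def url_like_strings(text):
    """Return URL-looking substrings, minus trailing sentence punctuation.

    Stripping a trailing '...' is exactly how a truncated URL such as
    'http://example.com/some-long-pa...' becomes '/some-long-pa', a path
    that was never a real page.
    """
    return [m.group(0).rstrip(".,;:!?)") for m in URL_LIKE.finditer(text)]
```

For instance, `url_like_strings("see http://example.com/some-long-pa...")` yields `["http://example.com/some-long-pa"]`, which, if fetched, would produce exactly the kind of 404 discussed here.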

Re: Riddle me this, weird GET's in my logs JohnMu 7/22/12 3:50 PM
Hi guys

These requests came from our side; Googlebot made them by accident. It's my understanding that this issue has since been resolved (or at least will be in the near future). Sorry for the confusion & thanks for posting here!

Cheers
John