AJAX Crawling working with hash but not meta tag

AJAX Crawling working with hash but not meta tag tmoyer 3/19/11 4:41 PM
I have read the FAQs and checked for similar issues: YES
My site's URL (web address) is: toddmoyer.net/blog
Description (including timeline of any changes made):

I have an AJAX site, and <meta name="fragment" content="!"> has been added to all the pages (all the pages of the site are root pages with hash marks). My snapshot mechanism seems to be working fine. Here's an example:

AJAX page: http://toddmoyer.net/blog
SNAPSHOT page: http://toddmoyer.net/blog?_escaped_fragment_=

But when I use Fetch as Googlebot, the page it gets is the AJAX version.

However, when I test the site with this URL: http://toddmoyer.net/blog#!

The snapshot version is shown by Fetch as Googlebot.

So it would seem my meta tag is not being recognized. Any help would be appreciated.

*Correction: all the pages of the site are root pages WITHOUT hash marks.

Re: AJAX Crawling working with hash but not meta tag webado 3/19/11 5:07 PM
This is what web-sniffer.net sees:
 
<HTML xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml" xml:lang="en" lang="en">
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="fragment" content="!"></meta>
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-22065896-1']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
<script type="text/javascript">var pageRequest = "/blog";var getVars = "";var userId = null;</script><script type="text/javascript" src="/js/versions.js?cc=668093"></script>
<script type="text/javascript" src="/js/jquery-1.5.min.js"></script>
<script type="text/javascript" src="/js/jquery-ui-1.8.6.custom.min.js"></script>
<script type="text/javascript" src="/js/jquery.simpleSlide.js"></script>
<script type="text/javascript" src="/js/jquery.dropshadow.js"></script>
<script type="text/javascript" src="/js/serialize.js"></script>
<script type="text/javascript" src="/js/user.js"></script>
<script type="text/javascript" src="/js/cookies.js"></script>
<script type="text/javascript" src="/js/render.js"></script>
<link type="text/css" href="/css/jqueryui/jquery-ui-1.8.6.custom.css" rel="stylesheet" />
</HEAD>
<body>
<!-- start padding end padding -->
</body>
</HTML>
Re: AJAX Crawling working with hash but not meta tag webado 3/19/11 5:08 PM
Which is the same as what I see in the browser.

There's no content that's not in JavaScript.
Re: AJAX Crawling working with hash but not meta tag tmoyer 3/19/11 5:16 PM
What URL is that for? That looks like the source for the AJAX version, not the SNAPSHOT version.

Are you familiar with the AJAX crawling spec? http://code.google.com/web/ajaxcrawling/docs/getting-started.html

Once again...
The AJAX version: http://toddmoyer.net/blog
The SNAPSHOT version: http://toddmoyer.net/blog?_escaped_fragment_=

Thanks!
Re: AJAX Crawling working with hash but not meta tag webado 3/19/11 7:00 PM
 
This would be good: http://toddmoyer.net/blog?_escaped_fragment_= as it has actual content.
 
No, I'm not familiar with AJAX crawling. Would it help to use a canonical link element that gives the URL of what you call the snapshot page?
 
But this may be an issue too:
 
 
 
Re: AJAX Crawling working with hash but not meta tag webado 3/19/11 7:02 PM
 
>>The snapshot version is shown by Fetch as Googlebot.

I'd say that's good, if by that you mean that when you try Fetch as Googlebot for http://toddmoyer.net/blog# it returns http://toddmoyer.net/blog?_escaped_fragment_=
 
Re: AJAX Crawling working with hash but not meta tag tmoyer 3/19/11 7:24 PM
Thanks for the validator tip. It looks like I have a little issue with the case of my <html> and <head> tags.

Here's how I know that the validity of the HTML is not the problem: I'm using the "Fetch as Googlebot" feature for testing, and when I put #! on my URLs (to denote an AJAX page), the bot shows the content of my static snapshot page (as expected).

The only problem is that I would prefer not to have to submit all my URLs to Google with #! on them. The specification says the addition of the <meta name="fragment" content="!"> tag will work, and Googlebot will respond by fetching the page with ?_escaped_fragment_= on the query string, which provides the snapshot.
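The behaviour described above (a page without a hash fragment opts in through the meta tag, and the crawler then requests the same URL with an empty `_escaped_fragment_=` parameter) can be sketched in a few lines. This is only an illustration of the rule as stated in this thread, not Googlebot's actual code; the names `FragmentMetaFinder`, `opts_into_ajax_crawling`, and `snapshot_url` are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class FragmentMetaFinder(HTMLParser):
    """Records whether the page contains <meta name="fragment" content="!">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> also matches.
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") == "fragment" and attributes.get("content") == "!":
                self.found = True


def opts_into_ajax_crawling(html):
    """Return True if the HTML source carries the fragment meta tag."""
    finder = FragmentMetaFinder()
    finder.feed(html)
    return finder.found


def snapshot_url(url):
    """Append an empty _escaped_fragment_ parameter to a URL that has no hash fragment."""
    parts = urlsplit(url)
    query = parts.query + "&_escaped_fragment_=" if parts.query else "_escaped_fragment_="
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


page = '<HTML><HEAD><meta name="fragment" content="!"></meta></HEAD><body></body></HTML>'
if opts_into_ajax_crawling(page):
    print(snapshot_url("http://toddmoyer.net/blog"))
```

Running the same check against your own page source is a quick way to confirm the tag is present and spelled as expected, independently of any crawler tool.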

Everything checks out ok on my end, so I'm wondering if there's something I'm missing or there's a problem with Googlebot.

Do you know if it would be possible to escalate my issue to someone at Google knowledgeable about the AJAX crawling system?

Thanks again!
Re: AJAX Crawling working with hash but not meta tag tmoyer 3/19/11 7:26 PM
>> I'd say that's good, if by that you mean that when you try Fetch...

Yes, that part is working.
Re: AJAX Crawling working with hash but not meta tag webado 3/19/11 7:33 PM
I will see if I can get somebody's attention on this.
Re: AJAX Crawling working with hash but not meta tag cristina 3/20/11 9:08 AM
Can you look in your server access logs to see what URLs are requested by the real Googlebot (not only by Fetch as Googlebot)?
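One way to do the log check suggested above is to filter the access log for requests whose user agent mentions Googlebot and see whether any of them carry `_escaped_fragment_=`. The sketch below assumes the Apache/nginx "combined" log format, and the sample lines are invented for illustration; they are not from any real log. Keep in mind that a user-agent string can be spoofed, so this only shows what claims to be Googlebot.

```python
import re

# Assumes Apache/nginx "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_requests(lines):
    """Yield (path, is_snapshot_request) for each request whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            path = match.group("path")
            yield path, "_escaped_fragment_=" in path


# Made-up sample lines for illustration only; these are not taken from any real log.
sample = [
    '192.0.2.1 - - [20/Mar/2011:09:00:00 +0000] "GET /blog HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [20/Mar/2011:09:00:01 +0000] "GET /blog?_escaped_fragment_= HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [20/Mar/2011:09:00:02 +0000] "GET /blog HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]

for path, is_snapshot in googlebot_requests(sample):
    print(path, "(snapshot request)" if is_snapshot else "(plain request)")
```

If snapshot requests show up in the real log, the meta tag is being honoured by the crawler even though Fetch as Googlebot does not show it.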
Re: AJAX Crawling working with hash but not meta tag JohnMu 3/20/11 2:54 PM
Hi Todd
It's good to see more sites using the AJAX crawling proposal :-)!

Looking at your blog's homepage, one thing to keep in mind is that the Fetch as Googlebot feature does not parse the content that it fetches. So when you submit http://toddmoyer.net/blog/ , it fetches that URL. After fetching the URL, it doesn't parse it to check for the "fragment" meta tag, it just returns it to you. However, if you fetch http://toddmoyer.net/blog/#! , then it should rewrite the URL and fetch the URL http://toddmoyer.net/blog/?_escaped_fragment_=

When we crawl and index your pages, we'll notice the meta-tag and act accordingly. It's just the Fetch as Googlebot feature that doesn't check for meta-tags, and instead just returns the raw content. 

I hope that makes it a bit clearer!

Cheers
John
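The rewrite mentioned above, from a `#!` URL to its `_escaped_fragment_` form, can be sketched as follows. This is a rough illustration rather than the crawler's implementation: the function name is made up, and the exact set of characters left unescaped is an assumption that should be checked against the AJAX crawling specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url):
    """Map a #! URL to the _escaped_fragment_ URL a crawler would request.

    URLs without a #! fragment are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no hashbang, nothing to rewrite
    # Percent-encode characters that would be ambiguous in a query string.
    # The safe set below is an assumption, not taken from the specification.
    value = quote(parts.fragment[1:], safe="=/!:,;'()*-._~")
    token = "_escaped_fragment_=" + value
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


for candidate in (
    "http://toddmoyer.net/blog/#!",
    "http://www.example.com/ajax.html#!ajax1",
    "http://toddmoyer.net/blog/",
):
    print(candidate, "->", escaped_fragment_url(candidate))
```

The last URL comes back unchanged, which matches the behaviour described in this post: without a `#!` in the submitted URL there is nothing for the tool to rewrite.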
Re: AJAX Crawling working with hash but not meta tag tmoyer 3/20/11 10:11 PM
Ah ha! Thanks for the info. I had a feeling that Fetch As Googlebot might just not be parsing them.

It would probably be a good idea to make a note of this on the Fetch page so this doesn't come up again and again with other folks. Or better yet, make it parse the meta tags. ;)

Regarding use of the AJAX proposal, happy to oblige! It's great that the system exists.

While I have your attention, I have a proposal of my own:
Build Chrome's JavaScript + DOM engine into Googlebot. Run the fetched page in a UI-less browser, then crawl the DOM (rather than the HTML source). Theoretically this would make all AJAX sites crawlable without burdening developers with snapshot mechanisms. As more sites make heavier use of AJAX-type techniques, getting meaningful information from the HTML will get harder and harder. It would seem to be much more reliable (and easier) to just let the page run, and then examine the DOM for content. Just my 2 cents.

Thanks again.
- Todd
Re: AJAX Crawling working with hash but not meta tag ramya.krishna2525 3/25/11 3:04 AM
Facing a similar kind of issue:

In Fetch as Googlebot, the URL without a hash fragment does not return the AJAX content; explicitly adding #! to the URL does.

Example:

http://www.example.com/ajax.html does not return the AJAX content,

whereas http://www.example.com/ajax.html#!ajax1 does.

How do we confirm that AJAX crawling is successful?

i) http://www.example.com/ajax.html contains <a href="#!ajax1">. When the Google crawler actually crawls this page, will it identify this hash fragment, encode the URL, and place a request?

ii) Should the HTML snapshot be the same as what the user sees in the browser? For the Google crawler, is it the AJAX content that matters, rather than how the content is laid out on the HTML page?

Any help will be appreciated.
Re: AJAX Crawling working with hash but not meta tag markthesnowman 3/26/11 2:53 AM
tmoyer, webado,

Thanks for posting your comments about "Fetch as GoogleBot" ignoring the metatag.  I can confirm that this also happens for me.  The "Fetch as GoogleBot" is working well for the pages with #! fragments but not for the home page with the metatag. (it seems to simply ignore the metatag).

I did a test where I incorporated the <meta name="fragment" content="!"> into a page, used "Fetch as GoogleBot", and then wrote out the expected query string using a PHP command: <h2><?php echo "Values in Query string are".$_SERVER['QUERY_STRING']; ?></h2>. The value of $_SERVER['QUERY_STRING'] was null.

The difference in "Fetch as GoogleBot" behaviour between the #! pages and the metatag pages is a bit confusing. Hopefully, this feature can be incorporated such that "Fetch as GoogleBot" works for the metatag pages, but in the interim a note regarding this limitation would be appreciated.

Thanks
Re: AJAX Crawling working with hash but not meta tag JohnMu 3/26/11 4:00 AM
Hi Mark

As I mentioned above,  the Fetch as Googlebot feature was purposely designed to show the data in the raw format, so that you can diagnose issues that come directly from those requests. By using a meta-tag on your pages, the "real Googlebot" would also have to fetch it normally first, and afterwards fetch the #!-version. This is similar to redirects, where we'd fetch the original URL first, see the redirect, and then fetch the final URL (Fetch as Googlebot also shows the redirecting page, not the final one). 

That said, I'm happy to pass your feedback on to the team here, perhaps there's a way we could optionally do both :-).

Cheers
John
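The difference between a raw fetch and the two-step crawl described above can be illustrated with a small simulation. Everything here is hypothetical: the `pages` dictionary stands in for the network, the regular expression is a loose stand-in for real HTML parsing, and none of this is Googlebot's actual logic.

```python
import re

# Loose pattern for <meta name="fragment" content="!">; a real crawler would parse the HTML.
FRAGMENT_META = re.compile(r'<meta\s+name="fragment"\s+content="!"', re.IGNORECASE)


def raw_fetch(url, fetch):
    """What a raw fetch tool shows: the response for exactly the URL requested."""
    return fetch(url)


def crawl(url, fetch):
    """Two-step behaviour: fetch the page, then fetch the snapshot if the page opts in."""
    html = fetch(url)
    if html and FRAGMENT_META.search(html):
        separator = "&" if "?" in url else "?"
        return fetch(url + separator + "_escaped_fragment_=")
    return html


# Hypothetical stand-in for the network: maps URLs to response bodies.
pages = {
    "http://toddmoyer.net/blog": '<head><meta name="fragment" content="!"></head><body></body>',
    "http://toddmoyer.net/blog?_escaped_fragment_=": "<body><h1>Snapshot with real content</h1></body>",
}

print(raw_fetch("http://toddmoyer.net/blog", pages.get))  # the AJAX shell, unparsed
print(crawl("http://toddmoyer.net/blog", pages.get))      # the snapshot, after the second request
```

As with a redirect, the first response is still fetched and is what a raw tool reports; only the second step reaches the snapshot.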
Re: AJAX Crawling working with hash but not meta tag webado 3/26/11 4:03 AM
Wait, that's a fragment appended to the REQUEST_URI, not a query string. I don't believe PHP will return fragments with the REQUEST_URI server variable. Only JavaScript can read it.

However, if your fragment is handled internally by URL rewriting to a URL with a query string, then you will get that with the QUERY_STRING server variable.
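The distinction between a fragment and a query string is easy to see by splitting a URL into its parts. In this sketch the helper names are made up, and `server_side_query_string` only models what a server-side variable such as `$_SERVER['QUERY_STRING']` would be expected to contain, on the assumption that the client leaves the fragment out of the HTTP request as browsers normally do.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path plus query that goes on the HTTP request line; the fragment is dropped."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def server_side_query_string(url):
    """Model of what the server would see as the query string for this URL."""
    return urlsplit(url).query


for candidate in (
    "http://www.example.com/ajax.html#!ajax1",
    "http://www.example.com/ajax.html?_escaped_fragment_=ajax1",
):
    print(repr(request_target(candidate)), repr(server_side_query_string(candidate)))
```

For the `#!` URL the query string is empty, so a server-side script has nothing to echo; only the `_escaped_fragment_` form delivers a value to the server.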