Questions on hreflang Sitemaps

Questions on hreflang Sitemaps Steve Morgan 9/20/12 2:38 AM
Hi Pierre, Christopher et al,

I had a few questions regarding hreflang Sitemaps, which I'm hoping will be worthwhile to other 'hreflangers' as it looks like they've not been covered before (either in the official Webmaster Tools documentation or within this forum). Rather than post each of them separately, I thought it best to put them into one post - I hope that's ok.
  1. Best practice for a 'Rest of World' section (.com vs. /international/): example.com has a main 'master' structure (e.g. example.com/category/page) but then also about 10 other countries (e.g. example.com/uk/category/page, example.com/us/category/page, etc.) as well as an international/Rest of World catch-all part of the site, which is not country-specific but will all be in English (e.g. example.com/international/category/page). As "each url element must include a loc tag indicating the page URLs, and an xhtml:link rel="alternate" hreflang="XX" subelement for every alternate version of the page, including itself" (source), I imagine that the 'master' (example.com/category/page) should be highlighted in the main Sitemap, with it being designated the "en" code. But if this is the case, what should we do about the example.com/international/ area of the site? Should we noindex it and let international searchers land on example.com/category/page pages? We're considering a URL restructure, in which case should we take the opportunity to eliminate example.com/international/ entirely and just have example.com/category/page pages for those searchers instead?

  2. "alternate" entries for non-existent pages: Within example.com, some countries will have content that other countries will not have, e.g. there might be an example.com/uk/xyz page but not an example.com/us/xyz because in this instance it is not applicable to US visitors and therefore a page/version does not need to be created. In some instances, this varies on a page-by-page basis involving hundreds of URLs. Is this something that we will have to trawl through manually, amending the Sitemap to reflect it 100% perfectly? What if example.com/us/xyz were to be included as an "alternate" even though it doesn't exist? So long as it isn't also given its own <url><loc> element, will it be ok to do this?

  3. (Legitimate) duplicate content within hreflang: In addition to separating their site into separate countries (which hreflang should take care of), example.com is regulated by an industry body and therefore by law it has to show two versions of each section to two different types of visitors. For example, let's say example.com is a law website and it has to show different content to solicitors and consumers (i.e. laymen), so you'd have example.com/uk/category/page/solicitors as well as example.com/uk/category/page/consumers. Sometimes the two will have different content and sometimes the same content will suffice, but by law they still have to be shown as two different pages: /solicitors and /consumers. What's best practice here? Is it simply a case of making sure that the content varies regardless? Alternatively, is there a way that we can incorporate rel="canonical" within hreflang to tell Google that there is a 'master' page for each (e.g. example.com/uk/category/page/), or will that have other repercussions and cause other problems?

  4. Future content: example.com is a constantly growing website, with new pages being added in the form of a blog/news section as well as generally across the site. How should we manage the hreflang Sitemaps going forward? How often should we update the Sitemaps? If we only implement hreflang via Sitemaps, what are the possible repercussions if Google spiders a new page that hasn't been accounted for within a hreflang Sitemap (and therefore isn't attributed to hreflang in any way)?

Many thanks,

Steve

Re: Questions on hreflang Sitemaps pierrefar 9/20/12 6:27 AM
Hi Steve,

Good questions. In order.

1. A good way to think about this: for each language, pick the "default" page that doesn't target any specific country. In your example, suppose that example.com/category/page is such a page in English. Next you would mark up its alternates that target specific language + country combinations. To fully mark up your example:

<link rel="alternate" hreflang="en" href="http://example.com/category/page" />
<link rel="alternate" hreflang="en-gb" href="http://example.com/uk/category/page" />
<link rel="alternate" hreflang="en-us" href="http://example.com/us/category/page" />

I'm unclear about the differences between /international and the /category pages. Are you suggesting they are the same content that serve the same purpose but on different URLs?
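
For the Sitemap form of the same markup, here is a minimal Python sketch (not an official tool) that writes one <url> entry per page in a cluster, each listing every alternate including itself. The function name build_hreflang_sitemap and the shape of the clusters argument are assumptions made for illustration; the URLs are the ones from the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(clusters):
    """Return Sitemap XML for a list of clusters.

    Each cluster is a dict mapping an hreflang code to a URL
    (a hypothetical input shape chosen for this sketch).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        # Every page in the cluster gets its own <url> entry...
        for url in cluster.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # ...and each entry lists all alternates, including itself.
            for code, alt in cluster.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": code, "href": alt},
                )
    return ET.tostring(urlset, encoding="unicode")


example_cluster = {
    "en": "http://example.com/category/page",
    "en-gb": "http://example.com/uk/category/page",
    "en-us": "http://example.com/us/category/page",
}
sitemap_xml = build_hreflang_sitemap([example_cluster])
print(sitemap_xml)
```

With three pages in the cluster, the output contains three <url> entries with three xhtml:link subelements each.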

2. If a URL doesn't have an alternate for a given language (or language+country combination), then you simply don't mark it up. The missing URLs would return (404) error pages, right?
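
Rather than trawling through hundreds of URLs by hand, one option is to drop the missing alternates programmatically before the Sitemap is written. The sketch below is only an illustration: prune_missing_alternates is a hypothetical helper, and existing_urls is assumed to come from somewhere reliable such as a CMS export or a crawl of the live site.

```python
def prune_missing_alternates(cluster, existing_urls):
    """Keep only the hreflang entries whose URL actually exists.

    cluster: dict mapping hreflang code -> URL (assumed shape).
    existing_urls: set of URLs known to resolve (assumed input).
    """
    return {code: url for code, url in cluster.items() if url in existing_urls}


# The UK page exists, the US version was never created.
xyz_cluster = {
    "en-gb": "http://example.com/uk/xyz",
    "en-us": "http://example.com/us/xyz",
}
live_pages = {"http://example.com/uk/xyz"}

pruned = prune_missing_alternates(xyz_cluster, live_pages)
print(pruned)
```

Only the en-gb entry survives, so the non-existent /us/xyz URL is never listed as an alternate.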

3. I'm not sure I fully understand this question, so in general terms: If we discover multiple URLs that have substantially the same content, our algorithms will pick one representative URL for this content (the canonical URL) to index and show in our search results. You can use rel="canonical" in addition to rel-alternate-hreflang as long as it's between pages that are genuinely the same. For example, don't use rel="canonical" between different languages (e.g. French and English) or between two pages where the differences are important (e.g. en-gb with prices in GBP and the UK VAT rate and en-us with prices in USD).
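
The rule about not using rel="canonical" across languages or regions can be checked mechanically. The following Python sketch is an assumption-laden illustration, not something Google provides: cross_locale_canonicals is a hypothetical helper, and the canonicals mapping (page URL to its rel="canonical" target) is assumed to be collected by your own tooling.

```python
def cross_locale_canonicals(cluster, canonicals):
    """Return (page, canonical_target) pairs that cross hreflang versions.

    cluster: dict mapping hreflang code -> URL (assumed shape).
    canonicals: dict mapping page URL -> canonical target URL;
                pages missing from it are treated as self-canonical.
    """
    locale_of = {url: code for code, url in cluster.items()}
    problems = []
    for url, code in locale_of.items():
        target = canonicals.get(url, url)
        target_code = locale_of.get(target)
        # Flag only targets that are a different member of the same cluster.
        if target_code is not None and target_code != code:
            problems.append((url, target))
    return problems


page_cluster = {
    "en-gb": "http://example.com/uk/category/page",
    "en-us": "http://example.com/us/category/page",
}
# The US page wrongly points its canonical at the UK page.
declared = {
    "http://example.com/us/category/page": "http://example.com/uk/category/page",
}
print(cross_locale_canonicals(page_cluster, declared))
```

A canonical that stays within one region (for example between two pages under /uk/) is not flagged by this check, since its target is not another hreflang version in the cluster.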

4. Our standard recommendations for Sitemaps apply here. Update them when you update your content, and update the rel-alternate-hreflang annotations when new pages are added to the cluster. And if Googlebot discovers a URL that is not part of any rel-alternate-hreflang cluster, then it's fine: it's how sites that don't use rel-alternate-hreflang (i.e. most sites!) are handled. It's OK to have only a subset of the pages on a site annotated as part of rel-alternate-hreflang clusters.
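
To see which pages on a growing site have not yet been added to any cluster, a simple set difference is enough. This is a minimal sketch under stated assumptions: unannotated_urls is a hypothetical helper, all_urls is assumed to be the full list of pages from your CMS or a crawl, and the /blog/new-post URL is invented for the example.

```python
def unannotated_urls(all_urls, clusters):
    """Return the URLs that do not appear in any hreflang cluster.

    all_urls: iterable of every page URL on the site (assumed input).
    clusters: list of dicts mapping hreflang code -> URL (assumed shape).
    """
    annotated = {url for cluster in clusters for url in cluster.values()}
    return sorted(set(all_urls) - annotated)


site_pages = [
    "http://example.com/uk/category/page",
    "http://example.com/us/category/page",
    "http://example.com/uk/blog/new-post",  # hypothetical new page
]
known_clusters = [
    {
        "en-gb": "http://example.com/uk/category/page",
        "en-us": "http://example.com/us/category/page",
    }
]
print(unannotated_urls(site_pages, known_clusters))
```

Pages in the result are handled like pages on any site without annotations, so the list is a to-do list for the next Sitemap update rather than a list of errors.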

Finally, I did a hangout about this topic recently that goes into more detail and may help you: http://youtu.be/fRT5NSbtGrQ .

Hope this helps,
Pierre

Re: Questions on hreflang Sitemaps Steve Morgan 9/21/12 5:50 AM
Hi Pierre,

Thank you for replying, it's much appreciated.
  1. Makes sense, thank you. In answer to your question, I think I had my wires crossed slightly - the /international/ area is going to be the International/Rest of World site and there shouldn't be what I was describing as a 'master' site, so we'll have: <link rel="alternate" hreflang="en" href="http://example.com/international/category/page" /> instead.

  2. Again, makes sense. Yes, they would be 404s. I thought this would be the case, admittedly!

  3. That's interesting! In that case, if I've understood you correctly, I guess we could canonicalise example.com/uk/category/page/solicitors to example.com/uk/category/page/consumers (or vice versa), or perhaps even have a third 'main' version (example.com/uk/category/page/) that /solicitors and /consumers canonicalise into. Yes, I thought canonical across hreflang would be a problem - we won't be canonicalising across regions, just within them.

  4. Again, as I thought.

Yes, I was in the Hangout and found it very useful! :-)

Thanks again, Pierre!

Steve