Google +1 button working on robots.txt blocked page

Google +1 button working on robots.txt blocked page Grandmaster Flash 9/20/11 5:22 PM
I have read the FAQs and checked for similar issues: YES

My site's URL (web address) is: http://rick.ly/nkyiXj
The above URL is not my site, but the issue is relevant to all webmasters implementing the +1 button, and perhaps to Jenny Murphy, since the issue is receiving considerable coverage on Twitter and in social media in general.

The page I referenced is blocked by robots.txt, yet the button displays properly, which I understood it would not do. In my experience, a red exclamation mark is displayed on any blocked page. Several developers have complained to me that they can't verify +1 button functionality until they push to production.

The secondary issue is that the pages are getting crawled/indexed, supposedly because of the +1 button.  

Since I'm working on a project that will implement the +1 button broadly, I am hoping to learn from this case, perhaps what not to do.





Re: Google +1 button working on robots.txt blocked page JohnMu 9/21/11 2:45 PM
Hi Rick

As mentioned on http://www.google.com/support/webmasters/bin/answer.py?answer=1140194&hl=en, if you put a +1 button on your pages, we assume that you're OK with us accessing the pages, even if they're blocked by a robots.txt file. In general, users don't know about robots.txt files, so they may +1 or reshare any URL if they find it interesting enough. We need to be able to access the page to get the title & snippet, and depending on what the user does, it may result in content from the page being made crawlable & indexable for web search (e.g. if they share publicly). 
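For reference, robots.txt is a set of rules that a crawler consults before it fetches a URL. The minimal sketch below uses Python's standard `urllib.robotparser` to show that check; the robots.txt content, the `/private/` path, and the `example.com` URLs are hypothetical. Per John's explanation, the fetch triggered by a +1 click is not gated by this check, which is why a robots.txt-blocked page with a +1 button can still be fetched.

```python
from urllib import robotparser

# Hypothetical robots.txt: blocks everything under /private/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def crawler_may_fetch(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules allow `user_agent` to crawl `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(crawler_may_fetch("http://www.example.com/private/page.html"))  # False
print(crawler_may_fetch("http://www.example.com/public/page.html"))   # True
```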

If you need to prevent content from being shared through +1 and/or resharing, you should use something like HTTP authentication to block unverified users from accessing the content. Doing that will prevent the +1 button from working (it'll use the red-button method of showing the user that it failed) and will also prevent users from resharing that URL with the prefilled title & snippet directly in Google+ (they will still be able to link to the URL, but it won't automatically show data from that page). 
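As an illustration of the server-side authentication John recommends, here is a minimal sketch of HTTP Basic authentication using only Python's standard library. The credentials, realm name, and handler class are hypothetical, and Basic auth sends credentials in a trivially decodable form, so it should only be served over HTTPS. Any anonymous fetcher, including the fetch triggered by a +1 click, receives a `401` with no page content.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical credentials for illustration only; use a real user store in practice.
EXPECTED_USER = "editor"
EXPECTED_PASSWORD = "s3cret"

def is_authorized(auth_header: Optional[str]) -> bool:
    """Check an HTTP Basic `Authorization` header against the expected credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError subclass
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Compare as bytes in constant time; evaluate both before combining.
    user_ok = hmac.compare_digest(user.encode("utf-8"), EXPECTED_USER.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), EXPECTED_PASSWORD.encode("utf-8"))
    return user_ok and pass_ok

class PrivateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # Unauthenticated requests get a 401 challenge and no content.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
            return
        body = b"Private content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), PrivateHandler).serve_forever()` and request the page with and without credentials.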

Because of this behaviour, it's important that you use proper canonicalization (don't use the robots.txt for canonicalization) and that you use proper server-side authentication for private content (don't use the robots.txt or rely on unknown URLs). Both of these are things that we've said for a while now, so if you've been following those best practices, you won't have to change anything. 

I hope that helps clarify things! Let me know if you have any other questions.

Cheers
John
Re: Google +1 button working on robots.txt blocked page Grandmaster Flash 9/21/11 3:28 PM
Thanks John! 
Re: Google +1 button working on robots.txt blocked page Christian Oliveira 9/22/11 2:26 AM
@JohnMu, I don't think I'm understanding this correctly, so I hope you can clarify my doubts :)

In your FAQ, you include this:

"Will +1's from my site show up in search results?
If a user +1's a URL on your site, the Google search result snippet for that URL may be annotated in search results and search ads.

However, your site may make the same content available via different URLs. For example, your site may have several pages listing the same set of products. (More information about duplicate content.) One page might display products sorted in alphabetical order, while other pages display the same products listed by price or by rating. For example:

http://www.example.com/product.php?item=swedish-fish&sort=alpha
http://www.example.com/product.php?item=swedish-fish&sort=price
If Google knows that these pages have the same content, we may index only one version for our search results. As a result, +1's for the other versions may not appear in search results.

You can make sure Google displays +1 annotations for the most search results possible by adding a rel="canonical" link to the <head> section of the non-preferred versions of each page. This property should point to the canonical version, like this:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish&sort=alpha">
This tells Google: "Of all these pages with identical content, this page is the most useful. Please prioritize it in search results."

Now, when a user +1’s a page with a non-canonical URL, Google will associate that +1 with the canonical, preferred version. More information about canonicalization."
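The FAQ's canonicalization advice can be sketched as a small helper that maps every sort order of a listing to one preferred URL and emits the `rel="canonical"` tag for it. This is a hypothetical sketch: treating `sort=alpha` as the preferred ordering follows the FAQ example, but the function names are illustrative. Note that `html.escape` writes `&` as `&amp;` inside the attribute, which is the correct HTML encoding and resolves to the same URL.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical choice, matching the FAQ example: alphabetical order is preferred.
PREFERRED_SORT = "alpha"

def canonical_url(url: str) -> str:
    """Return the same listing URL with its `sort` parameter forced to PREFERRED_SORT."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "sort"]
    params.append(("sort", PREFERRED_SORT))
    # Drop any fragment; it is not part of the canonical address.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))

def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

print(canonical_link_tag("http://www.example.com/product.php?item=swedish-fish&sort=price"))
# <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish&amp;sort=alpha">
```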

In terms of usability, suppose we have two URLs for the same listing ordered by different criteria, and we place a +1 button on both of them, indicating which one is the canonical. If the user clicks +1 on the non-canonical one and shares it to Google+, he/she will not be sharing the same thing he/she is viewing, so I don't think this is the best approach here (it could make sense if the +1 were only a +1, and not a sharing button).

On the other hand, does this affect only pages that include the +1 button in their HTML, or any page that is given a +1? There is an extension for Google Chrome to give a +1 to any URL (https://chrome.google.com/webstore/detail/jgoepmocgafhnchmokaimcmlojpnlkhp); does that +1 have the same effect on URLs blocked by robots.txt?

And regarding the indexing of URLs blocked by robots.txt that receive a +1: do you mean indexing the URL the same way you index it when someone links to it (like this, for example: http://www.google.es/search?q=site%3Agoogle.es%2Fgroups), or do you mean really indexing it, crawling all the content, links, etc.?

Looking forward to your answer!

Thanks in advance
Re: Google +1 button working on robots.txt blocked page Christian Oliveira 9/22/11 2:54 AM
Also:

* What happens if we place a <meta name="robots" content="noindex, follow"/> tag on a page and also add a +1 button?
* What happens if the URL is in a directory blocked by robots.txt AND has been removed via Webmaster Tools, and we add a +1 button?

Would these URLs get indexed?
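To audit which of your pages carry the meta robots directives Christian asks about, a small parser over the page markup is enough. This minimal sketch uses the standard `html.parser` module and only reads the `<meta name="robots">` tag; the function and class names are hypothetical, and it says nothing about how Google weighs these directives against a +1 button.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

print(sorted(robots_directives('<head><meta name="robots" content="noindex, follow"/></head>')))
# ['follow', 'noindex']
```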
Re: Google +1 button working on robots.txt blocked page Jenny Murphy 9/26/11 6:32 PM
Hi Christian,

These are lots of good questions! :) Let me see if I can sum up a concise answer.


If you target a page with a +1 button, we will fetch that page when the +1 button is clicked. 

This is done regardless of the crawler directives. We do this because unless we fetch that page, we do not know what to add to the user's +1's tab or what content to use as a +Snippet if they decide to share that page.

Since that +1 is a public act, we may use the information retrieved for other forms of indexing.


Does this answer your questions?

Thanks,
Jenny
Re: Google +1 button working on robots.txt blocked page Christian Oliveira 9/27/11 1:42 PM
Hi Jenny,

Thank you very much for your answer! Unfortunately, I think this is not *exactly* answering my doubts.

Let me rephrase the questions, trying to be a bit clearer:

- Can a URL blocked to Google's crawlers by any method (robots.txt or the meta robots tag) be indexed by Google (in a normal way, as if it were not blocked) just because there is a +1 button on it and someone clicked it?

- If so, does the same apply if the +1 button is not included but someone uses the Chrome extension to +1 that URL?

I think the most restrictive rule should always apply: if a URL is blocked, the +1 should not count, or clicking it should give an error.

Looking forward to your answer and explanation! :)

Thanks,

Christian
Re: Google +1 button working on robots.txt blocked page JohnMu 10/4/11 7:11 AM
Hi Christian

Regarding your questions:

- In terms of usability, if we have two URLs for the same listing ordered by different criteria, and we place a +1 button on both of them, indicating which one is the canonical, and the user clicks +1 and shares to Google+ the one which is not the canonical, he/she will not be sharing the same thing he/she is viewing, so I don't think it's the best approach here 

If the specified canonical does not match the currently viewed content then I might consider not showing a +1 button there (or just having one that applies to the site as a whole, maybe next to the site's logo). In general, if you're aware of something that could be confusing users, then you already know what to avoid :-).

- Can a URL blocked to Google's crawlers by any method (with the robots.txt, or with the meta robots tag) be indexed by Google (in a normal way, as if it was not blocked) just because there is a +1 button on it and someone clicked it?

This is a strange situation: on the one hand, there are signals that the content shouldn't be crawled or indexed; on the other, there is a signal that the webmaster wants the content promoted & recommended publicly. My recommendation would be to make sure that this sort of conflict does not arise and that you explicitly choose one or the other. 
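One way to follow John's advice and keep these conflicting signals from coexisting is a pre-deploy lint that flags pages embedding a +1 button while also carrying a blocking signal. This is a hypothetical sketch: it assumes the button is embedded either as a `<g:plusone>` tag or as an element with class `g-plusone`, and the robots.txt content and function names are illustrative. It only detects the conflict; it makes no claim about how Google resolves it.

```python
from html.parser import HTMLParser
from urllib import robotparser

class PageSignals(HTMLParser):
    """Records whether a page embeds a +1 button and which meta robots directives it declares."""

    def __init__(self):
        super().__init__()
        self.has_plusone = False
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        # Assumed button markup: a <g:plusone> tag or an element with class "g-plusone".
        if tag == "g:plusone" or "g-plusone" in attr_map.get("class", "").split():
            self.has_plusone = True
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            tokens = attr_map.get("content", "").split(",")
            self.robots.update(t.strip().lower() for t in tokens if t.strip())

def find_conflicts(url, html, robots_txt, user_agent="Googlebot"):
    """List blocking signals that conflict with a +1 button on the same page."""
    signals = PageSignals()
    signals.feed(html)
    signals.close()
    if not signals.has_plusone:
        return []  # No +1 button, so no "promote this" signal to conflict with.
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    conflicts = []
    if not rules.can_fetch(user_agent, url):
        conflicts.append("robots.txt disallows crawling")
    if {"noindex", "none"} & signals.robots:
        conflicts.append("meta robots noindex")
    return conflicts

PAGE = ('<html><head><meta name="robots" content="noindex, nofollow"></head>'
        '<body><div class="g-plusone"></div></body></html>')
print(find_conflicts("http://www.example.com/private/a.html", PAGE,
                     "User-agent: *\nDisallow: /private/\n"))
# ['robots.txt disallows crawling', 'meta robots noindex']
```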

- If so, does the same apply if the +1 button is not included but someone uses the Chrome Extension to +1 that URL?

No, that's different. In a case like that, the webmaster is not explicitly signaling that they want the content to be promoted / recommended. 

I hope that helps to clear things up! Is there anything we missed? 
Cheers,
John
Re: Google +1 button working on robots.txt blocked page Christian Oliveira 10/6/11 12:58 PM
Hi John!

Thanks for answering :)

Regarding your answers:

"If the specified canonical does not match the currently viewed content then I might consider not showing a +1 button there (or just having one that applies to the site as a whole, maybe next to the site's logo). In general, if you're aware of something that could be confusing users, then you already know what to avoid :-)"

I get your point, but I feel it is not consistent (and therefore not good for the user) to show a +1 button (or a Facebook Like button, or a Tweet button) on some pages and not on others that are built the same way and have exactly the same structure (like the example of a listing ordered by price or alphabetically). Showing one button that applies to the whole site and another that applies to the listing is confusing too. So, in my opinion, both solutions will confuse users.

"This is a strange situation, on the one hand there are signals that the content shouldn't be crawled or indexed, on the other, there is a signal that the webmaster wants the content promoted & recommended publicly. My recommendation would be to make sure that this sort of conflict does not arise and that you explicitly choose one or the other. "

I don't think you're answering the question here :) I know I should avoid these cases, but I would like to know what the consequences could be. Sometimes there are limitations and the ideal situation cannot be achieved easily, so it's important to know the pros and cons of doing one thing or another.

"No, that's different. In a case like that, the webmaster is not explicitly signaling that they want the content to be promoted / recommended."

Right :)

So I've run a little test to see this in action: I created 5 URLs, with the following characteristics:

 * One URL blocked in the robots.txt file and with a +1 button
 * One URL blocked in the robots.txt file without a +1 button
 * One URL with the <meta name="robots" content="noindex, nofollow" /> and a +1 button
 * One URL with the <meta name="robots" content="noindex, nofollow" />, a +1 button, and removed manually via Webmaster Tools
 * One non-blocked URL with a +1 button

All the URLs were linked from two other already-indexed pages of the site, and all of them received +1s (the second one received its +1s through the Chrome extension). This is what has happened so far:

- The non-blocked URL is indexed as it should usually be.
- The URLs blocked by the robots.txt file were "indexed" (only the title tag and the URL, as usually happens when a URL blocked by robots.txt gets linked from other sites) and were omitted from the search results for a site: query on the website where the URLs are hosted. There is no cached copy either.
- The URLs with the meta robots tag didn't get indexed.

All of them appear with a complete snippet in my Google+ profile when I +1'd the URLs. The one without a +1 button even appears with the image and the content in my profile, although, since I'm not including a +1 button on this URL, there are no signals telling Google that I want that content to be promoted & recommended publicly.

So it seems (at least to me) to be the same indexing behaviour I saw before we had +1 buttons. I can somehow understand that you don't obey the robots.txt when creating the snippet shown in the profiles of users who +1 the URL, so that you can show something "pretty" there, but indexing the full content is another thing. Personally, I think that, as usually happens in computer programming, the most restrictive rule should apply, and the same should be done here.
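The "most restrictive rule" Christian proposes can be stated precisely: any signal that forbids indexing wins, and a +1 button never overrides it. The sketch below models his proposal only, not Google's documented behaviour; the signal names are hypothetical labels for the cases in his test.

```python
from enum import Enum
from typing import AbstractSet

class Signal(Enum):
    ROBOTS_TXT_DISALLOW = "robots.txt Disallow"
    META_NOINDEX = "meta robots noindex"
    PLUSONE_BUTTON = "+1 button on page"

# Signals that forbid indexing a page's content; the +1 button forbids nothing.
RESTRICTIVE = frozenset({Signal.ROBOTS_TXT_DISALLOW, Signal.META_NOINDEX})

def may_index_content(signals: AbstractSet[Signal]) -> bool:
    """Most-restrictive rule: any restrictive signal blocks indexing, whatever else is present."""
    return not (set(signals) & RESTRICTIVE)

print(may_index_content({Signal.ROBOTS_TXT_DISALLOW, Signal.PLUSONE_BUTTON}))  # False
print(may_index_content({Signal.PLUSONE_BUTTON}))  # True
```

Under this rule, adding more permissive signals can never unblock a page; only removing the restrictive one can.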

I hope you can give me a clearer answer (on whether Google would, for any reason, index content explicitly blocked by webmasters) :)

Thank you very much for your time in advance!

Christian