How to allow Googlebot to index pages behind login.

How to allow Googlebot to index pages behind login. bryanwklein 12/15/08 7:04 AM
I would like to provide Google with a guest username and password so that Googlebot can log in and index the information behind the authentication.
We have information that is read-only and publicly available through a guest username and password.
Is there any way to give these credentials to the indexing bot so it can get through the login and index that information?

We publicize the guest account information for non-contributors, so it would be fine if Google showed the username and password needed to access the site below the search result.

Is there someone at Google working on a solution like this?

Thank you,
Bryan Klein
Re: How to allow Googlebot to index pages behind login. luzie 12/15/08 7:12 AM
Hi Bryan,

It seems that something like what you want is possible under a policy called "First Click Free":

     http://googlewebmastercentral.blogspot.com/2008/10/first-click-free-for-web-search.html

-luzie-
Re: How to allow Googlebot to index pages behind login. bryanwklein 12/15/08 7:51 AM
That is close, but the service has a key restriction, as stated in the following snippet:

"To include your restricted content in Google's search index, our crawler needs to be able to access that content on your site. Keep in mind that Googlebot cannot access pages behind registration or login forms. You need to configure your website to serve the full text of each document when the request is identified as coming from Googlebot via the user-agent and IP-address. It's equally important that your robots.txt file allows access of these URLs by Googlebot."

Googlebot cannot get past login forms. We would have to figure out a way to make all content in the CMS available as separate pages that can be read independently of the CMS login requirement.
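
For reference, the check that snippet describes (identifying a request as coming from Googlebot by user-agent and IP address) might look roughly like the sketch below. This is only an illustration under my own assumptions, not anything Google has published in this form: the function name `is_googlebot`, the domain list, and the injectable lookup parameters are all hypothetical, and the forward lookup as written only handles IPv4. The idea is to check the user-agent, do a reverse DNS lookup on the IP, confirm the hostname is under a Google crawler domain, and then confirm the hostname resolves back to the same IP.

```python
import socket

# Assumed hostname suffixes for Google's crawlers (an assumption for this sketch).
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_googlebot(user_agent, ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the request plausibly comes from Googlebot.

    reverse_lookup and forward_lookup are hypothetical hooks so the DNS
    calls can be replaced (for example in tests); by default they use
    the standard socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: the user-agent must claim to be Googlebot (easily spoofed).
    if "googlebot" not in user_agent.lower():
        return False

    try:
        # Step 2: reverse DNS on the IP must give a Google crawler hostname.
        host = reverse_lookup(ip)
        if not host.lower().endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        # Step 3: forward DNS on that hostname must include the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) mean "not verified".
        return False
```

The CMS would then serve the full text without the login form only when this check passes. For a system like ours that still leaves the harder problem of exposing every record as a page that can be served that way.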

For an example of the kind of information we want to let Google index, take a look at a system like e-CAT.
http://www.axiope.com/pages/index.php

It would be almost impossible to separate the information from the system that stores it, since it is much more than the news articles that the First Click Free system seems designed to work with.

Still hoping for a better way,
-Bryan