How to know what Google has indexed / How to find out what is in the Index / List of indexed URLs ::: Auto-Response :::

How to know what Google has indexed / How to find out what is in the Index / List of indexed URLs ::: Auto-Response ::: Autocrat 8/12/10 7:14 AM


Please - do Not post your questions in this topic!

---------------------------------------------------------------------

This is an Auto-Response for the (moderately) common questions regarding what Google has indexed from your site.
This is an attempt to compile information for questions/issues such as;

How can I see what Google has Indexed
How to tell what is indexed by Google
How to see what parts of my site are in Google's index
Where does Google tell you what it has crawled
Why can I not see the list of crawled/indexed URLs?
List of what is Crawled
List of what is Indexed
How to see what URLs are indexed
etc. etc. etc.


=============   Google does NOT tell you!   =============

Seriously - there is nothing/nowhere that will give you an exact list of what is Indexed.
Not even in Google WebMaster Tools.

(Yes - I know - you would think they could, and that it would be an obvious thing to do - but they don't, and though asked, don't seem interested in doing it either!)


=============   Worth noting...   =============

There are 3 parts to all of this;
* Crawling
* Indexing
* Ranking
in that order.

Achieving one does Not necessarily mean you get the others.
It is more than possible to be Crawled but not Indexed, or to be Crawled and Indexed but not ranking/showing in the SERPs!


=============   Sitemaps can help   =============

Though not exactly Specific, you can use Sitemaps to get a "rough" idea of how well indexed your site is.
Remember that;
1) G may still Filter the number of indexed URLs due to Duplication, Canonicalisation and low quality pages
2) G will only be counting/including URLs you Submitted via the Sitemap
3) G will not be counting URLs Not included in a Sitemap
4) G will be discounting URLs that are blocked/noindexed/redirected

You can also be a little bit constructively creative with your Sitemaps.
Rather than having a single sitemap - you could submit several,
 breaking the site up into different content types/sections (such as pages, products, forums etc.).
That will help you see, with a little more specificity, which parts are doing well or not.
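
As a rough sketch of the split-sitemap idea (in Python, and assuming you already have a list of your own URLs from somewhere), something like the following groups URLs by their top-level folder and builds one small sitemap per group.
The example.com URLs, the "pages" bucket name and the sitemap-<section>.xml filenames are made up for illustration - adapt them to however your site is actually organised.

```python
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def group_by_section(urls):
    """Group URLs by top-level folder; root-level URLs go into "pages"."""
    sections = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        # If the path does not end in "/", the last segment is a file name.
        folders = segments if path.endswith("/") else segments[:-1]
        sections[folders[0] if folders else "pages"].append(url)
    return dict(sections)


def build_sitemap(urls):
    """Return a minimal sitemap XML document listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() turns & into &amp; so query strings stay valid XML.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines)


# Hypothetical URLs, purely for illustration.
urls = [
    "http://www.example.com/",
    "http://www.example.com/products/widget",
    "http://www.example.com/products/gadget?colour=red&size=2",
    "http://www.example.com/forums/topic-1",
]

for section, section_urls in sorted(group_by_section(urls).items()):
    print("sitemap-%s.xml: %d URLs" % (section, len(section_urls)))
```

Each group's XML (from build_sitemap) would then be saved as its own file and submitted separately, so the submitted/indexed counts can be compared per section.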


=============   Use your Access Logs!   =============

Almost all hosting provides logs.
Regardless of it being dedicated or shared or virtual etc., there should be logs per Domain!
(If your current hosting does Not supply Server Access Logs - Seriously - consider moving to a Proper/decent host!)

Such logs are incredibly useful tools, and one advantage is being able to see what has been crawled.
Chances are high that if it was crawled and gave a 200 response, G has it in its Index.
(Remember - being Indexed and showing in the SERPs is NOT the same thing!)
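
To give an idea of how little work this can be, here is a minimal Python sketch that pulls the Googlebot requests with a 200 response out of an access log.
It assumes the common "combined" log format (Apache/nginx style) - your host may use a different layout, in which case the pattern needs adjusting. The sample lines are invented.
Also bear in mind that anything can claim to be Googlebot in its user agent, so treat the counts as a guide only.

```python
import re
from collections import Counter

# Assumption: "combined" access log format. Adjust if your logs differ.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, status="200"):
    """Count requests per path where the user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in the assumed format; skip rather than guess
        if "Googlebot" in match.group("agent") and match.group("status") == status:
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [12/Aug/2010:07:14:00 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [12/Aug/2010:07:15:00 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.99 - - [12/Aug/2010:07:16:00 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U) Firefox/3.6"',
]

for path, count in googlebot_hits(sample).most_common():
    print(count, path)
```

For a real log you would pass the open file instead of the sample list, since the function only needs something it can loop over line by line.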

In addition to Server Access logs - there are little scripts around that can write log files from scripted pages.
So if your site is dynamic (php, asp etc.),
you could write your own log files or add to a DataBase to make tracking GoogleBot crawls much easier!
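
The same idea in a few lines of Python, as a sketch only (the post mentions php/asp, but the approach is the same in any language).
The function names and the googlebot.log filename are hypothetical, and how you get at the user agent and the requested URL depends entirely on your framework - here they are simply passed in.

```python
from datetime import datetime, timezone


def format_log_line(user_agent, url, now=None):
    """Return a tab-separated log line for Googlebot requests, else None."""
    if "Googlebot" not in (user_agent or ""):
        return None
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return "%s\t%s\n" % (stamp, url)


def log_googlebot_visit(user_agent, url, log_path="googlebot.log"):
    """Append one line to log_path if the request claims to be Googlebot."""
    line = format_log_line(user_agent, url)
    if line is None:
        return False
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line)
    return True
```

You would call log_googlebot_visit(...) once per page request from whatever code builds the page, and end up with a small file holding nothing but the GoogleBot crawls.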


=============   Stop being lazy - go and look!   =============

Seriously - you can Search and see what is indexed or not.

I know that is not ideal - and for larger sites is a massive nightmare in regards to effort/efficiency,
but it will give you the Specifics.
(If you do this in conjunction with split sitemaps and access logs/tracking - you can have a better idea of what to search for!)

So how to search?

Well - use the Operators to your advantage.

Domain search with - site: Operator
(site:google.com)
This should return results only from the specified Domain.
So you will need to be careful if your site is on a SubDomain (or multiple SubDomains) ("www" is a SubDomain).

Domain search with - inurl: Operator
(inurl:google.com)
This should return results that contain the specified Domain.
This may not be only from the site in question though!  It is possible for other sites to contain your domain name in their URLs (whois.domaintools.com may have such URLs etc.)

Domain search with - site: and inurl: Operators
(site:google.com inurl:google.com)
This way you limit the results to your Domain Only ... and it seems to generate more "reliable" results than the site: operator alone.
(No - no idea Why G works that way - but it does!)

Domain and Path/Query search with - site: and inurl: Operators
(site:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:?this=that&rabbits=lunch)
This way you limit the results to your Domain Only ... and focus on a specific directory/folder or set of parameters etc.

Domain and FileType search with - site: and filetype: Operators
(site:google.com filetype:html)
This limits the results to those from your Domain, and to a specific type of file.
(Please note - the filetype: operator may not show All of that type - it may only work for URLs that end in that type.  Thus if you serve content as html, but without the .html in the filename - they will not show in the results!)

Domain and Path/Query search with - site:, inurl: and inurl: Operators
(site:google.com inurl:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:google.com inurl:?this=that&rabbits=lunch)
This permits you to start limiting the results to specific parts of your site.


By doing searches such as those - you should be able to see what is indexed.
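
If you end up running a lot of these, a tiny helper can save some typing. The Python sketch below only builds the query strings (and a search link you can paste into a browser) from the operators described above - it does not run any searches itself.
The function names are made up for this example.

```python
from urllib.parse import urlencode


def build_query(domain, inurl=(), filetype=None):
    """Combine site:, any number of inurl: and an optional filetype: operator."""
    parts = ["site:" + domain]
    parts.extend("inurl:" + fragment for fragment in inurl)
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)


def search_url(query):
    """Return a Google search link for the query, safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


queries = [
    build_query("google.com"),
    build_query("google.com", inurl=["google.com"]),
    build_query("google.com", inurl=["google.com", "/somepath/somedirectory/"]),
    build_query("google.com", filetype="html"),
]

for query in queries:
    print(query)
    print("  " + search_url(query))
```

Swap google.com for your own domain (with or without the "www", depending on which SubDomain you want to check).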


=============   Start keeping a log   =============

It's more than worth your time to make up a little file/spreadsheet/DB setup that records what URLs have been Crawled/Indexed.
It may even be worth noting the Dates/Times (as this can help identify popular/important pages as well as what is indexed).
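
One possible shape for such a log, sketched in Python with the built-in sqlite3 module. The table and column names are invented for this example, and it assumes you feed it crawls in date order (from your access logs) and note by hand when a search shows a URL in the results.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log. Pass a filename to keep it between runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS url_log ("
        " url TEXT PRIMARY KEY,"
        " first_crawled TEXT,"
        " last_crawled TEXT,"
        " crawl_count INTEGER NOT NULL DEFAULT 0,"
        " seen_in_index TEXT)"
    )
    return db


def record_crawl(db, url, when):
    """Record one crawl of url at the given date/time string."""
    cur = db.execute(
        "UPDATE url_log SET first_crawled = COALESCE(first_crawled, ?),"
        " last_crawled = ?, crawl_count = crawl_count + 1 WHERE url = ?",
        (when, when, url),
    )
    if cur.rowcount == 0:  # URL not logged yet
        db.execute(
            "INSERT INTO url_log (url, first_crawled, last_crawled, crawl_count)"
            " VALUES (?, ?, ?, 1)",
            (url, when, when),
        )
    db.commit()


def record_indexed(db, url, when):
    """Note the date a manual search showed url in the results."""
    cur = db.execute(
        "UPDATE url_log SET seen_in_index = ? WHERE url = ?", (when, url)
    )
    if cur.rowcount == 0:
        db.execute(
            "INSERT INTO url_log (url, seen_in_index) VALUES (?, ?)", (url, when)
        )
    db.commit()


def most_crawled(db, limit=10):
    """Return (url, crawl_count, last_crawled) rows, most crawled first."""
    return db.execute(
        "SELECT url, crawl_count, last_crawled FROM url_log"
        " ORDER BY crawl_count DESC, url LIMIT ?",
        (limit,),
    ).fetchall()
```

The most_crawled() list is a quick way to spot the pages G keeps coming back to, and any row with crawls but nothing in seen_in_index is a candidate for a manual search.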


---------------------------------------------------------------------


NOTE:

This is a "general auto-response" post.

This is Not a Topic for discussion;

It is a point of reference to save having to type the same answer repeatedly, due to the sheer number of times this question is asked. It is meant as an aid for people that don't seem to search/read the various other posts regarding this topic.
Thank you for taking the time to read this Auto-Response.

---------------------------------------------------------------------

Please - do Not post your questions in this topic!
Re: How to know what Google has indexed / How to find out what is in the Index / List of indexed URLs ::: Auto-Response ::: Autocrat 8/12/10 7:20 AM
And - as it's likely to come up for some people reading this,
the topics linked to below may also relate and help answer some of the questions occurring in your minds right about now :D


And - a reminder (as some people really don't seem to get the hint)...





Please - do Not post your questions in this topic!



Re: How to know what Google has indexed / How to find out what is in the Index / List of indexed URLs ::: Auto-Response ::: alamax 8/26/10 11:10 AM
I do like the way you swing the hammer, dude.