Google Product Forums

Re: Google indexing Strange URLs - URLs with strange Parameters being indexed // ?referer= // ?ref= ::: Auto-Response :::

Autocrat Aug 2, 2010 12:19 PM
Posted in group: Webmaster Central

Categories: Crawling, indexing & ranking

[Apologies for the Split Posting - G has a stupid Character limit!]



=======   It's causing 404s!   =======

It means that your site is sending the correct response (a 404 "Not Found") for URLs that do Not exist.

No need to worry/stress or panic over that.
404's are not going to make you suffer or get you penalised.
It just means G has found a link (or some links) with incorrect paths/parameters,
and isn't getting a "found" response (which it shouldn't - as that URL doesn't actually do anything).


=======   It doesn't give a 404?   =======

Well - we'll cover that in a minute (Duplication).

Basically - if you are Not returning a 404 response,
then you are likely showing the same content as the "normal" part of that URL,
or showing some other page's content (sending a 302 redirect etc.).

You should ensure that your server/script/site sends the correct 404 response for URLs that do Not exist!
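If your pages come from a script, the 404 has to be sent by that script. Here's a rough sketch in Python as a bare WSGI app. The `KNOWN_PATHS` set, the page bodies and the `call` helper are made up for illustration; your real site would decide "does this page exist?" from its DataBase or file system. Anything it doesn't recognise gets a genuine "404 Not Found" status, rather than the home page or a 302.

```python
# Minimal sketch: a bare WSGI app (Python's standard web interface, no framework).
# KNOWN_PATHS is a hypothetical stand-in for however your site decides
# whether a page really exists (DataBase lookup, file check, etc.).
KNOWN_PATHS = {"/", "/this/that/theother/somefile.html"}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/") or "/"
    if path in KNOWN_PATHS:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<html><body>A real page</body></html>"]
    # Unknown URL: send a real 404 status, not the home page and not a 302.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found"]


def call(path):
    """Tiny test helper: run the app for a path and return the status line."""
    seen = []
    app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0]


print(call("/this/that/theother/somefile.html"))  # 200 OK
print(call("/this/that/nothing-here.html"))       # 404 Not Found
```

Note this only checks the Path, so a real page requested with extra Parameters still gets a 200 (which is where Duplication comes in).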


=======   But it doesn't exist!   =======

Okay ... Parameters are not the same as Paths (FilePath).
Paths are like the folder/directory structure of your site,
   /this/that/theother
is a Path. You can also tack on a file too (FilePath):
   /this/that/theother/somefile.html

Whereas what you are seeing is a Parameter and Value pairing:
   ?this=that

Parameters are used by Scripts as a method of passing data and deciding what to do/show, what to get from a DataBase etc.
Some systems are set up to permit such requests - regardless of the filetype.
Some systems are set up to show scripted pages/files as normal non-scripted ones.

Makes little difference ... it makes a "different" URL.
And it is the URL that counts ... that is what G looks at.
As far as G is concerned:
   http://www.example.com/this/that/theother/somefile.html
   http://www.example.com/this/that/theother/somefile.html?this=1
   http://www.example.com/this/that/theother/somefile.html?this=2
are 3 different URLs - 3 different pages and should show 3 different bits of content,
else it causes Duplication (See next bit).
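To see the Path/Parameter split for yourself, Python's standard `urllib.parse` can pull the three example URLs apart. This is just an illustration with the example.com URLs from above; it says nothing about how G actually stores URLs internally.

```python
from urllib.parse import urlsplit

urls = [
    "http://www.example.com/this/that/theother/somefile.html",
    "http://www.example.com/this/that/theother/somefile.html?this=1",
    "http://www.example.com/this/that/theother/somefile.html?this=2",
]

# The Path part is identical for all three...
print({urlsplit(u).path for u in urls})   # {'/this/that/theother/somefile.html'}
# ...only the Parameter/Value part (the query string) differs...
print([urlsplit(u).query for u in urls])  # ['', 'this=1', 'this=2']
# ...but as whole URLs they are 3 distinct addresses.
print(len(set(urls)))                     # 3
```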


=======   It's causing Duplication!   =======

That is something to worry about.
Google doesn't really want to see lots of the same content - no matter the variant URL.
If it sees the same content under different URLs - then it may Filter ... picking 1 version out of the 2+ versions it sees.
The one it chooses may Not be the better ranking one - and thus may impact your rankings/traffic.
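If you want to check whether your own variant URLs are serving duplicate content, a crude first pass is to fetch them and compare the bodies. The sketch below is Python; `find_duplicates` and the sample `pages` data are hypothetical. It only catches byte-for-byte identical responses, so it is Not how G detects Duplication (real pages often differ by a timestamp or session ID and would slip past this). It just gives you a quick list of suspects.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose response bodies are byte-for-byte identical.

    `pages` maps URL -> response body (bytes). Returns a list of URL groups
    (each sorted) that share exactly the same content.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        groups[digest].append(url)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical data: the ?ref= variant serves the same bytes as the clean URL.
pages = {
    "http://www.example.com/page.html": b"<h1>Widgets</h1>",
    "http://www.example.com/page.html?ref=somesite": b"<h1>Widgets</h1>",
    "http://www.example.com/other.html": b"<h1>Gadgets</h1>",
}
print(find_duplicates(pages))
# [['http://www.example.com/page.html', 'http://www.example.com/page.html?ref=somesite']]
```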


=======   Where are they coming from?   =======

These things can occur due to various reasons/sources.

* Some are "trackers" ... the linked-to URL is set up so that you can see where the link originates from.
* Others may be "look at me's" - some people watch inbound traffic, and if they see traffic from an unknown site, may go and look.
* I've seen some people describe them as "hijacks" - apparently some systems are set up to automatically do a redirect on certain parameters ... so people may intentionally include such a URL hoping to jack some of your traffic. (No idea if true!)
* Then there are simply "bad links" - sometimes people mistype, other times they stuff up the copy and paste process etc.


=======   What can I do?   =======

Well - there are several options ... and which you do/use depends on your setup/skills etc.

--- Set up Server based Redirects ---
If you have Apache with htaccess (no idea if IIS can do similar),
you can tell the server to examine URL requests for specific parameters,
and if found, remove them from the URL and redirect to the "cleaned" version.
If G encounters a 301 - then it will remove the bad URL from the SERPs.

--- Set up Script based Redirects ---
Pretty much like the Server based method,
but you use your Script (php, asp, cfm etc.) to examine the URL, and send a redirect response if needed.
If G encounters a 301 - then it will remove the bad URL from the SERPs.
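As a rough sketch of the Script based approach, here it is in Python as a bare WSGI app (the same idea carries over to php/asp/cfm). The `STRIP_PARAMS` names (`ref`, `referer`) are just the ones from this thread's title, so swap in whatever shows up in your own logs; `strip_junk` and `call` are hypothetical helpers. It only redirects when a junk Parameter is actually present, keeps any other Parameters, and assumes the app is mounted at the site root.

```python
from urllib.parse import parse_qsl, urlencode

# Assumption: the "junk" Parameter names to strip (matched case-insensitively).
STRIP_PARAMS = {"ref", "referer"}


def strip_junk(query):
    """Return the cleaned query string, or None if there was nothing to strip."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name.lower() not in STRIP_PARAMS]
    if len(kept) == len(pairs):
        return None  # no junk Parameters, leave the URL alone
    return urlencode(kept)


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/") or "/"
    cleaned = strip_junk(environ.get("QUERY_STRING", ""))
    if cleaned is not None:
        # Junk found: 301 (permanent) redirect to the cleaned URL.
        location = path + ("?" + cleaned if cleaned else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>The normal page</body></html>"]


def call(path, query=""):
    """Tiny test helper: return (status, headers dict) for a request."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return seen["status"], seen["headers"]


print(call("/page.html", "ref=somesite"))
# ('301 Moved Permanently', {'Location': '/page.html'})
print(call("/page.html", "id=5&referer=x"))
# ('301 Moved Permanently', {'Location': '/page.html?id=5'})
```

Checking "was anything removed?" (rather than comparing the rebuilt query string to the original) avoids pointless redirects when the only difference is how a value happens to be encoded.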

--- Start using the Canonical Link Element ---
By including the CLE on your pages, you will be telling G which URL you would prefer
that content to appear under in the SERPs - regardless of what URL G sees it on.
That means that even though G may see the same content under the normal URL and the one with the strange
parameters, it will know to use the "clean" one.
If G encounters a CLE - then it will remove the bad URL from the SERPs.
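Generating the CLE in a script is usually a one-liner. Here's a hedged Python sketch; `canonical_link` is a made-up helper, and it assumes that on your site the query string never changes the page content, so the "clean" URL is just the URL with the Parameters dropped. If some of your Parameters do change the content, strip only the junk ones instead.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(request_url):
    """Return a Canonical Link Element pointing at the parameter-free URL.

    Assumption: the query string never changes the content on this site.
    """
    parts = urlsplit(request_url)
    # Keep scheme, host and Path; drop the query string and any #fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'


print(canonical_link("http://www.example.com/page.html?ref=somesite"))
# <link rel="canonical" href="http://www.example.com/page.html">
```

The resulting element goes in the <head> of the page.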

--- Start using the Parameter Handling Tool ---
In Google Webmaster Tools (GWMT) there is the PHT.
You can tell G to ignore certain Parameters (or Parameters and Values).
This works along the same lines as the CLE method - but instead of telling G which URL you want it to pay attention to,
you tell it parts of a URL to ignore.
If G sees a match for the PHT - then it will remove the bad URL from the SERPs.

--- robots.txt [Last Resort] ---
And I'll say it again - this is the Last Resort Method!
Robots.txt does Not stop Indexing - it stops Crawling.
Blocking such URLs
 a) may Not stop G showing them in the SERPs
 b) will Not automatically remove the Bad URL from the SERPs
(If you want the URL removed from the SERPs - then you will have to use the URL Removal Request tool as well)


------------------------------------------------------------

NOTE:

This is a "general auto-response" post.
This is Not a Topic for discussion.


It is a point of reference to save having to type the same answer repeatedly (due to the sheer number of times this question is asked), and is meant as an aid for people that don't seem to search/read the various other posts regarding this topic.
Thank you for taking the time to read this Auto-Response.


Please - do Not post your questions in this topic!


(And as there are people that are Hard of Reading ...)


                     Please - do Not post your questions in this topic!