How to stop Google from crawling the IP address and only use the domain names?

How to stop Google from crawling the IP address and only use the domain names? mase.bay 12/4/09 5:06 PM
I'd like to ensure that Google will stop crawling our IP address and instead always use the domain name. They do crawl the domain, but seem to also be hitting the IP address of the same website.

I don't believe I can add the IP address to the robots.txt file, as those are all relative paths. I have put in some code to redirect any requests to the domain, but it's still happening. What can I do? Do I need to use .htaccess, and is that the right route to go? I know about the hosts file, but am not familiar with editing it or how it affects traffic.
Re: How to stop Google from crawling the IP address and only use the domain names? O-Dog 12/4/09 5:17 PM
G will index a relative IP with its title and content (been, done and left); don't mess with .htaccess.... G bot sees a relative and will treat it as so. The hosts file is only relative to the DNS. Best practice is to add nofollow to those which do not need to be addressed.

^..^
Re: How to stop Google from crawling the IP address and only use the domain names? mase.bay 12/4/09 5:21 PM
It's the same website though. Are you suggesting I embed in my code for every single page to render a no-index tag when it's accessed via the IP address?
There's got to be an easier way, no?
Re: How to stop Google from crawling the IP address and only use the domain names? O-Dog 12/4/09 5:35 PM
practice... if links reference an IP address G bot will follow regardless of..... ie...

 http://216.58.112.44/ffav2.htm

nothing there... followed anyways....
Re: How to stop Google from crawling the IP address and only use the domain names? O-Dog 12/4/09 5:36 PM
but no documents...
Re: How to stop Google from crawling the IP address and only use the domain names? O-Dog 12/4/09 5:39 PM
take that same address and punch the url..... very mysterious....
Re: How to stop Google from crawling the IP address and only use the domain names? webado 12/4/09 6:06 PM
If the server is Apache then you have to use the .htaccess file to 301 redirect all accesses to the single preferred canonical form.
 
See here:
Re: How to stop Google from crawling the IP address and only use the domain names? mase.bay 12/7/09 5:47 PM
The server is IIS 6. That article covers Apache only. At the bottom there was a link to an approach for IIS, but it involves code.
I was hoping that there was a setting in the Google crawler dashboard, or a way to restrict it without code on the server.
Still looking for the best answer....
Re: How to stop Google from crawling the IP address and only use the domain names? webado 12/7/09 6:00 PM
Well no, you need to do 301 redirections. So code is needed, or access to the IIS console, or both.
 
Your pages can also use the rel="canonical" link element in the head, where you state the exact correct URL of each page, but this is not the same as redirection. Once the URLs with IP addresses have been indexed, they will be extremely hard to get rid of without 301 redirection.
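
As a rough illustration of the rel="canonical" tag mentioned above, here is a minimal Python sketch that builds the tag for whatever URL a page was requested under, always pointing at the domain. The host name www.example.com and the function name canonical_link_tag are placeholders assumed for this example; a real site would substitute its own preferred host and emit the tag from its own page templates.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: placeholder for the site's preferred (canonical) host name.
CANONICAL_HOST = "www.example.com"


def canonical_link_tag(requested_url: str) -> str:
    """Build a <link rel="canonical"> tag that keeps the path and query
    of the requested URL but always uses the canonical host."""
    parts = urlsplit(requested_url)
    canonical = urlunsplit(
        (parts.scheme or "http", CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


# A page requested through the IP address still declares the domain URL.
print(canonical_link_tag("http://192.0.2.10/products/list.asp?id=7"))
```

The tag is only a hint to the crawler, which is why it is described here as weaker than a 301 redirect.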
 
 
This is one of the main reasons why IIS hosting is nasty: the inability or difficulty to manage the site and its URLs.
 
Re: How to stop Google from crawling the IP address and only use the domain names? Autocrat 12/7/09 6:05 PM
You cannot tell G which URL to crawl via GWMT (as far as I can figure).
You have to handle it at the server level (or via scripts).


You could try using the canonical element, and ensure every file/page points to itself using the preferred URL (including the domain name).
Failing that, you will either have to set up IIS to handle IP requests, or script in URL checks that look at the Host/Domain of the request and send a 301 header.
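
The scripted check described above could look roughly like the following Python sketch. It is not the IIS or ASP code the thread is about, only an outline of the logic: inspect the Host header, and if it is a bare IP address, compute the domain URL to send back with a 301. CANONICAL_HOST, is_ip_host and redirect_target are hypothetical names chosen for the example.

```python
import ipaddress
from typing import Optional

# Assumption: placeholder for the site's preferred host name.
CANONICAL_HOST = "www.example.com"


def is_ip_host(host_header: str) -> bool:
    """Return True when the Host header is an IP address, with or without a port."""
    host = host_header.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. [::1]:8080
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # IPv4 address or host name followed by a port
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def redirect_target(host_header: str, path_and_query: str,
                    scheme: str = "http") -> Optional[str]:
    """Return the URL to answer with in a 301 Location header,
    or None when the request already came in on a host name."""
    if not is_ip_host(host_header):
        return None
    return f"{scheme}://{CANONICAL_HOST}{path_and_query}"


print(redirect_target("192.0.2.10", "/page.asp?id=3"))
print(redirect_target("www.example.com", "/page.asp?id=3"))
```

Keeping the path and query intact matters: redirecting every IP request to the home page would lose the page-to-page mapping the crawler needs.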
Re: How to stop Google from crawling the IP address and only use the domain names? webado 12/7/09 6:11 PM
You could add the IP address url as a new site in Webmaster Tools. Verify it. 
Then set up a change of address in Webmaster Tools and tell it the new address is the domain url. Make sure you use the preferred canonical form of that domain to avoid any further issues.
 
Use this in conjunction with the rel="canonical" link element.
 
But again this is not the same as a proper 301 redirection and it will probably be hard to dislodge the old IP based urls in favor of the domain based ones.
 
Good luck.
 
 
Re: How to stop Google from crawling the IP address and only use the domain names? Autocrat 12/7/09 9:19 PM
Can you verify via IP?
(I thought that would only work if you have a dedicated server?)
Re: How to stop Google from crawling the IP address and only use the domain names? webado 12/7/09 9:25 PM
If the site is accessible at an address like 123.123.123/~somesite/ you surely can verify it. What you cannot make use of is a robots.txt file to call your own because it would be looked for at the root level. However there is likely no blocking directive in the robots.txt located at 123.123.123 (if there is any robots.txt file there) because if there were, then the site would not have been indexed under the IP.
 
 
Re: How to stop Google from crawling the IP address and only use the domain names? Autocrat 12/7/09 9:32 PM
Can you use the CoA tool for such URLs?
(Not sure if you can use it on SubDomains/Directories?)

If that works - cool ... I just learned something new ;)
Re: How to stop Google from crawling the IP address and only use the domain names? webado 12/8/09 5:20 AM
You have a point there...
 
>>Change of address: Setting is restricted to root level domains only
 
So indeed, unless the IP address is the root level of the site, the change of address cannot be used.
Re: How to stop Google from crawling the IP address and only use the domain names? JohnMu 12/8/09 2:37 PM
I'd use the rel=canonical if you absolutely can't set up a 301 redirect (if you have access to the IIS dashboard it's usually easy to set up though). Another thing that can be done is to make sure to use absolute URLs for your internal links. That way, even if we start crawling a URL from the IP address, all links will point to the correct version.

Cheers & good luck :)
John
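
To illustrate the absolute-URL suggestion above, here is a small Python sketch that resolves internal links against a fixed canonical base instead of whatever host the request came in on. CANONICAL_BASE and absolutize are names assumed for the example, and www.example.com is a placeholder domain.

```python
from urllib.parse import urljoin

# Assumption: placeholder for the site's preferred base URL.
CANONICAL_BASE = "http://www.example.com/"


def absolutize(href: str, page_path: str = "/") -> str:
    """Resolve a link found on the page at page_path into an absolute URL
    on the canonical domain, regardless of the host used for the request."""
    page_url = urljoin(CANONICAL_BASE, page_path)
    return urljoin(page_url, href)


# Relative and root-relative links both end up on the domain.
print(absolutize("about.htm", "/products/index.htm"))
print(absolutize("/contact.htm", "/products/index.htm"))
```

Links that are already absolute, including links to other sites, pass through unchanged, so only the site's own relative links are affected.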
Re: How to stop Google from crawling the IP address and only use the domain names? daveNovee 1/15/10 8:49 AM
This is right on. Very helpful. I have a very similar issue. When I do a link: operator search on my IP number, the result is hundreds of pages from my domain. When I look at those pages to see if my IP is in any of their links, only our domain.com comes up. When I do the same link: operator search with my domain (link:mysite.com), the same pages come up. Why would / could that be? I have an IP 301 redirect in place and a canonical. We do not, however, have all absolute URLs. Is the suggestion to verify the IP in W.T. a good idea? Hopefully this doesn't happen to anyone else...
Re: How to stop Google from crawling the IP address and only use the domain names? mase.bay 1/21/10 5:12 PM
Thanks everyone, these are all very helpful ideas and solutions.
My site does use absolute URLs for the links, but they're all dynamically generated based on the root of the URL, for easy configuration all the way from localhost to staging to production.

I think I know what I can do, which will be the easiest. I will set up a new Website node in IIS and set the host header to be only the IP address. Then I can set that new website node as a 301 redirect and tell it to always reference the site through its domain. This means if anyone does somehow get a G search result using our IP address, they'll click the link, IIS will receive the request and 301 redirect them to the same page they wanted, only with the domain name in the URL instead of the IP address.
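
Once a redirect like the one described above is in place, it can be worth checking that it behaves as intended: a 301 (not a 302) that lands on the same page under the domain. The Python sketch below only encodes that check; the status code and Location header would come from whatever HTTP client or log you use, and is_expected_redirect is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_expected_redirect(requested_url: str, status: int,
                         location: Optional[str], canonical_host: str) -> bool:
    """Return True when the response is a 301 to the same path and query
    on the canonical host."""
    if status != 301 or not location:
        return False
    req = urlsplit(requested_url)
    loc = urlsplit(location)
    return (
        loc.hostname == canonical_host.lower()
        and (loc.path or "/") == (req.path or "/")
        and loc.query == req.query
    )


# A permanent redirect to the same page on the domain passes the check.
print(is_expected_redirect(
    "http://192.0.2.10/page.asp?id=3", 301,
    "http://www.example.com/page.asp?id=3", "www.example.com"))
```

A temporary (302) redirect, or one that sends every request to the home page, fails the check.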

Should have thought about this before, as it's the easiest way to solve this problem. I will just have to setup the site in GWMT, verify it and request it be crawled
Re: How to stop Google from crawling the IP address and only use the domain names? webado 1/22/10 6:04 AM
>>I will just have to setup the site in GWMT, verify it and request it be crawled
 
You'd have to do that before putting in the 301 redirection or it won't get past the verification point.
 
IMO there's actually no need to add to WMT a site which you intend to redirect elsewhere, unless you just want to see crawl error messages all the time regarding the redirections.
 
Once you 301 redirect everything, all the URLs currently indexed with the IP will eventually drop out in favor of the same URLs on the domain to which you redirected them. Even if it takes some time, it's nothing to worry about: they won't be impacting the indexing of the domain URLs any more, and the domain URLs will be able to stand on their own.