|301 redirected being "Phantom indexed" after 1 year||Mike0000||12/18/11 1:08 AM|
I have read the FAQs and checked for similar issues: YES
My sites' URLs (web addresses) are: pc ap dot com, las vegas - nv dot com
Description (including timeline of any changes made):
I have a number of pages indexed in Google which were 301 redirected LONG AGO. I'm pretty sure they were gone at one time, but recently they have come back, and I strongly suspect they are causing duplication penalties.
Google does not seem to be removing certain 301 redirected pages from the index. Among them is the redirected home page pc ap dot com, which has been redirected to the las vegas - nv dot com home page, but there are a number of others as well. Maybe this is not unusual; perhaps I don't have a problem at all. So if you know of other instances, let me know, and that will tell me whether or not I'm off on a wild goose chase.
The URL was 301 redirected a year or more ago. If I type the old URL (pc ap dot com) into the Google search field it is still there, but it does NOT show up in "site:pc ap dot com".
For example say you have domaina.com/abc.htm which has been 301 redirected to domainb.com/xyz.htm. Type "domaina.com/abc.htm" into the Google search field and see the result.
I get back in the search results a listing which looks like the following rather than the expected NOTHING:
Title from Page Xyz
Description snippet from page Xyz
Of course, once you click on it, it goes to domainb.com/xyz.htm because it has been redirected. Now type "domainb.com/xyz.htm" into the Google search field,
and you get the expected:
Title of Page Xyz
Description snippet from page Xyz
I suspect G is treating these as duplicate content. They also strongly appear NOT to be passing PR on to the destination page, probably because G still thinks the old pages exist! It seems to be more prevalent with OLDER redirections on two of my sites, which have recently and suddenly lost PR.
It seems to be MOST predominant with redirects that have been in place for more than a couple of months, but NOT all of the older ones; I have yet to come up with a pattern. ALL of the more recent (in the last month) 301 redirects that I added to the SAME .htaccess files appear to be working correctly and NEVER malfunction like this.
It also SEEMS to be more likely with urls which have been 301 redirected to ANOTHER domain, but not necessarily all.
In my case, domaina.com and domainb.com are both on the same physical Apache server but use different IP addresses. Most other 301'd URLs in the same .htaccess that go to the SAME domain seem to work as they should and do not show up in the index.
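To hunt for the pattern described above (older vs. newer rules, cross-domain vs. same-domain), one option is to tabulate the redirects by those two attributes and compare phantom rates. This is a minimal Python sketch, assuming you record by hand for each rule its old URL, new URL, the date it went into .htaccess and whether the old URL still shows up in a URL search. The `RedirectRecord` type, the 60-day cut-off and the domain names are hypothetical, and nothing here queries Google.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlsplit


@dataclass
class RedirectRecord:
    old_url: str
    new_url: str
    added: date    # when the rule was added to .htaccess (recorded by hand)
    phantom: bool  # True if the old URL still appears when searched for


def is_cross_domain(rec):
    """Compare hostnames, ignoring a leading "www." on either side."""
    def host(url):
        h = (urlsplit(url).hostname or "").lower()
        return h[4:] if h.startswith("www.") else h
    return host(rec.old_url) != host(rec.new_url)


def phantom_rates(records, today, old_after_days=60):
    """Return {(cross_domain, is_old): (phantom_count, total_count)}."""
    buckets = defaultdict(lambda: [0, 0])
    for rec in records:
        is_old = (today - rec.added).days > old_after_days
        key = (is_cross_domain(rec), is_old)
        buckets[key][0] += rec.phantom  # bool counts as 0 or 1
        buckets[key][1] += 1
    return {k: tuple(v) for k, v in buckets.items()}
```

If the phantoms cluster in one bucket (say, old cross-domain rules), that is at least a lead worth testing; if they are spread evenly, age and domain are probably not the factor.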
If you Google search "pc ap dot com", "www.pc ap dot com" or "www.pc ap dot com/las vegas.htm", you get an index result with a snippet from the page it has been redirected to: www.las vegas - nv dot com.
Likewise, las vegas - nv.com/lvnight.htm and las vegas - nv.com/las-vegas-coupons.html produce similar results, where the redirect stays on the SAME domain.
I can e-mail other examples of affected URLs, as well as unaffected ones, and the full .htaccess file to seriously interested parties if needed.
Browsers seem to work fine with these pages. None of the other search engines seem to have problems with these.
They get crawled every couple of days. I have already sent Fetch as Googlebot requests (with the expected 301 results) and resubmitted all combinations via WMT. I would send a removal request via WMT, but many of these are VERY high-PR pages (they were our old home pages) that are still heavily backlinked, and we would prefer not to lose the PR.
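Besides Fetch as Googlebot, the same check can be repeated from the command line with something like `curl -I http://domaina.com/abc.htm`, which prints only the response headers. A minimal Python sketch for inspecting such output, assuming the raw header text has been captured as a string; `domaina.com`/`domainb.com` are the placeholder names from the example above, and the function names are hypothetical.

```python
def parse_response_head(raw):
    """Parse a raw HTTP response head (e.g. `curl -I` output).

    Returns (status_code, headers) with header names lower-cased.
    """
    lines = raw.strip().splitlines()
    # Status line looks like "HTTP/1.1 301 Moved Permanently"
    parts = lines[0].split(None, 2)
    status = int(parts[1])
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_clean_301(raw, expected_target):
    """True if the response is a permanent redirect straight to expected_target."""
    status, headers = parse_response_head(raw)
    return status == 301 and headers.get("location") == expected_target
```

A 302 (temporary) or a Location pointing at an intermediate URL would both fail this check, which is the kind of difference worth ruling out before blaming the search engine.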
|Re: 301 redirected being "Phantom indexed" after 1 year||alistair.lattimore||12/18/11 3:58 AM|
I think there are a few different elements at play with the scenario you described.
First, the 'phantom' redirects: they aren't phantom; it is Google trying to show the user the most relevant search result. For example, I redirected my employer's website from ab.com to a.com over two years ago. A search for [site:ab.com] yields no results and hasn't for a very long time, but if you search for [www.ab.com], Google will show the home page 'snippet' from a.com but use www.ab.com as the domain.
I suspect the logic from Google here is that you've specifically searched for a domain name, they know that ab.com was redirected to a.com, and showing the user ab.com is more relevant and doesn't harm anyone since it actually does redirect to the same website. Imagine the scenario where your business name has changed completely and a user searches for your old domain name: the behaviour you and I have described is going to be a far superior search result than the new domain (completely unrelated to the old name), which the user doesn't recognise as the old domain name or business in any way.
Next, the redirect is working as you would expect and it isn't going to cause you duplicate content penalties. If you could access all of the URLs/content from your Las Vegas website using your pc ap dot com domain, then you'd have duplicate content issues.
Google have said in the past that they have problems when you 301 redirect part of a website and they get mixed signals about the content. Normally they like to see a site-wide 301 redirect from every URL to a new corresponding URL. If you send back HTTP 200 responses from that domain or other sub-domains, Google have stated that they aren't always sure whether or not they should act on the 301 redirect. In your instance, you're redirecting the home page of pc ap dot com but still have lots of other content indexed under that domain. Personally, I'd just remove the 301 redirect from the home page and move on: that doesn't seem to be a relevant redirect to me, and Google could well be ignoring it due to poor correlation between the content on pc ap dot com and your Las Vegas website.
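One way to see how "partial" a move is, in the sense described above, is to list which known URLs on the old host have no corresponding 301 rule and would therefore still answer with a 200. A rough Python sketch, assuming you have a list of the old site's URLs (from a crawl, sitemap or logs) and the redirect rules as a path-to-URL dictionary; the function name and data shapes are illustrative only.

```python
from urllib.parse import urlsplit


def unredirected_urls(indexed_urls, redirect_map, old_host):
    """List URLs on old_host that have no 301 rule (they still serve content).

    indexed_urls: iterable of absolute URLs known for the old site.
    redirect_map: dict mapping an old URL path to its new absolute URL.
    """
    leftovers = []
    for url in indexed_urls:
        parts = urlsplit(url)
        if parts.hostname != old_host:
            continue  # other hosts/sub-domains are outside this check
        if parts.path not in redirect_map:
            leftovers.append(url)
    return leftovers
```

The shorter this list gets relative to the total, the closer the old host is to the "every URL redirects to a corresponding new URL" case.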
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||12/18/11 3:50 PM|
Sounds like you have an example of almost exactly the same thing (the only difference is that ours produces slightly different results for the www and non-www searches). At least I've found one other example! However, you don't mention: was a.com Panda-lized and/or did it suddenly drop in PR recently like ours?
I'm afraid I have to disagree with much of your explanation, though, for the following reasons.
The thing is, there should be NO content in the index for these pages at all (and in most other cases there isn't when you have a 301 redirect), thus they SHOULDN'T be the most relevant result for anything. Even when they index the home page for a whole noindex'd site as I've noticed they do, they do not place any snippet below it.
The old home page of pc ap dot com and pc ap dot com/las vegas dot htm ARE (and always have been) exactly the same page as the current home page of las vegas - nv dot com; it was simply moved when we got the new domain (or at least earlier versions of it were, at the time of the transfer). Based on archive dot org, it was 301 redirected back in 2005, and it was gone from the index then. A lot of old, authoritative sites still link to the old versions (under the Vegas topic) and provide a major amount of PR, so we can't simply forget about it.
The old domain originated in 1995 when we (and most others) had no clue about all this silly SEO stuff back then - we just produced informative content for PEOPLE (not SEs) long before SEs even existed (remember lists and rings?). If you look closely, you'll see almost every page of each domain links to pages of the other domain, thus I think it would be safe to say both would be considered the same topic. Even G's WMT seems to agree based upon the primary keywords they assign each. Our primary company oriented page is buried at pc ap dot com/pc ap.htm
The reason this whole cross-linking thing had to be done originally was load balancing. In the '90s (pre-G intervention), when we exceeded the bandwidth limits on our host server every weekday, we had to distribute the most popular pages across several different servers/domains/accounts, or be automatically shut down for an hour or more at a time by our hosting company for exceeding our hourly/daily bandwidth (I guess very few people here had to go through that or remember those days now). Anyway, enough history: we're down to only three domains now and hopefully soon only one.
Except for a handful of company-oriented pages, ALL of the remaining 95% of the content on pc ap dot com is related to, and in fact integrated with, the other domain. Check the links on the home page. We should have moved it over long ago, except they were both doing so well on all the SEs (until G's changes in March) that we were afraid to mess with it. Ironically, ALL of the new 301s we've done on those domains in the last month are working fine; only some of the older 301s seem to have reversed (i.e. phantomed). On the other hand, there are some 301s from pc ap to las vegas - nv which would be far worse matches in terms of similar content, and they are working great. I think that part of the theory is busted :-).
Most of all, this does not explain all of the other examples I gave (and I have many more), such as lvnight and coupons.html, which are simply renames of the same page ON THE SAME domain. One common factor COULD be that these are all highly externally and internally linked pages (at least at one time in the past); that's why we 301'd them rather than moving them and ignoring the old URLs.
It also doesn't explain why a site:las vegas - nv dot com search comes out with an extremely convoluted ordering, as if all the PR that USED to flow through the two redirected pc ap pages suddenly vanished overnight. Oddly, a similar site: search followed by just about any keyword (e.g. vegas) produces a nearly perfect ordering. A site:pc ap dot com search looks perfect.
I had not seen the statement from G about partial redirections. I find it VERY hard to believe that it is not common for part of a person's or company's diversified website to get so large and popular that it is advantageous to split it off into its own website on that topic, leaving half of the old one intact. That is roughly what we are doing here by moving all the topical content onto the largest domain. If that sort of thing is a problem for G, I think THEY need to rethink their apparent strategy of compartmentalization and of ASSUMING that every ENTIRE site HAS to be dedicated to only ONE topic and is not allowed to be about more than one unrelated thing.
I have considered, partially as you suggest, un-redirecting the old home page (probably put up a page saying the new home page is on the other domain with a single link to get some of the PR passed through) then waiting a couple days for it to be re-indexed (Gbot visits it at least every 2-3 days), and then try the 301 redirect anew.
I also considered submitting for the other phantoms to be removed from the index, but what happens if you submit a 301 redirected page? Will it remove the page it is redirected to? Will it still pass external pagerank? I doubt it.
Thanks for your response,
|Re: 301 redirected being "Phantom indexed" after 1 year||alistair.lattimore||12/19/11 4:43 AM|
Regarding Google returning a 301 redirected URL when you say that they shouldn't, I liken that behaviour to Google changing the <title> or META description shown for a page, even when those are set, to something they feel is more relevant. You think that doing what you've asked them to do is more relevant, but Google think that showing the 301 redirected URL is, and I suspect they've run tests indicating that this is in fact the case. I'm not making excuses for them, but in general there are reasons why something happens and it isn't by accident, so as webmasters we unfortunately just have to roll with it.
If I were to offer a few suggestions in general:
* Get an exhaustive list of sites that refer traffic to and link to pcap with Las Vegas related link text. You could use your web statistics software, Google Webmaster Tools, Bing Webmaster Tools, Open Site Explorer, Majestic SEO and a few others. Then I'd start contacting those site owners to get them to move the link across to your Las Vegas domain, so that you can cleanly separate your two domains once and for all.
* You can access your two domains' content on both domains at the moment, which has the potential to cause duplicate content issues. If you can, I'd set up 301 redirects for all of your Las Vegas content on pcap to move it over to your Las Vegas domain.
* I'd seriously consider updating the quality of the HTML used throughout your site, along with a design update. If you do go down this road, make sure you only do one thing at a time: update your markup or update your design while keeping your content identical. Don't make sweeping changes that involve changing URLs or updating <title> text or META descriptions; just make small, progressive updates one at a time until your site has had a bit of a spruce up. It feels really tired at the moment.
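For the bulk move suggested above (301 redirecting the Las Vegas content on pcap), the rules can be generated from an old-path to new-URL table rather than typed by hand, which also makes them easy to review. A minimal Python sketch that emits Apache mod_alias `RedirectMatch 301` lines, assuming Apache with mod_alias enabled and a one-to-one mapping from old page to new page; the mapping used is hypothetical.

```python
import re


def redirect_rules(moves):
    """Build one exact-match mod_alias rule per moved page.

    moves: dict mapping an old URL path (e.g. "/lvnight.htm") to a new absolute URL.
    Plain `Redirect` matches by URL-path prefix, so a rule for "/" would also
    catch every other page on the old host; anchoring with ^...$ limits each
    rule to exactly one path.
    """
    rules = []
    for old_path, new_url in sorted(moves.items()):
        pattern = "^" + re.escape(old_path) + "$"  # escape "." etc. in the path
        rules.append(f"RedirectMatch 301 {pattern} {new_url}")
    return rules
```

For example, `redirect_rules({"/lvnight.htm": "http://www.lasvegas-nv.com/las-vegas-nightlife.htm"})` yields a single line that can be pasted into the old host's .htaccess.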
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||12/26/11 3:17 PM|
>>You can access your two domains content on both domains at the moment, which has the possibility of causing duplicate content issues.<<
Yes, there is content on both domains (as there has been for 15 years with no negative effects), but absolutely NONE that I know of is duplicated on the same domain or even across domains. If you see any identical content on both domains (other than G's mis-indexed phantom 301 pages, which this thread is complaining about), PLEASE let me know ASAP so I can fix it. We are EXTREMELY careful never to have two copies of the same page ANYWHERE.
Al, thank you, I'm glad you agree with me. That is PRECISELY what I HAVE BEEN TRYING to do. I started doing it months (actually even years) ago, which is when I started noticing these issues, and that is what is CAUSING them: we're getting these issues BECAUSE we are moving and 301 redirecting all the LV content on pc ap to the new domain one page at a time (some unrelated pages will ultimately remain there)! But as evidenced, G is not handling them as it should. If it were, they would all have been transferred already and pc ap would only have a couple of unrelated pages left on it. I can't really justify moving more at this time, doing more 301 redirects and risking yet more phantom duplication penalties, until I find a way to clear these ones up. Instead, this week we have been concentrating on updating all the old on-domain links to reflect the past 301 redirected moves (which again, if B and Y are any indication, should NOT be an issue, but we're doing it anyway since someone suggested it and we can't find any other logical explanation).
As far as contacting external linkers goes, I even started doing that (tedious with tens of thousands of them), but in 90% of cases they are: 1. blogs and forums whose owners either don't know how to change them or, in most cases, can only offer to delete them altogether, which kind of defeats the whole purpose; 2. auto-generated pages from other countries whose owners don't speak the same language; 3. authority sites that don't have time to make the change and/or don't respond. Isn't it better to have a link from a PR4+ authority site to a 301'd page than no link at all?
An update: I tried to get one of the phantoms to go away. About 5 days ago, using pcap dot com / lvnight dot htm as a test page, I removed the 301 redirect from .htaccess, replaced the page with a unique placeholder file AND put NOINDEX in the file header. The idea was to allow it back into the index (since G seems to want it there anyway) and, better still, to get it removed altogether via the NOINDEX. I figured I could try redirecting it again later or, at the very least, just keep it this way and get rid of the duplication penalty. But G is refusing to cooperate there too! If you search right now for "pcap dot com / lvnight dot htm", it NOW shows the pcap dot com / lvnight dot htm URL with the title and snippet from lasvegas - nv dot com/las - vegas - nightlife dot htm (where it was previously 301 redirected), while the cache (if you put the cursor on the >> at the right side of the screen) shows the correct new page at pcap dot com / lvnight dot htm, with <META NAME="robots" CONTENT="NOINDEX, FOLLOW"> in the header. This shows that even though G KNOWS the page is now different, AND should know it is marked NOINDEX, it is continuing to index it WITH the WRONG content internally (judging by the snippet and title), which I can only surmise is still causing a duplication penalty internally. Totally MESSED UP! It's not even following NOINDEX instructions anymore (unless G eventually fixes it). Anyone can go look at these results right now with their own eyes. Check the source of the pages too.
Meanwhile, last week I did a simple 301 redirect of a bunch of expired pages on the new domain to a single appropriate "expired" page on the same domain, to avoid further duplication issues (as compared to changing them all into separate, identical expired template pages). Out of about 70, 12 apparently random ones, instead of being gone from the index, are now phantomed: the old URLs come up showing a snippet from the expired page they are redirected to. All the rest 301'd just fine and are gone. Doing a "site:las vegas - nv dot com" of course still shows a random ordering of all the pages, consistent with a duplication penalty.
I get frustrated when everyone (not just you, Al) keeps blaming me or the site and conjecturing about POSSIBLE MINOR infractions in the site content, etc. (which no one knows for sure are even a Panda factor), as if I were haphazardly tossing sites around, instead of addressing a glaring, concrete, reproducible result in the search results. I am not saying there are NO issues on the site that have accumulated over the many years and the various hands that have worked on it. These have been/are being identified, prioritized and addressed daily, and some of the most glaring ones that we agree MAY have had the most impact have already been addressed, with no significant effect. Actually, one early fix DID seem to have an impact. It involved a single major page (not a duplicate) which was weirdly being doubly indexed by G in a very odd alternate way. Both of these sites have been around for over a decade and had been ranking excellently, with no sign of these phantom results, until this past year. We are extremely diligent about studying each move carefully in advance and moving things slowly, as you yourself suggested, specifically to avoid duplication issues. And it shows: all the other search engines have indexed our content exactly as expected. If it were something I was doing wrong, I think we could expect the supposedly "inferior" Bing or Yahoo to have a similar or worse issue, or SOME indication of a problem, but as much as I have tested there, I can't find a single instance of these sorts of remnant results on either of them. I truly wish I COULD find a problem there, as it would confirm something concrete I could address and fix. Meanwhile, our rankings over there have remained as high as before. Therefore, objectively, I have no choice but to assume this is strictly a G thing, not site coding or configuration on our part.
Many others have reported unexplained/unexpected Pandalization (just look at the length of the thread on THAT topic), and I'm wondering if this could be the explanation for SOME of them. About the only thing G HAS positively stepped forward and confirmed is that Panda relates to CONTENT DUPLICATION. If these PROVEN phantom examples are falsely causing internal duplication in G's index, which by all available appearances they seem to be, it could certainly explain a WHOLE lot. So far I can't see a pattern, and the reasonable, practicable suggestions others have put forward (here and in other forums) are definitely in the works, but as yet are not producing any positive results or further provable evidence.
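When checking the page source as suggested, a script can confirm exactly which robots directives a crawler would read from the page. A small Python sketch using the standard-library `html.parser`; it only inspects the HTML you pass in (e.g. a saved copy of the placeholder page) and does not look at an `X-Robots-Tag` HTTP header, which can also carry `noindex`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"|"googlebot" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        attrs = {k: (v or "") for k, v in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            for token in attrs.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Note that a crawler can only see a `noindex` on a page it actually fetches with a 200 response; while a URL still returns a 301, any meta tag on the old page is never read.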
That is why I proposed this thread: to see if others can look and find their own examples of this happening, so maybe we can come up with some common factors. Other than you, Al, it appears no one else has looked or found any. So perhaps it IS something only we two are doing wrong.
Scientifically and objectively, if you have three voltmeters and two read one value while the third shows something completely different measuring the same circuit, over and over again, which are you going to suspect: the circuit being measured or the one meter? Or do you make excuses for the third meter and blame the circuit, rather than replace the meter's batteries or get it fixed?
All the best,
|Re: 301 redirected being "Phantom indexed" after 1 year||alistair.lattimore||12/26/11 10:26 PM|
Regarding contacting third party websites to get your links updated, of course a link through a 301 redirect is better than no link at all. The only reason I mention that is that there is some amount of PageRank lost as links flow through a 301 redirect. However, my reasoning for updating the links was more to do with you being able to safely decouple your two websites entirely without fear of loss of rankings.
On the topic of loss of rankings and being able to decouple your sites, you don't need to leave a 301 redirect in place forever (though you can if you like). Google, more precisely Matt Cutts, has previously stated that you really only need to leave 301 redirects in place for a few months for Google to understand that it is a permanent move. In your instance, since the home page of pcap has been 301 redirected to your Las Vegas website for such a long time, I suspect you could simply remove the 301 redirect and nothing would change. You could test that theory by removing a different long-standing 301 redirect from pcap to Las Vegas and monitoring the rankings of the page on the Las Vegas site over the following 3-6 weeks to see whether it has an effect.
I'm seeing some strangeness with the lvnight.htm page that you applied the meta robots noindex tag to. As you indicate, searching for the URL directly shows it in the search results, and using the site preview I can see the screenshot of the replacement page. Using the cache search operator on the pcap lvnight page shows it returning your Las Vegas nightlife page, last crawled on 23 December 2011. I'm reasonably confident that Google check on 301 redirects from time to time to make sure nothing has changed, but they don't crawl them as regularly as normal content. It is possible you're somehow in the middle of that process at the moment, where you can see they've crawled your replacement page (via preview), yet a cache search shows that URL still pointing at the Las Vegas nightlife page. I agree it is odd that the replacement page is showing up in the search results given the meta noindex, but I'd let that slide for the moment and give it another couple of weeks to see what happens.
With all of the effort you're investing in this particular issue, I'm unsure whether it is worth it, as I don't think you've indicated above that it is actually causing you a problem. You've stated that you've enjoyed strong rankings for a long time and that nothing has really changed recently; you certainly haven't been hit by the Google Panda algorithm, or your traffic would have fallen off a cliff. Given that your rankings are holding and traffic has remained steady, what makes you think you even have a problem that you need to address? For all you or I know, this could simply be a glitch in the display of the search results that has no material effect on the internal mechanisms Google use to calculate which page should rank in what order for a given phrase.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||12/26/11 10:42 PM|
OK, http://pcap.com/ is 301 redirected to http://www.pcap.com/, which is 301 redirected to http://www.lasvegas-nv.com/. So things will be slower.
Your robots.txt was last modified on Dec 16.
Was it perhaps blocking something before?
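The chain webado describes is two hops: `http://pcap.com/` goes to `http://www.pcap.com/`, which goes to `http://www.lasvegas-nv.com/`. A small Python sketch that follows such a chain from a table of observed redirects (filled in from your own header checks) and flattens it so every old URL points straight at its final target. Collapsing chains to a single hop is a common tidy-up, offered here as an assumption rather than a confirmed fix for the phantom listings; the function names are hypothetical.

```python
def follow_chain(start, redirects, max_hops=10):
    """Return the URLs visited from start until one that does not redirect.

    redirects maps a URL to the Location it 301s to.
    Raises ValueError on a loop or on more than max_hops redirects.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:
            raise ValueError(f"redirect loop at {nxt}")
        if len(chain) > max_hops:
            raise ValueError("too many redirects")
        chain.append(nxt)
        seen.add(nxt)
    return chain


def collapse(redirects):
    """Point every redirecting URL directly at its final destination (one hop)."""
    return {src: follow_chain(src, redirects)[-1] for src in redirects}
```

Applied to the two rules above, `collapse` maps both `http://pcap.com/` and `http://www.pcap.com/` directly to `http://www.lasvegas-nv.com/`.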
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||12/30/11 3:18 PM|
Prior to Dec 16 there WAS NO robots.txt at all. I just added this generic one at the suggestion of someone.
I'm particularly concerned about how pc ap dot com and www pc ap dot com are both showing up in the search and are likely causing a big duplication penalty. The site has been canonicalized to www for years. Apparently there is no way to get rid of them either, if lvnight is any indication.
>you don't need to leave a 301 redirect in place forever <
>reasonably confident that Google periodically check on 301 redirects from time to time to make sure nothing has changed,<
Isn't this contradictory?
That was kind of my intent with lvnight, except that after 10 days un-301'd AND noindexed (I did it on Dec 19), it's STILL phantomed and acting weirder than ever, as you yourself noted. I have seen the preview display come and go twice now. It has also shown supposed caching from Dec 23 and from Dec 29, both still showing the nightlife page that it USED to be 301 redirected to!!
If I gave that impression, it couldn't be further from the truth. As I thought I mentioned from the start, the new pages redirected TO USED to rank very well for over a decade, and still do OK on Bing and Y (which have none of these phantom issues); however, they now display the typical symptoms of a duplication penalty on Google, and of course their ranks and traffic DID fall off a cliff overnight, coinciding with the last major Panda update. PR for the phantom 301'd-to pages all dropped at least one point (4->3), as if they suddenly stopped getting backlinks passed through. Others changed only slightly as far as I can tell, but that would be expected, since the pages being redirected to are key pages on the new site, and if they lost their 301 redirected backlinks it would affect everything they in turn link to. Also, if you do a site:las vegas dash nv dot com, you get the typical totally random listing usually indicative of a duplication problem. The home page is on the second page of listings and the other main pages are nowhere to be seen! A site:pc ap dot com looks fine.
If this is as it appears, it could be quite an issue for Google. It would be quite easy for, say, someone to 301 redirect enough dummy pages on their domain to the key pages of a competitor to ensure that some of them get into this phantom mode and thus penalized. The competitor would plummet and never have a clue why, since they wouldn't even know where to look for the phantomed links.
Happy New Year all!
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||12/30/11 3:36 PM|
Whatever is being 301 redirected elsewhere ceases to exist and to have value on its own. So if any would-be competitor 301 redirects any of their URLs to mine, good: they simply lose their URLs. I couldn't have destroyed them better myself if I wanted to LOL
There is no duplicate content penalty as such; there is only lousy, dismal indexing and ranking, and hence performance, depending on how extensive the duplication is.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||12/30/11 3:40 PM|
Whatever you 301 redirect elsewhere must not be blocked in the robots.txt file.
The reason you 301 redirect a url is to ensure robots and people continue to find it, and to let it keep its accrued value (i.e. backlinks).
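This point can be checked mechanically: any URL that carries a 301 and is also disallowed in robots.txt will never have its redirect seen by a compliant crawler. A minimal Python sketch using the standard-library `urllib.robotparser`; the sample robots.txt rule and URLs are hypothetical, and Python's parser implements the basic robots.txt rules, which may differ from Google's own matching in edge cases such as `*` wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_redirects(robots_txt, redirected_urls, agent="Googlebot"):
    """Return the redirected URLs that robots.txt forbids `agent` to fetch.

    A crawler that may not fetch a URL never sees its 301, so these
    redirects would go unnoticed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in redirected_urls if not parser.can_fetch(agent, u)]
```

An empty result means none of the redirected URLs is blocked for that user agent.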
|Re: 301 redirected being "Phantom indexed" after 1 year||alistair.lattimore||12/31/11 5:21 AM|
>> I'm particularly concerned about how the pc ap dot com and www pc ap dot com are both showing up in the search
A search for [site:pcap.com -inurl:www] yields only two results for me - host & ns3 subdomains but no content from your www.
>> You don't need to leave a 301 redirect in place forever
Following is my understanding/interpretation of how Google act on an HTTP 301 permanent redirect.
When you issue a 301 redirect from URL A to URL B, Google will continue to crawl URL A on the existing schedule for a period of time. After they've crawled URL A a few times (maybe 2-3) and seen that it is consistently redirecting to URL B, they will begin to act on the 301 redirect on their side of the fence. Some time later the 301 redirect will have taken full effect and URL A will no longer be present within the search index.
At this point the frequency with which Google crawl URL A to check whether the 301 redirect is still in place is reduced. At some point around here, it would be safe to remove the 301 redirect without a detrimental effect on rankings, as internally Google have moved all links that were pointing to URL A to point to URL B. If, however, you still receive a lot of traffic into URL A (referrals from other websites, for instance), removing the 301 redirect would be a bad move, because you'd suddenly stop redirecting those users to URL B, which was the intention all along.
Matt Cutts covered this topic in a Google Webmaster video on YouTube, referenced below for your convenience.
There are a myriad of reasons why the PageRank of a given URL might drop: links to that page could have been removed from other websites, links to your site could have been devalued for a number of reasons, Google could have detected that you were selling links, and more. I don't think there is a relationship between your 'phantom' URLs and the fact that the PageRank dropped; to me they are not connected like that.
The thing to be mindful of with a ranking drop that coincides with Google Panda is that a raft of things is involved, not just duplicate content. It is fantastic that you're investing the time to fix your duplicate content issues; however, once you feel you've done a good job and followed best-practice techniques, it is time to start planning your next move.
If I were you at this stage, as I said above, I'd start planning a more aggressive move to separate your two domains. At the moment you interlink between the two of them all over the place, and there is a lot of Las Vegas content still residing on pcap that should be moved permanently over to your Las Vegas domain.
Next, I'd start going through your site with a fine-tooth comb looking for thin or low-quality content. Part of the Panda algorithm is that if Google detect a lot of thin or low-quality content on your site, your entire website will get penalised, not just the low-quality URLs. Removing, updating or merging your low-quality URLs to produce a better overall experience from a content standpoint is going to help you.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||12/31/11 6:19 AM|
That's just it. If any links exist to the old URLs that are now being 301 redirected to other URLs, then the 301 redirects must remain in place for as long as those links continue to exist on live pages or in cached pages on the web. Forever is not too long in this case.
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/1/12 4:47 PM|
Webado: Not sure what you mean about the robots.txt; I'm not BLOCKING anything at all there.
As far as:
This is not true with respect to the "phantom" indexed items that this thread is all about. Reread the earliest posts. In the specific examples I gave, the 301 redirected (these are not ALL that are redirected in the .htaccess, only about 1/8 - so the majority DO work fine) files started showing up in the index again (only if you search for them by url - but obviously still in there nonetheless where google can access them) after being gone for years. These are corresponding to some key pages previously very well ranked pages which have been hit hard in the last Panda update. Other pages not having these phantom links are ranking well pretty much as they did previously. Thus, among other clues, leading to my deduction that the two are related to a duplication penalty, especially since the descriptions on the old indexed urls matches the content of the new page. My scenario with the competitor was based on the assumption that the duplication would penalize the competitor as it seems to be on my two sites.
Alistair: Do NOT type "site: pc ap dot com"; simply type "pc ap dot com" (a total of only 7 characters) by itself, and "www dot pc ap dot com" (11 characters total), or any of the other examples I gave, into the Google search field to see what I am talking about.
Almost all these pages are widely linked from offsite backlinks, and per the logs, Google still constantly follows those links and spiders many of these phantomed 301s almost daily (AND gets a 301 returned, as proven in the log and "fetch as googlebot" tests), so I'm expecting to have to leave them there indefinitely. No biggie.
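A minimal sketch of the log check described above, assuming the server writes Apache's "combined" log format (an assumption; adjust the regex to your own format). It counts, per path, how many requests carrying a Googlebot user-agent string were answered with a given status. The log lines are hypothetical, and since the user-agent header can be spoofed, a serious audit would also verify the requesting IP (for example by reverse DNS).

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_counts(lines, status="301"):
    """Count, per requested path, the Googlebot-UA requests answered with `status`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent") and m.group("status") == status:
            counts[m.group("path")] += 1
    return counts

# Hypothetical log lines (documentation IP range) for illustration only.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.10 - - [01/Jan/2012:10:00:00 -0800] "GET /old-page.htm HTTP/1.1" 301 245 "-" "{BOT}"',
    f'192.0.2.10 - - [02/Jan/2012:10:00:00 -0800] "GET /old-page.htm HTTP/1.1" 301 245 "-" "{BOT}"',
    '192.0.2.55 - - [02/Jan/2012:11:00:00 -0800] "GET /old-page.htm HTTP/1.1" 301 245 "-" "Mozilla/5.0"',
]
print(googlebot_status_counts(sample))  # Counter({'/old-page.htm': 2})
```

Only the two Googlebot-UA lines are counted; the browser request to the same path is ignored.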
Great advice on splitting the sites, Alistair; that's precisely what we HAVE been doing for almost a year, until we ran into these discoveries. What's still left on the old domain is just a relative handful (<100) of pages that were still ranking well on the engines, and we didn't want to risk those ranks plummeting after the move due to phantom duplication. We're going to take a leap of faith shortly and just move them all over, with 1 in 8 odds of losing some of them.
Of course, we've also been going through the site as you suggested. Ironically, the pages doing the best are what _I_ would think G would consider thin content (one-screen pages with very little text and one large image, maybe because they are well backlinked), while large pages full of unique content are not doing as well.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/1/12 5:41 PM|
I was saying in general that what you redirect must not be blocked in robots.txt, or the redirection will not be found.
The only way a URL of yours that is being correctly 301 redirected elsewhere will still be indexed is if it has not yet been crawled. In other words, the redirection has not yet been found in the particular datacenter whose index you are accessing. You cannot hurry that up. It will happen when it does. In the meantime it shouldn't matter to you, since any access to the OLD URL will end up at the NEW one.
To find all URLs indexed from a site you have to use the site: operator. Searching for [pc ap dot com] will find pages where that is used as a plain character string, not pages which have that in their URL.
People who use Google search boxes instead of the browser address bar may, depending on other settings, end up accessing the site rather than performing a search.
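To check the point that a redirected URL must not be disallowed in robots.txt, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt lines and URLs are hypothetical (reusing the domaina.com example from the first post). Python's parser does plain prefix matching and does not implement Google's `*`/`$` wildcard extensions, so treat it as a first pass rather than a replica of Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

def blocked_redirect_sources(robots_lines, old_urls, agent="Googlebot"):
    """Return the old (redirected) URLs that robots.txt forbids `agent` to fetch.

    A crawler that may not fetch a URL never gets to see the 301 it would return."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse lines directly, no network access
    return [url for url in old_urls if not parser.can_fetch(agent, url)]

# Hypothetical robots.txt and redirected URLs, for illustration only.
robots = [
    "User-agent: *",
    "Disallow: /old/",
]
old = [
    "http://domaina.com/old/abc.htm",
    "http://domaina.com/abc.htm",
]
print(blocked_redirect_sources(robots, old))  # ['http://domaina.com/old/abc.htm']
```

Any URL in the result would keep its 301 hidden from crawlers until the Disallow rule is removed.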
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/2/12 2:12 PM|
I agree 100% with everything you say. At least in the IDEAL world that is exactly what SHOULD be happening.
However, if you simply TRY the examples I listed (you apparently have not), you would confirm for yourself exactly what I have been saying: this is not always happening in Google.
Wrong. The example in your quote is a prime example. That is why I call them phantomed indexes. The site: operator obviously does NOT tell the whole story of what is in G's index. They DON'T show up under the site: operator, but they are DEFINITELY known by and IN the Google index, and DO show up when you search just on the old URL. I am not the only one this is happening to, either.
Apparently wrong again. These pages have been 301'd for many years in some cases, and Googlebot still crawls many of them nearly every single day (always getting a 301, according to the logs) and obviously KNOWS they have been 301'd, based on the preview display and cache, yet continues to index them. The proof is sitting right out there. I'm not making this up.
No, it does not matter to users' browsers, Bing or Yahoo, which are all doing things properly, but I DO believe it matters a lot to Google ranking. Since they are still in the index multiple times (identically listed under the old URL and the new one), the old one is, by definition of 301 redirection, an exact duplicate of the new page. G sees them both, and as far as it is concerned they are exact duplicates. Under Panda they have admitted duplication is being treated much more harshly than in the past. Worse yet, my experiment mentioned above has proven that once they are in the index in this manner they are there for good and cannot be removed, even by un-redirecting the old URL and placing a NOINDEX in the file itself at that location! The evidence is sitting out there in the G index for all to see. And even if you change the new page to make it different, Google continues to re-cache the new page under both the old URL AND the new URL (as proven), so you can never escape the duplication penalty.
Google NEEDS to fix this ASAP.
The exploit I was concerned could be done would be to create 1000 NEW dummy pages on your domain and put them in your sitemap to get them indexed by G. Then, once indexed, immediately 301 redirect them all to a competitor's home page and remove them from your sitemap. From my odds calculations this creates on average about 100-125 copies of his home page in the index (remember, this only occurs in about 1/8-1/10 of redirects in my experience, YMMV), with no backlink PR, resulting in one heck of a duplication penalty. You couldn't find the index entries, though, unless you knew the exact URLs used. Even if you did, they can't be removed.
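Taking the 1-in-8 to 1-in-10 phantom rate quoted above at face value (it is one poster's observation, not a measured constant), the expected number of phantom copies from $n$ redirected pages is simply $n \cdot p$:

```latex
E[\text{phantoms}] = n\,p, \qquad
1000 \times \tfrac{1}{10} = 100, \qquad
1000 \times \tfrac{1}{8} = 125
```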
Hopefully G will look into this in our lifetime.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/2/12 2:35 PM|
Yeah, OK, pcap . com shows up when searched for as that character string, but the cache indicates it's the cache of http://www.lasvegas-nv .com/, so the redirection has been found.
Since it doesn't show up in a site: query this simply means it's not a contender for being returned for any real search. A search for the domain itself isn't a real search. If you know the domain you put it in the browser address bar and go there. And in your case you'd end up at the new address.
It's probably not been long enough since you correctly implemented the 301 redirection.
There's no exploit.
In order to find that your server 301 redirects them they have to be requested for crawling! So of course your server log is going to show it.
You seem to think Google keeps in its memory the mere fact that something is redirected somewhere else and will go straight to the destination. No, it tries to crawl the old URLs and winds up wherever it does.
Not sure why you are so concerned about this.
In your place I'd work to improve both sites because, sorry to say, they don't present well at all. Too many ads right in your face. Quite possibly penalty material.
The better you make them the quicker everything will fall into place properly.
|Re: 301 redirected being "Phantom indexed" after 1 year||cycle-hire||1/6/12 3:02 PM|
I have read this entire thread and see that your points are being deliberately misunderstood or ignored through inexperience, despite obvious efforts to help!
Phantom 301s have been appearing for me too.
I.e., I too have seen the phantom 301 problem get worse recently, and I'm only just getting around to understanding the full implications, as I'm midway through a redirection project.
Old sites that were bought, and whose pages were correctly 301 redirected to new sites several years ago, are feeling this pain.
The 301d URLs are re-appearing in search results, taking up positions they used to occupy before the 301 was put in place.
How 301s used to work in the ideal world:
What used to happen, is that G would crawl the old URL, see the 301 and pass 'some' link equity to the new URL and de-index the old URL over a few days.
The upshot is that your new URL would rank immediately, and the old URL would be gone.
What has started to happen, is these old URLs have popped back into SERPs.
As you say, there has been no change, other than Google messing about here...
These old URLs are still 301'd (no changes made for many months, or years in some cases), but they are being presented in SERPs carrying the old site name and titles, showing the old URLs and often having much lower quality CTAs.
The upshot appears to be that:
1. Pages that used to benefit from the 301 now suffer a dupe content risk
2. The old URL is back in the SERPs (even though it is 301d)
It is as if 301s are no longer guaranteed to remove a page from the index.
As you say, this leaves Google open to abuse:
1. Creation of lots of pages, each 301d to a competitor in the hope that some of the 'fresh' pages create phantoms and devalue the competitors URL.
2. Creation of multiple URLs within a site (such as changing to friendly URLs) may allow ranking of the old unfriendly URL that has been 301'd alongside the new friendly URL. (I know this happens, trust me: where we used to rank 2nd, we now rank 2nd and 3rd because of the phantom 301 URL.)
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/6/12 4:29 PM|
Proof of these allegations?
I would say that your 301 redirections are simply incorrectly set up and thus are not working.
Many people do silly things such as blocking the same urls which they want to 301 redirect in robots.txt - ending up with redirections never being found.
|Re: 301 redirected being "Phantom indexed" after 1 year||cycle-hire||1/7/12 5:21 AM|
Webado: search for propertyfinder on google co uk
You'll see the top result is a 301'd domain that has been 301'd for many years.
Is that proof enough, or will you always want more?
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/7/12 6:00 AM|
http://propertyfinder.com/robots.txt responds with a Status: HTTP/1.1 503 Service Temporarily Unavailable .
So no part of the propertyfinder.com site will be crawled again until the robots.txt file responds properly with 200 or is itself 301 redirected, so none of the currently implemented 301 redirections from propertyfinder.com to www.zoopla.co.uk will be seen by robots.
Do you need more proof?
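A rough sketch of the rule applied here, that an erroring robots.txt stops crawling of the whole host. The classification follows the Robots Exclusion Protocol as later standardized in RFC 9309 (5xx: assume complete disallow; 4xx: crawl as if no robots.txt exists; redirects should be followed). Real crawlers add caching and retry behaviour not modelled here, and `robots_txt_status` / `robots_txt_effect` are hypothetical helper names.

```python
import http.client

def robots_txt_status(host: str, https: bool = True, timeout: float = 10.0) -> int:
    """Fetch /robots.txt once, without following redirects, and return the HTTP status."""
    conn_cls = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_cls(host, timeout=timeout)
    try:
        conn.request("GET", "/robots.txt")
        return conn.getresponse().status
    finally:
        conn.close()

def robots_txt_effect(status: int) -> str:
    """Map a robots.txt status code to its rough effect on crawling (RFC 9309 style)."""
    if 200 <= status < 300:
        return "use_rules"        # parse and obey the file
    if 300 <= status < 400:
        return "follow_redirect"  # the robots.txt location itself redirects
    if 400 <= status < 500:
        return "allow_all"        # treated as if no robots.txt exists
    if 500 <= status < 600:
        return "disallow_all"     # server error: assume nothing may be crawled
    return "unknown"

# e.g. robots_txt_effect(robots_txt_status("propertyfinder.com", https=False))
# would return "disallow_all" if the server still answered 503.
```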
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/7/12 4:50 PM|
Thanks very much seoeditors for the brief and elegant summary, and for reporting another possible example.
Yours is very similar, but also slightly different; it's hard to tell without knowing what your original pages previously looked like. You have some 87,000 old URLs listed in the index (looking at additional omitted pages), but of the ones which have titles and descriptions, none that I could find appear to have the title and/or description and/or cache of the NEW page. I would have to agree with Webado on the possibility that a non-responding robots.txt could cause some issues, although I didn't get a 503 as reported (at the time of this writing). I got what looks like a blanket soft 404 screen (is it returning a 404, or a 301 which G frowns upon?) and then an automatic redirect to the new site (which G also does not like, I think). Perhaps restore the robots.txt and see if it clears yours up. Also check your logs on the old server to see whether G is/was crawling these pages and getting a 301 returned. Also try the Webmaster Tools fetch-as-googlebot to confirm what G is getting when it does get through.
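Fetch as Googlebot in Webmaster Tools is the authoritative check, but a rough local approximation is to request the old URL once without following redirects and look at what comes back. The sketch below sends Googlebot's published user-agent string (that does not make it a real Googlebot fetch), and its soft-404 test is a crude, hypothetical text heuristic, not how Google detects soft 404s.

```python
import http.client
from urllib.parse import urlsplit

# Googlebot's published user-agent string; sending it does not make this a real Googlebot fetch.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def first_hop(url, user_agent=GOOGLEBOT_UA, timeout=10.0):
    """Request `url` once without following redirects; return (status, Location, body prefix)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read(65536)
    finally:
        conn.close()

def describe_response(status, location, body, error_markers=(b"not found",)):
    """Summarize one response; the soft-404 check is a crude, hypothetical text heuristic."""
    if status in (301, 308):
        return f"permanent redirect -> {location}"
    if status in (302, 303, 307):
        return f"temporary redirect -> {location}"
    if status in (404, 410):
        return "hard 404/410"
    if status == 200 and any(m in body.lower() for m in error_markers):
        return "200 with error-page text: possible soft 404"
    return f"status {status}"

# Usage against a live server (network access required):
# print(describe_response(*first_hop("http://domaina.com/abc.htm")))
```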
>Yeah ok pcap . com shows up when searched for that character string but the cache indicates it's the cache of http://www.lasvegas-nv .com/ so the redirection has been found.<
Bingo on the agreement that the redirection has been found. I would not be silly enough to suggest any example where it had not been found yet. Likewise, the www version shows up too. Google's canonicalization bug was fixed on this domain at least 3 years ago, but index.htm was 301 redirected well before that, resulting, I guess, in both www and non-www being automatically redirected and later phantomed. But that is not the issue in itself, as evidenced by the lv night example and "pc ap dot com slash las vegas dot htm".
However, that does NOT mean that somewhere in G's twisted little algo-brain it is not being used to compare against other copies of the page as a duplicate. This is also consistent in many cases with a URL which IS an identified duplicate.
Did you try the "pc ap dot com slash lv night dot htm" example?
Same case, but here I reactivated the page on Dec 19, 2011 with a NOINDEX in the header. The display preview EVEN SHOWS the correct reactivated page (proof it has been crawled), but the cache (supposedly retrieved AFTER Dec 19), title and description all still show the OLD redirected-to page!
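One thing worth ruling out in an experiment like this is whether the reactivated page really serves `noindex` to crawlers. A small sketch using only the standard library's `html.parser`: it checks `<meta name="robots">` and `<meta name="googlebot">` tags and, optionally, an `X-Robots-Tag` header value passed in by the caller. It does not fetch anything itself, and the sample page is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots|googlebot" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower() for d in attr.get("content", "").split(",")]

def has_noindex(html: str, x_robots_tag: Optional[str] = None) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    directives = list(parser.directives)
    if x_robots_tag:
        # Strip an optional user-agent prefix such as "googlebot: noindex".
        directives += [d.split(":")[-1].strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in directives

page = '<html><head><meta name="robots" content="NOINDEX, follow"></head><body>...</body></html>'
print(has_noindex(page))  # True
```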
I think 5 years of almost daily crawling of it SHOULD be sufficient! Don't you? These are just a small sampling of examples; I have dozens of others, including ones which simply redirect to another PAGE ON THE SAME DOMAIN and have been there for years as well.
No, I didn't/don't think that at all (although I am BEGINNING to believe it based on the evidence from the lv night example!). I was simply stating the fact that G still crawls them, as evidence that it SHOULD KNOW by now that they were and still are redirected. Sure, G SHOULD still crawl them periodically; I'd be concerned if it didn't, because it is still finding external links to them.
I DO, OTOH, believe it is a FACT that G keeps copies of the content of each page (or at least parts of it - as evidenced by cache feature, titles, descriptions in the listings). And, it is also a FACT that G intentionally goes to great lengths to look for duplicates of many things like titles and meta descriptions and likely whole pages (e.g. page cache data) and thinks they are a BAD THING (you didn't think the "duplicate title and metatag" data in webmaster tools was simply for the website OWNER's benefit did you?).
It wasn't so long ago that a duplicately indexed page was obvious from the index results and an immediate rank plummet (it's a bit more difficult now). So add 2 + 2: if Google HAS the WRONG data for these phantoms (as shown, a description and cache showing up for the wrong URL), it is not a far stretch at all to believe it is seeing them as duplicates.
Already explained in former posts. Basically, why waste time flogging oneself over conjecture about what G MIGHT think is a little bad about a site, when a perfect example of what we KNOW G thinks is VERY bad is staring one right in the face, and in the end we have no control over it until G fixes it, IF it IS the problem? 99% of pages do NOT act this way, therefore it must be an unexpected exception rather than the intended rule.
My view is: It's like thinking your car is not starting when you turn the key because: the windows are dirty, your mirror is missing, your upholstery is ripped up, your seatbelts stick, your radio won't tune to AM and your tires are worn flat, and then someone comes along, opens the hood, stands back and takes one look from afar and says "well THERE's your most OBVIOUS problem you have no engine!" Those others may contribute to running issues, but without an engine they aren't going to make a bit of difference.
Yeah, sure, ALWAYS blame the site based on unproven, subjective CONJECTURE about what G likes, never the objective, proven FACTS. :-) I've received as many DIFFERENT explanations as to why G doesn't like the pages as I have gotten responses. No two agree.
I agree, we have many aesthetic concerns, which is why they ARE all being changed and moved. But it doesn't change the fact that G, Bing, Yahoo, etc. liked them for many years, and overnight just CERTAIN pages, corresponding closely to those phantomed, suddenly dropped 50-100 places in rank. We still have many pages ranking just a few places below previous levels (probably the Panda quality demerit, I would be willing to believe, or guilt by association with the phantomed pages), and others nearly the same or even better than before.
You'd think so about the ads penalty, although AdSense reviewed the site personally a couple of months ago and insists there should be MORE ads/adlinks up top, in the HOT zone. We disagreed, based on how blatantly MFA it then looks (most of our current design predates G and AdSense) and on our previous test results showing that placing more up there results in lower earnings. Also, the pages with the most ads at or near the top (we have rarely ever had more than one above the fold) are currently ranking relatively better than the ones with this dup issue.
OR the ENTIRE site gets penalized yet more, for more 301 phantom duplications, as more pages are redirected from pc ap. Darned if you do, darned if you don't.
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/11/12 6:54 PM|
Well, just a quick update:
1. One of the above-mentioned phantomed listings FINALLY suddenly disappeared from the index in the last day or two. And guess what? All the penalized pages downstream from it links-wise shot up 10-30% in the SERPs overnight. But not ones attached to other ongoing phantoms. Yeah, "No duplication penalty" due to them, my butt! There are still other duplicates of this page phantomed out there, so I don't expect them all to return to normal until ALL those are gone.
2. Wanna see something really funny? Search for "pcap dot com slash index" (no trailing .htm, etc.). Google is now indexing a 404 error page, which is a prior phantomed page that was un-redirected over a week ago as an experiment. You can see it STILL has the title and snippet from the page to which it was previously redirected. (BTW, the lvnight "noindex" phantom page is still indexed too.) Even funnier, look at G's cache of the returned page. The text part, generated by Google, says:
"404. That’s an error.
The requested URL /search?q=cache:JcUcT4wgawUJ:xxxx.xxx/xxxxxxx+&cd=1&hl=en&ct=clnk&gl=us was not found on this server. That’s all we know. "
Uh-Duh. Gee, I wonder WHY they couldn't find it. They got the robot graphic on the right side right, though (that IS supposed to be Google, right? :-).
Still think G is not broken? I think this is an admission of it. LOL, :)
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/11/12 7:00 PM|
It's not indexed:
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/12/12 1:39 AM|
Oops. My Fault. It is:
pc ap dot com slash LVindex
Still indexed at this time.
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/12/12 5:29 AM|
It's NOT LVindex but lvindex and it's a 404 after a 301 to the www version.
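A chain like this one (a 301 to the www host, then a 404) is easy to miss when a browser silently follows every hop. A minimal sketch that records each hop separately; `fetch` is any function you supply that makes one request without following redirects and returns `(status, Location header)`, so the example below runs against a hypothetical in-memory response table on example.com rather than the real site.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}
Fetch = Callable[[str], Tuple[int, Optional[str]]]

def redirect_chain(url: str, fetch: Fetch, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow redirects one hop at a time, returning [(url, status), ...]."""
    chain: List[Tuple[str, int]] = []
    seen = set()
    while len(chain) < max_hops:
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return chain
        url = urljoin(url, location)  # Location may be relative
    raise ValueError(f"more than {max_hops} hops")

# Hypothetical responses mirroring "a 404 after a 301 to the www version".
RESPONSES = {
    "http://example.com/lvindex": (301, "http://www.example.com/lvindex"),
    "http://www.example.com/lvindex": (404, None),
}

def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return RESPONSES.get(url, (404, None))

for hop in redirect_chain("http://example.com/lvindex", fake_fetch):
    print(hop)
# ('http://example.com/lvindex', 301)
# ('http://www.example.com/lvindex', 404)
```

Injecting `fetch` keeps the chain logic testable offline; a live version could wrap a single non-following `http.client` request.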
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/13/12 2:51 AM|
Yeah, that's what I said (I was just emphasizing the LV part, which I missed, for all to see). I don't think case makes a difference in G search, does it?
And I also said it was previously a 301 which is NOW, as obviously known by Google, a 404 since over a week ago, before it was last cached.
Is there something else I'm missing?
|Re: 301 redirected being "Phantom indexed" after 1 year||webado||1/13/12 6:21 AM|
Not in text search, but for urls it sure does.
You don't know when it was last cached as it has no public cache.
What is cached in its place is index.htm :
Possibly /lvindex.htm was being 302 redirected to the extensionless url /lvindex at the time (Dec 26).
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||1/14/12 2:36 AM|
Yes, webado, you are correct: they FINALLY realized it was a 404 and eliminated it from the index in the last 2 days, and the search result is coming up as it should now. No, lv index dot htm was never redirected to lv index (the reverse is the case), but it was redirected just recently to somewhere else in the midst of our move, and the lv index redirect was removed from the htaccess. It was funny while it lasted (I did a screen capture). What I was saying was that the page was 404'd over a week prior, and somewhere in between they had obviously "cached" it as a 404 even though they should have realized it was a 404.
So in summary removing the 301 and making it 404 seems to work, eventually, to eliminate phantoms. Replacing with a 'noindex, follow' page, still not working. But with a 404 you lose any PR pass-thru, so not acceptable in all cases.
Interestingly, the aforementioned previously-penalized-but-improving pages are up yet another 10% in the last 2 days since the one phantom went away. They're slowly bubbling up, and there's still another dozen or so phantoms to go for the home pages. The one they found and eliminated, though, was probably amongst the most egregious: pc ap dot com. But the www dot pc ap dot com version is still there, and the pc ap dot com slash las vegas dot htm URL is still there too. If we could get rid of those, I think we would be well on our way back to a major recovery.
|Re: 301 redirected being "Phantom indexed" after 1 year||Mike0000||4/20/12 4:37 PM|
At last Google listened and FIXED this issue! Thanks G. Some time between 4/16/12 and 4/20/12 ALL of our remaining phantomed indexes disappeared ALL at once. I'm talking dozens of them. Searching on the old URL now displays the redirected-to url as it should in every case!
|Re: 301 redirected being "Phantom indexed" after 1 year||alistair.lattimore||4/20/12 9:04 PM|
Are you going to move forward with completely separating your sites now or have you already done that?
Also I recall you were concerned about ranking implications because of these phantom URLs, has anything changed now that they are gone?
|Re: 301 redirected being "Phantom indexed" after 1 year||vistastores||6/19/12 5:12 AM|
I don't think so. I can still see the phantom indexing issue with 3 of my old domains. Today I added a comment and raised a question about it on SEOmoz. This thread is a marathon discussion on a similar subject, so I was quite excited to add my issue here, because it's still happening in Google.
I would like to paste a few portions of that question to give more context.
I have 301 redirected the following 3 domains to a new website...
I did this 3 months ago, but Google still shows the home page URL in search results rather than the new landing page. You can check the following search results to learn more about it.
For LampsLightingandMore: on the second or third page:
For VistaPatioUmbrellas: on the second or third page:
For SpiderOfficeChairs: on the second or third page: