|How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 5:57 AM|
I have studied the Help Center, read the FAQs and searched for similar questions.
My URL is: http://www.carsurvey.org
TL;DR Almost 2 years of full-time work could not save a popular car reviews site from the Panda algorithm.
My site http://www.carsurvey.org has been going for over 15 years in one form or another, and is focussed on user contributed car reviews (over 100,000), and related advice. I get lots of great feedback from visitors, so I must be doing something right. Google themselves even thought so, when they featured the site on their AdSense blog: http://adsense.blogspot.co.uk/2008/02/driving-up-placement-targeting-earnings.html
I was hit by the first run of Google's Panda algorithm in February 2011, and have been working flat-out since then to try to improve the site in the right ways, with no positive results. Note that my site seems to only have been hit by the Panda algorithm; later Google algorithms such as Penguin and the Page Layout Algorithm (Top Heavy) seemed to leave my site alone, although I guess it's possible that Panda is masking the effects of those later algorithms.
Looking back, I can now see that the old site was less than ideal in a number of ways:
1. The quality of the writing and moderation was variable, as is often the case with forums/user generated content sites.
2. The custom CMS made no sensible attempt to deal with thin content such as very short reviews.
3. The ad layout was too heavy, and also slightly unconventional (ads on both sides of the page).
4. The lack of threaded comments had resulted in duplicate content issues, due to people quoting previous comments in their replies.
5. There were a lot of thin navigation pages, as there were several alternative ways to navigate the site.
6. The look and feel of the site was quite dated.
Here, in rough order of implementation, are the changes I've made to improve things:
1. Meta noindexed redundant navigation pages and pages with very thin content.
2. Added proper comment threading to the site, to reduce future duplication.
3. Wrote a tool to identify 1000s of instances of duplicate content caused by quoting, reviewed them by hand, and converted them into proper threaded replies, with little to no duplication. The Perl CPAN String::LCSS_XS module was very useful here.
4. Totally rebuilt the structure of the site to almost eliminate thin pages. A two pass pagination algorithm is used, and I've moved to a multiple reviews per page model, with inline comments. Visitors should almost always find pages that have a generous helping of relevant content. All old links are redirected to the current new addresses. Any remaining thin content, excepting some essential nav pages, is noindexed. The site used to have about 250,000 pages in Google's index. It now has about 44,000 pages in the index, and this is mostly due to the pagination and noindex changes.
5. Split off the airline reviews part of the site onto its own domain, with appropriate redirects, in case that wasn't helping.
6. Redid the page layout and CSS from scratch. The code is now HTML5 (had been XHTML 1.0 Transitional), and is also more compact. Works better on tablets such as the iPad too.
7. Paid for a designer to produce a new logo for the site, and even tested candidate logos on Amazon Mechanical Turk to gauge people's opinion. Also created a matching favicon file.
8. Reduced the ad load. Navigation pages only have one AdSense ad, and there are never more than two AdSense ads above the fold on the content pages. Very thin but essential pages are entirely ad free. AdSense keeps sending me emails saying that I'm leaving money on the table, but I've been putting the user experience first.
9. The ad layout has been modelled on bluechip sites. So a leaderboard banner, with other ads on the right, and the content taking up the majority of the width. Content has been prioritised over ad click-through rates. The page structure has been rethought to improve usability, and years of accumulated clutter has been removed.
10. As the pagination system means that content often moves pages, my sitemap file generation code has been updated. Certain types of edits to the site result in changes to database reindex dates, and these dates are used to populate the lastmod field of the sitemaps file. This should result in Google's index being far more up to date, improving the experience for visitors, and reducing the chances of pagination changes causing duplicate content issues. I've also implemented rel="next" and rel="prev" tags on paginated pages, to make the relationship between the pages clear to Googlebot.
11. Removed in-text ads from the site, leaving only a limited number of AdSense ads. This cost me a large fraction of the site's revenue, but again, I put the user experience first.
12. I built an automated system to find poor quality content that was still on the site. I looked at many factors, such as poor spelling, punctuation, and overuse of capitals. This system flagged over 15,000 reviews and 36,000 comments, which have been removed from the site, pending manual review (sadly those poor quality reviews and comments include many false positives - I wanted to find as much of the sub-standard content as possible).
13. Implemented Schema.org markup for products, reviews, and user comments, to make it clear to Googlebot what the nature of the content is.
14. Had another go at finding duplicate content in the site. Wrote a tool to find the most unique 11 word sequences (ngrams) in each piece of content, or each section of larger pieces of content, and then, using MySQL full text search, ran my Mac Pro for the best part of a week to search out partial duplicates that were missed by my earlier duplicate content search. Each duplicate found was examined and fixed (if necessary) by hand.
15. I hadn't worried too much about scraper sites in the past, but I thought that maybe they were causing my problem. I decided to use my ngram system to search the entire web for copies of the content of Carsurvey.org. Search engines aren't happy about people doing automated searches for free, so I had to use a paid solution. I had over 500,000 searches to do, so Copyscape (5c a search) and Google ($5 per 1000 queries) were too expensive. Not only was Google very expensive, but there was a limit of 10,000 searches per day. Yahoo BOSS however was a much more reasonable 80c per 1000 queries. I mailed Yahoo to check that they'd be OK with me sending so many searches, and they replied that it wouldn't be a problem. Again, my Mac Pro spent the best part of a week battering Yahoo's servers, costing me just under $500. I loaded the duplicate data into a database, and set about dealing with the sites that had the most duplicate content.
16. Trying to get duplicate content removed is utterly soul destroying. In many cases, reviewers had submitted their car reviews to multiple sites on the same day. Where I found this, I removed the reviews from Carsurvey.org. Many legitimate sites were very helpful when it came to taking down copied content, but the serious scrapers and their hosts just ignore your requests. Once I'd dealt with the low hanging fruit, I needed to start using Google's DMCA process, to either remove scraper pages from Google's search results, or to get pages removed from Blogspot, which seems to be a den of content scrapers as far as I can tell. The Google DMCA form is painful to fill in, and there's no way to get an entire site removed, other than by passing some mysterious threshold that only Google knows. This process is terribly broken. Legitimate requests seemed to be refused at random, and the more requests I sent, the higher the percentage of rejections I got from Google, without reasonable justification. Short of suing Google, I was fighting a losing battle, and was also finding the whole task very unpleasant, so I gave up. In case anyone is wondering, these DMCA takedowns were before Google announced that successful DMCA takedowns could be used to rank search results. Note to Google: your DMCA process is broken, and seems to give immunity to sites that scrape on a massive scale.
17. In order to avoid spam problems, Carsurvey.org never allowed real HTML links. Unfortunately that meant that the site was full of old plain text links that no longer went anywhere useful. Spam had never made it onto the site (even in plain text), but there were thousands of links to expired domains and old sites such as Geocities. Using the excellent CPAN module URI::Find::Schemeless, plus a few additional regular expressions of my own, I catalogued the thousands of links on the site, checked each one by hand, removed bad or broken links, converted good links to Markdown, set the links to either follow or nofollow as appropriate, and updated the site to recognise Markdown links in its pages, and render them as HTML.
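For anyone curious about the mechanics, the quote-detection pass described above (the original tooling used Perl's String::LCSS_XS) comes down to a longest common substring check: a comment that quotes an earlier comment shares a long run of identical text with it. A rough Python sketch of the idea, with an illustrative threshold rather than whatever the site actually used:

```python
# Sketch of quoted-duplicate detection via longest common substring (LCSS).
# The original used Perl's String::LCSS_XS; this Python version illustrates
# the same idea with a classic dynamic-programming pass.

def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring that appears in both a and b."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def looks_like_quote(comment: str, earlier: str, min_chars: int = 60) -> bool:
    """Flag a comment that reproduces a long run of an earlier comment."""
    return len(longest_common_substring(comment, earlier)) >= min_chars
```

Flagged pairs would then be reviewed by hand and converted into threaded replies, as described above.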
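The post doesn't spell out the two pass pagination algorithm, but a plausible sketch is: a first pass picks a page count that avoids leaving a thin final page, and a second pass spreads the reviews evenly across those pages. In Python, with purely illustrative page-size numbers:

```python
# Hedged sketch of a two-pass pagination scheme. Pass 1 decides how many
# pages are needed, merging away a would-be thin final page; pass 2
# distributes items so all pages end up roughly the same size.

def paginate(items, target_per_page=20, min_last_page=10):
    n = len(items)
    if n == 0:
        return []
    # Pass 1: choose a page count; only add an extra page if the
    # leftover items would make a sufficiently substantial final page.
    pages = max(1, n // target_per_page)
    if n - pages * target_per_page >= min_last_page:
        pages += 1
    # Pass 2: spread items as evenly as possible across the chosen pages.
    base, extra = divmod(n, pages)
    out, start = [], 0
    for p in range(pages):
        size = base + (1 if p < extra else 0)
        out.append(items[start:start + size])
        start += size
    return out
```

So 45 reviews become two pages of 23 and 22, rather than two full pages plus a thin page of 5.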
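The sitemap/lastmod change might look something like the following sketch: database reindex dates populate the lastmod field, and rel="prev"/rel="next" link tags are emitted for each paginated page. The function and field names here are illustrative, not the site's actual schema:

```python
# Hedged sketch of sitemap generation driven by database reindex dates,
# plus rel=prev/next tags for paginated series. Illustrative names only.

from datetime import date
from xml.sax.saxutils import escape

def sitemap_xml(pages):
    """pages: iterable of (url, last_reindexed_date) -> sitemap XML string."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, last in pages:
        lines.append('  <url><loc>%s</loc><lastmod>%s</lastmod></url>'
                     % (escape(url), last.isoformat()))
    lines.append('</urlset>')
    return '\n'.join(lines)

def pagination_links(base_url, page, total_pages):
    """rel=prev/next <link> tags for page N of a paginated series."""
    tags = []
    if page > 1:
        tags.append('<link rel="prev" href="%s/page-%d">' % (base_url, page - 1))
    if page < total_pages:
        tags.append('<link rel="next" href="%s/page-%d">' % (base_url, page + 1))
    return tags
```

A fresh lastmod tells Googlebot to recrawl pages whose content has moved under pagination, keeping the index current.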
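The automated quality checker could be sketched along these lines; the actual factors and thresholds used on the site aren't published, so everything below is an illustrative guess at the kind of heuristics described:

```python
# Hedged sketch of heuristics for flagging low-quality submissions:
# overuse of capitals, very short text, missing punctuation, and
# repeated characters. Thresholds are illustrative, not the site's.

import re

def quality_flags(text: str) -> list:
    flags = []
    letters = [c for c in text if c.isalpha()]
    if letters:
        caps_ratio = sum(c.isupper() for c in letters) / len(letters)
        if caps_ratio > 0.5:                       # SHOUTING
            flags.append("overuse of capitals")
    if len(text.split()) < 10:                     # too short to be useful
        flags.append("very thin")
    if not re.search(r"[.!?]", text):              # no sentence punctuation
        flags.append("no punctuation")
    if re.search(r"(.)\1{4,}", text):              # e.g. "soooooo good"
        flags.append("repeated characters")
    return flags
```

Anything flagged would then go into the manual-review queue, accepting some false positives, as the post describes.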
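The "most unique 11 word sequences" fingerprinting can be sketched as: slide an 11-word window over each review, count each ngram's frequency across the whole corpus, and keep the rarest ngrams per review as fingerprints to search for elsewhere. A small Python illustration (a Counter stands in for the MySQL full text index the original used):

```python
# Hedged sketch of rare-ngram fingerprinting for duplicate detection.
# Rare ngrams make good search fingerprints because common phrases
# ("the car is great") match everywhere and tell you nothing.

from collections import Counter

def ngrams(text, n=11):
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def rarest_ngrams(docs, n=11, per_doc=3):
    """docs: {doc_id: text}. Returns {doc_id: [rarest ngrams in that doc]}."""
    corpus_counts = Counter()
    for text in docs.values():
        corpus_counts.update(ngrams(text, n))
    return {
        doc_id: sorted(ngrams(text, n), key=corpus_counts.__getitem__)[:per_doc]
        for doc_id, text in docs.items()
    }
```

The same fingerprints could then be fed to an external search API (as with the Yahoo BOSS run described above) to find copies on other sites.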
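Finally, the plain-text link clean-up (the original used Perl's URI::Find::Schemeless) amounts to spotting URLs, including schemeless ones like www.example.com, in review text and rewriting them as Markdown. A simplified Python sketch; the regex is far less thorough than the CPAN module:

```python
# Hedged sketch of finding plain-text links and converting them to
# Markdown so the CMS can render them as real HTML links.

import re

# Matches http(s) URLs and bare www. hostnames in plain text
# (a simplification of what URI::Find::Schemeless handles).
LINK_RE = re.compile(r'\b((?:https?://|www\.)[^\s<>"\']+[^\s<>"\'.,)])')

def find_links(text):
    return LINK_RE.findall(text)

def to_markdown(text):
    """Rewrite each plain-text link as a Markdown link."""
    def repl(m):
        url = m.group(1)
        href = url if url.startswith("http") else "http://" + url
        return "[%s](%s)" % (url, href)
    return LINK_RE.sub(repl, text)
```

In the real clean-up each catalogued link was also checked by hand and marked follow or nofollow, which a sketch like this can only feed into.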
Having done all that, over the best part of 2 years, working 6.5 days a week on average, I've got precisely nowhere. Poor quality reviews sites and scraper sites (often with my content) continue to rank in Google, while Carsurvey.org seems to be banished to the dark basement of Google's search results, almost certainly behind a sign saying "Beware of the Leopard" (see, I've read HHGG, and haven't quite lost my sense of humour). In fact, as I've removed content, and as Google seem to have increased the strength of the Panda penalty over time (presumably as they became more confident in the algorithm), the drop in traffic has gotten much worse since the initial Panda launch in 2011. Regardless of the absolute quality of the site, its relative quality has improved massively, but despite this, the visibility in Google search results has continued to fall.
It's also outranked by lots of spammy and irrelevant content; I wouldn't be complaining if I was only being outranked by quality, relevant sites. To be fair to Google, the spam and scraper situation has been improving, but the first few pages of search results now seem to be utterly dominated by multiple pages from a few top sites, with relevant content from smaller competitors not getting a look in (note that my experience is mostly of google.co.uk).
It's apparently OK for Yelp, IMDB, and Trip Advisor to have thousands of pages of consumer reviews, but my guess is that Panda basically assumes that any large site about a single subject, that doesn't have enough triple A quality links, or which is not easily identifiable as a forum or online shop, must be a content farm that's polluting the search results. As I've always stayed away from trying to artificially boost the links to the site, and because car reviews are something that most people only need once every few years, I don't have the link profile that some other sites may have. I also strongly suspect the threshold for escaping Panda is higher than being caught by it, so that once you're in its clutches, it's very hard to escape, and even perhaps impossible for some types of sites.
I've never had a warning about anything unnatural in Google's Webmaster Tools, and unlike with a manual penalty, there's no way to raise this issue with Google (yes, I tried a reinclusion request). I've tried politely contacting a few Googlers, and never got anywhere. I watched pretty much all Matt Cutts' videos, read thousands of forum comments about Panda, posted my own story on forums, read "In the Plex" (actually, I listened to the audiobook) to try and get inside the mind of Google, and despite trying everything I could think of, my site seems to be collateral damage in Google's war on web spam.
If anyone has any ideas, or if a friendly Googler is able to take a look at the site, I'd really appreciate it, as otherwise I've come to the end of the road, and sadly Carsurvey.org will have to become my hobby again, while I move onto pastures new (iPhone development) in order to pay my bills. I'm also happy to answer questions if anyone is interested in more technical details of some of the actions I took.
|Stephen Sherman||11/13/12 6:48 AM||<This message has been deleted.>|
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Ashley||11/13/12 8:25 AM|
Being featured by AdSense is totally separate from anything to do with organic rankings - just so we're clear there.
Why not just remove all thin content pages?
Are you interlinking the sites at all?
The biggest issue is likely content. Unfortunately, it's the model of user-submitted content that allowed you to create such a massive site and that is pushing it to fall.
When I arrive at the site, I'm immediately confused about the value. It's just a sparse looking design with nothing but a long list of links. There's an ad above any content, and no description of what I can actually find on the site. I see "Car reviews by manufacturer", but even that is confusing. Is the manufacturer leaving the review? No... It's just how you navigate. It's just coming up short.
I went to look up my car (2011 Subaru Outback) but found little useful information. Why are there "faults" listings for issues the current owner has, like "rock chips"? That's totally irrelevant to me. I see no pics of the cars, no description of the different packages, nothing on fuel economy or pricing - nothing that is actually really useful to me if I was investigating buying the car. Take a look at these other top ranked sites and compare them to yours:
http://usnews.rankingsandreviews.com/cars-trucks/Subaru_Outback/2011/ - lots of features, plus excellent writing
http://www.thecarconnection.com/overview/subaru_outback_2011 - Same, tons of features on this page
http://www.cars.com/subaru/outback/2011/expert-reviews/ - fewer features, but robust writing - seems very authentic
http://www.edmunds.com/subaru/outback/2011/consumer-reviews.html - user reviews, but easy to tab and sort, nice star rating, includes some pics and links on the car, etc.
Your site is falling short on usability and value to me as a user. I didn't look at technical aspects, but I can. But you need to find something that you feel you can do really well, something that you can offer that other sites cannot. You need to build the best possible website out there for users. What value can you add that these other sites are not?
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 8:30 AM|
It's interesting to hear a similar story, especially from someone who's been running their site for a similar length of time. We're clearly in this for the long haul, and I feel like we're putting the effort in to do the best by our visitors and Google, but Google isn't doing enough to help people who genuinely are trying to do the right thing. I appreciate they're in a tough place with spammers, but at least a few clues about where we're going wrong, or an indication that they appreciate our sites are false positives, and they'll use that data to tweak future versions of the algorithm, would make all the difference.
I just had a look at your site, and I see that it's got lots of relevant images. I had been wondering if the lack of images on my site was hurting it, but it doesn't seem to have saved your site.
It appears that running a website has become a bit of a lottery. Prior to Panda, if you kept up to date on Google's policies and used a little common sense, you'd be fine, but now there's a significant chance that some algorithm will bury your site, with no right of appeal.
As I'm going to be working on iOS apps soon, I'm aware that Apple can be somewhat arbitrary with their app rejections, but at least common sense can keep you reasonably safe. I'm planning on building a weather app and a running app, and uncontroversial apps in established niches like that are going to be very low risk. Plus, any disputes will ultimately be reviewed by a human being. Contrast that to the current Google situation, where if I launched a new site, I have no clue whether it would fall foul of one of Google's algorithms, and little recourse if it did so.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Lysis||11/13/12 8:39 AM|
It's probably the user generated content, like Ashley said. Do you pay them? Is this just some cheap guy on freelancer.com? If it's free stuff submitted by users, you also have to remember that if you aren't paying them, they are probably putting the content all over the web. User generated content should be heavily edited and monitored, and my experience is that sites were always very lax when it came to poor content, because it was always about generating as much content as possible for search engines, regardless of quality.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 9:05 AM|
Thanks for the feedback Ashley.
Of course you're right about AdSense and search quality being entirely separate at Google, but I was just making the point that at least someone at Google wasn't completely horrified by the content and quality of the site in 2008.
1. If I have a shortish review of a car, especially if it's quite rare, I think there's value in having that review available to visitors.
5. My airline site and motorcycle reviews site are interlinked in the page footers, share the same server, and share the same Webmaster Tools account. I can understand that having a web of hundreds of sites cross linking each other would be a problem, but I didn't think that three related sites was anything unnatural. Lots of major sites cross promote their sister sites. But if it's likely to be a big problem, it's 5 minutes' work to remove those links.
Regarding the user-submitted content, I can see your point, but many larger sites, including Google's own YouTube, and Wikipedia, are user generated, and aren't penalised in this way. I'm not questioning your analysis, but if it's correct, it seems that larger sites get to play by a very different set of rules.
Your point on the wording on the first page is well made. I've tried a few alternatives, but whenever I A/B test more detailed descriptions, the metrics are actually worse than the current design (which has a very low bounce rate). Plus, most new visitors arrive from search engines via deep links; the front page is mostly used by return visitors who already understand the purpose of the site.
The site is mostly focussed on reviews of cars that people have owned for a few years. This, combined with the drop in traffic in the past few years due to Panda, means that there are relatively few reviews of 2011 cars. There are however 9 reviews of 2006 Outbacks, and 48 reviews of 2000 Outbacks. The 2011 Outback reviews are also quite short (that's just chance), whereas some of the reviews on the site are massive (thousands of words). I don't do pricing or technical data, because I don't have the resources to cover that at a quality level that I would consider to be acceptable, and there are plenty of good sources out there for that information.
The value of the site is in the number of reviews (admittedly of mostly older models), the speed of the site, and the lack of clutter. I came up with the site in 1997, when it was pretty unique. I do appreciate that there's good competition out there, and if Google chooses to rank those sites higher, that's perfectly reasonable. It's the devastating effect of Panda, when other similar sites aren't impacted, that I'm concerned about.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 9:40 AM|
Thanks for the feedback Lysis.
I agree that it's probably the user generated content, but there's little I can do to change that. Other user generated sites, with more than their fair share of poor quality content, seem to be unaffected though.
I have never paid for reviews, and have always turned down offers of reviews in exchange for links. I also actively monitor the site for suspicious reviews and spam, and content has been removed and people banned when attempts are made to game the site. The site is supposed to be on the side of consumers, and I take that responsibility extremely seriously. I'm not going to pretend that I didn't see more content as a good thing, but it wasn't a case of more content, regardless of quality or trustworthiness.
When I was looking for duplicate content, I did of course find reviews that had been legitimately submitted to multiple sites. I have removed every example of such content that I could find. As stated above, I spent several months on this problem alone (plus $500 in service charges with Yahoo BOSS).
The site has always been moderated and edited. From 1997 - 2001, I personally moderated everything, and that was also the case from 2009 until today. In the middle period, I partly relied on volunteer moderators (due to work commitments), and despite needing multiple moderators to vote on every new review and comment, the moderation wasn't always perfect. I've attempted to address this by using a script to find and remove content that may have issues, and that hasn't helped, despite about 15% of the site being removed. Another data point is that my smaller motorcycle reviews site (same codebase and very similar design) was always 100% moderated by me, and has also been hit by Panda. I'm not saying that content issues didn't contribute to the problem, but they have been addressed significantly, and clearly aren't the only factor.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||StevieD_Web||11/13/12 10:16 AM|
>but I didn't think that three related sites was anything unnatural
What one thinks is not relevant.
I think Scarlett Johansson is going to take me up on my offer of a date.
Reality is most likely something else.
Thus your three sites appear to be a massive link scheme.
AT BEST Google will simply ignore the links. AT WORST you might get drop kicked in the $%^&
Combine your interlinking issues with thin content compared to competing sites and you have a recipe for disaster.
PS: When was the last time an auto manufacturer gave/loaned a long term test vehicle to your site? Think about that when you compare your site to Edmunds, as they routinely receive vehicles from the manufacturer.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 11:20 AM|
Those links weren't important, so they've now been removed from all three sites. I hope it wasn't something as innocent as that, but just in case, they're gone.
I don't see the site as competing head-on with Edmunds and the like. I see the site as the equivalent to Yelp or TripAdvisor (both of which I predate). There's space for both TripAdvisor and Condé Nast Traveller in this world. If you want to know what the latest 2013 BMW 3 Series is like to drive, then you're better off with Edmunds, whereas my site can tell you exactly what it's like to own a 1974 Jensen Interceptor (http://www.carsurvey.org/reviews/jensen/interceptor/) or a 2001 Ford Tourneo (http://www.carsurvey.org/reviews/ford/tourneo/), which you'll struggle to find elsewhere on the web.
p.s. Check out those reviews. They certainly don't fit my definition of thin content, though I appreciate that those are two of the best reviews on the site, and Panda is more interested in the worst content it can find, rather than the best.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Marie Haynes||11/13/12 11:49 AM|
Wow. You have worked really hard to rectify this. I'm so sorry that Google is still affecting you.
I have a few thoughts.
My first thought is that the first couple of pages I see have nothing but a list of links. The home page is simply a list of car names as links. There is no text or anything to tell Google that your home page is of value. The next level in, when I click on a car, is another list of links with no text. When I get to the third level, it's links along with thin content, but I see that you have noindexed this page (which is good). It's not till I get to the fourth level that I get to the real content.
If this were my site I would write some substantial content on the homepage about what your site is about and how the user can benefit from using it. I would do the same on the second level of pages too.
I have to tell you that I have used your site in the past and really found it useful. Recently we were buying a car again and I was, of course, searching online for reviews. I was actually trying to find your site. Unfortunately I couldn't remember the name and I couldn't find it. It's so unfair that Google penalizes sites like this, but that's because Panda is an algorithm. There are going to be sites hit unfairly.
...one other thought to finish things out. Are you absolutely sure that you have been affected by Panda? I have seen some sites that have been trying desperately to recover from Panda when really there was another issue. Because I've gotten a lot of help from your site in the past, I'd be happy to take a look at your analytics and give you my thoughts just in case.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 2:15 PM|
Marie, thanks for your very kind words.
The fact that you were actively looking for my site, but couldn't find it, is both gratifying and disappointing.
Due to the scale of the site, sensible navigation that's not a hierarchy is hard. I have considered putting more content on those navigation pages, but I've always been wary of either creating more duplicate content issues, or filling the page with bland content that doesn't really need to be there, in order to somehow affect search engine ranking. That said, it's clear that I need to look at this again. I'll spend some time looking at how other quality sites deal with this, and hopefully I'll find some inspiration for some improvements.
I'm 99.9% sure I was affected by Panda, but thank you for your kind offer. My organic Google traffic dropped massively on February 24th 2011 (Panda 1.0), and there was another big drop on April 11th 2011 (Panda 2.0). Other traffic sources were not affected, and some of the more recent Panda rollouts have also resulted in significant falls in visitors from Google. I know statisticians like to say that correlation is not causation, but the analytics correlation with Panda is incredibly strong, so I'd be utterly amazed if an endangered bear species with a fondness for bamboo wasn't responsible.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||StevieD_Web||11/13/12 2:44 PM|
It reads like a personal blog post..... has flavor and character and uniqueness.
It also reads like one big run-on sentence.
There are no pictures of the car to enhance the post or chapters/headers to break up the text. Not a great user experience.
Now compare this thread to some of the UGC comments of "this car is the da bomb". Panda don't like those bomb comments.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/13/12 3:53 PM|
I agree that user generated content is of variable quality, and even with effective moderation, such content is rarely going to reach the level of the best professional content.
It's probably true that Panda doesn't like such content, but many large, established sites that are based on user generated content, seem to have escaped Panda, so this either isn't the full story, or alternatively, there's an exception in Panda for large, established sites (presumably this exception is algorithmic, rather than manual).
Google's own sweetheart product, Google+, is composed almost entirely of user generated content. In fact many of the most popular sites on the web are effectively user generated (eBay, Craigslist, Wikipedia, Facebook, Twitter, Pinterest, Foursquare, Flickr, Linkedin, YouTube). Even stores such as Amazon and Google Play rely heavily on user reviews of highly variable quality for much of their content. Not only that, but the vast majority of content-centric sites, even the bluechip ones, host user comments, which are rarely moderated sufficiently to ensure a uniform high quality.
Google's actions in running YouTube and Google+, and their thwarted desire to index Twitter and Facebook content, suggest that at least in some forms, Google can see the value of user generated content.
If Google has an algorithm that is biased against user generated content from smaller sites, it would be helpful if they would be up front about this. I frequently use Google to search for relevant, recent discussions, and maybe I'm just an old-timer who can remember Usenet in its prime, but I don't want to be served up only professional content. The variable quality of user contributed reviews is a great signal to the reader about the level of trust they should place in that review. A slightly amateurish review that goes into lots of mundane details is often a good counterpoint to a professional review written by a journalist on a tight deadline. Plus, I suspect that user generated content avoids some of the worst link bait tendencies of some professional content creators.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||StevieD_Web||11/13/12 5:00 PM|
>In fact many of the most popular sites on the web are effectively user generated (eBay, Craigslist, Wikipedia, Facebook, Twitter, Pinterest, Foursquare, Flickr, Linkedin, YouTube).
Yes they are. But you missed the one thing those sites have in common versus your own.....
They are all branded destinations.
The top queries for each of those sites are the brand, or "brand+subject", as in "cameras on ebay" or "jobs in New York on Craigs list".
To Google there is no doubt as to the user's intended destination. In fact Google has a special name for those queries... navigational.
Comparatively, what % of your queries are for your brand? Even if the query is for "car survey", the searcher could be just as happy with Consumer Reports' latest car survey of reliability, or Cars.com's survey of top selling cars for October.
Branding is more important today than ever..... I suspect each of the latest algorithms has a branding component. And let's not forget the EMD update, which has significant brand (business name) verification components to the algorithm.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||devcfc||11/13/12 6:44 PM|
I can't believe you worked on this for 2 years and didn't add any text to the home page, no offence. I completely agree with everything Marie said. I would make it priority #1 to add 1000 words to your home page.
If you don't want 1000 words dominating the page, you could show the first paragraph, then have a "Read More" button script that displays the rest of the text upon clicking. I believe this is a legitimate thing to do - somebody correct me if I'm wrong, as I've started doing it on my own sites. If you want to see an example, view this web page and the text at the top: http://www.weddingfavors.org/cheap-wedding-favors
Also, what is the point of these pages: http://www.carsurvey.org/reviews/bmw/3_series/
I think you need to implement some CSS drop-down menus on the home page. For example, when you hover over "BMW" it would then show a drop-down menu with the years: 2012, 2011, etc. Then you could cut out all those thin content, useless intermediary pages.
Then I would add a lot more text to your site. Personally, I would make the text smaller - it looks more professional, and you will fit more on the screen with less scrolling. Get some images on there; that is content too. I think adding the car logos beside the names would make it look 10 times better.
That's my 2 cents.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 4:55 AM|
For my site, 0.3% of query impressions and 3.6% of search clicks are for the Carsurvey.org brand. 23.2% of visitors are repeat visitors (and 6.2% have visited 9 or more times). That's obviously not the profile of a site like Amazon, but neither is it the profile of a content farm or a poor quality site.
Regardless, I'm not arguing whether I should rank as highly as the most trusted brands. My point is that the site has enough value to justify being ranked as a normal site, rather than being penalised in the same way as the most cynical content farms and scraper sites.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Rainer Maurer||11/14/12 6:12 AM|
Hi, why does he need 1000 words? It's a perfect site if I'm looking for a review. I don't have to read much and I find my car brand fast. Isn't that what you want? Because it's the crawler that wants 1000 words... not the user!
Google's algorithm makes mistakes, and that's why he can't recover. Google is pushing brands and telling you that you have a low-quality site; yes, it's telling you that, but it can't really measure it.
By the way, do you show up on the first page of Google? Who is the content for here, the user? 1000 words??? I have the same problem with all my sites and I think they will never return to the index pages. So for me Google is dead; only links are the right way to get back, but you'd need more than you can ever get.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Stephen Sherman||11/14/12 6:22 AM|
Since you read my comment and we have commiserated briefly, I will delete my post. Because: 1) I don't want to hijack your thread. 2) I might start a new thread with my own website and tale of woe. You seem to have gotten some good feedback; maybe I will too.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||devcfc||11/14/12 6:41 AM|
"Hi, why does he need 1000 words? It's a perfect site if I'm looking for a review. I don't have to read much and I find my car brand fast. Isn't that what you want? Because it's the crawler that wants 1000 words... not the user!"
He doesn't "need" 1000 words. It was a personal recommendation; perhaps 500 or 350 might be sufficient. I don't work for Google. Why do I recommend he adds 1000 words? Because he has identified the problem as Panda, which is about thin content. Therefore, by logic, if you turn thin content into fat, juicy content, the problem is solved.
I wouldn't stop at the home page either. I would go through every page and ask myself the question "Does this page have thin content?" If YES, add more content.
I have sites that are Panda and Penguin affected too. I don't make the algorithms or the rules, but I will try to beat them to get my sites back on top.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 6:43 AM|
Thanks for the feedback.
I did try adding more text, but my A/B testing indicated that users preferred the simpler link table.
Showing excerpts of some of the better quality reviews is a possibility. I'd stayed away from this, due to worries about duplicate content issues, but Matt Cutts recently released a video where he addresses how to do this appropriately: http://www.youtube.com/watch?v=hy3_Rjc0Tso
There's still the issue of the content being relevant to enough people, as the site is global; some effort will need to be made to surface content that's relevant both to a 60-year-old woman from England looking for an SUV with space for her dogs and grandchildren, and to an 18-year-old man from the USA who has $5000 to spend on something sporty.
That BMW 3 Series page exists to allow users to choose which year they're interested in. There are 740 BMW 3 Series reviews on the site, and if they weren't separated into individual years, it would be very hard to find relevant reviews. For models with a much smaller number of reviews, the site does not show a year list, and takes visitors straight to a list of all the reviews: http://www.carsurvey.org/reviews/bmw/1_series/single-page/
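For what it's worth, that year-list-or-not decision is just a simple threshold in my CMS. A rough sketch of the idea (the function name and cut-off value here are illustrative, not my actual code):

```python
# Illustrative sketch only: the real CMS logic differs, and the
# threshold value here is invented for the example.
YEAR_LIST_THRESHOLD = 50

def review_landing_url(make, model, review_counts_by_year):
    """Pick the landing page for a model's reviews.

    Models with many reviews (e.g. 740 for the BMW 3 Series) get a
    year-chooser page; small models skip the thin intermediate page
    and go straight to a single page listing all their reviews.
    """
    total = sum(review_counts_by_year.values())
    if total >= YEAR_LIST_THRESHOLD:
        return f"/reviews/{make}/{model}/"
    return f"/reviews/{make}/{model}/single-page/"
```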
The problem with drop-down menus is that they're not as intuitive as links, and they don't work as well on certain devices, especially touch-based clients. Plus, given the size of the site, many drop-downs would be massively long and unwieldy.
Car logos are potentially a great solution, but I've always been wary about the copyright and trademark issues involved. Those logos are owned by corporations, and using them without permission, especially to criticise their products, would put my company in a dangerous position. There is a fair use defence, but that's not cast-iron, so it seems sensible to play it safe and not risk potential legal action.
More text and images would be desirable, but only if it's a good fit for the site. I could cynically grab some relevant text from Wikipedia and Creative Commons images from Flickr, but that's what real content farms do, and it's not my style. I need to think about this some more, but anything I add needs to justify its inclusion in terms of adding value for the visitors, rather than trying to just tick some boxes for Googlebot.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 6:47 AM|
No problem Stephen. I hope you either receive some helpful advice, or attract the attention of a friendly Googler, who can perhaps bring it to the attention of someone in the Search Quality team.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 7:00 AM|
I understand the point you're both making. More content is better, but Google also say you should make your content with users in mind. The reviews pages are full of content, but the navigation pages are pretty minimalist. I think that makes sense for the user (as Rainer suggests), but it may well be the case that it triggers something in the Panda algorithm. In that case, I think ideally Google should see this as a data point to help tune the algorithm, but after almost 2 years of effort, I don't have much faith that things work like that in reality.
If it really is the case that there are magic word count or image count numbers to hit, to avoid Panda, then Google have made success on the web all about hitting a mysterious set of metrics, rather than the user satisfaction test they have been publicly promoting.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||devcfc||11/14/12 7:02 AM|
Well, I think you need to make a decision on what is more important to you:
a: having a site that complies with Google, Panda, Penguin, etc.
b: having a site that complies only with you and your users' best interests
For me, regardless of what A/B testing tells you is better, if you want the traffic from Google, play by their rules and suck up to them by adding what they want: content.
Scraping content from Wikipedia, or any other content that is duplicated across the web, doesn't count as content in my book. Open MS Word and write content for each page that comes from your own brain.
Agreed, drop-downs aren't going to be the best solution for mobile views, but if it's a trade-off between losing some frustrated mobile viewers or a 90% traffic hit from Google, I know what I would choose. If you don't want to do that, then I would noindex, nofollow all those pages that have zero content and just links to other pages.
Picture in your head your website's details as stored by Google. They probably have you marked down as having 50% of pages with a red flag beside them saying "Thin Content". Noindex those pages and only have stuff with quality content indexed.
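To make that concrete, here is a rough sketch of the kind of rule I mean; every name and threshold below is made up for illustration, not anyone's actual CMS code:

```python
# Hypothetical thin-page rule: link-only navigation pages and pages
# under a (made-up) word count get noindexed, but stay followable so
# crawlers can still reach the real content through them.
MIN_WORDS = 150  # invented threshold for "thin"

def robots_meta(word_count, is_navigation_only):
    """Return the robots meta tag a page template should emit."""
    if is_navigation_only or word_count < MIN_WORDS:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```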
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Free2Write||11/14/12 7:15 AM|
"ad layout has been modelled on bluechip sites"
Poor ad layout, lack of control of user content, and thin content are not likely to attract new users or retain existing ones.
Blue-chip sites get to use blatant ad layouts primarily because they already have an established, huge following of user traffic, either through the length of time they've been in business or because of something they offer that cannot be matched. Until you have such a following, copying blue-chip ad layouts is probably a very bad idea. Ads should come after a following has been established, not before. The blatant ad at the very top of the site, on all the site's pages, is something you might want to strongly reconsider, along with how the user-generated content is being handled and the thin content. The alternative is to simply wait for Google or the already established brands to change and push your site to the top, which is not likely to happen.
What is your bounce-rate? What is your returning to new visitor ratio? How many interviews or surveys have you done with your returning customers? What feedback have you gotten from users, friends, and relatives about the site? Is the site a hobby or a business? If a business, what is your business plan? How many people have read and reviewed your business plan and given you feedback with respect to the website? What other advertising has been done such as local TV, radio, newspapers, or magazines?
|Re: How I failed to save my 15 year old site from Google's Panda (long)||devcfc||11/14/12 7:20 AM|
Well, here is an update on that: make your website with your users in mind. Then, if/when you get hit by Panda or Penguin, make your website with Panda and Penguin in mind.
Yes, so hit that mysterious set of metrics.
Think about it. When they try to eliminate spam sites, some sites like your own get taken down in the crossfire, because your site shares some ugly commonalities with spam sites, such as lots of pages with just links. If you add content to all those pages, they no longer get flagged, because they will pass as normal, good pages.
Pages like this: http://www.carsurvey.org/reviews/bmw/ could have been created by a robot that also built half your site. So how does Google differentiate between a good page made by a human and some auto-created robot stuff?
Well, if your page has 300-500 words of unique content on it, I'm sure it gets the green light.
Go read articles on BMW, then write your own content for the page. If there are too many pages to write content for, get your site's users working for you. Ask them to create content about BMW: "Review BMW cars", "Do you think BMW is value for money?", "Would you recommend BMW to a friend?"
Once a few people fill these out, you will have content for each of those pages.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||lostB||11/14/12 7:51 AM|
I concur with the last writer.
What is your bounce rate, your time on page, and your overall dwell time for a user?
Have you changed your CMS?
Have you changed every title of every post on your site?
Have you looked at Analytics, found all the posts that have a high bounce rate and low time on page, and dumped them?
Have you made sure that when users finish a car review, they move on to another site (not Google), maybe a car forum or the manufacturer's page... anywhere but back to Google?
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Ryan Clark||11/14/12 8:08 AM|
I'd even consider adding some more useful content via social media, e.g. video reviews, license-free images, tweets about that car, related news, etc.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Free2Write||11/14/12 8:14 AM|
If users are bouncing away no amount of social media will help. Social media will actually do harm in that case.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Michael Martinez||11/14/12 8:37 AM|
Bounce rate really has nothing to do with it.
It's unfortunate that you have worked so hard and realized so little benefit. I feel for you. I have seen this depth of frustration through many years of helping people but it's rare and I know you're really not interested in platitudes.
Let me begin by observing that all your hard work has approached the problem from a mechanical perspective. You haven't even touched on the conceptual aspect. Panda is not really about mechanics. It's more about concepts. Of course, we translate those concepts into specific things and that is where I think a lot of people fail to appreciate the subtlety of the algorithm. Here are my criticisms and some suggestions.
I don't like your navigation. It's just not very usable for people. You're arguing that on a large site like this it's not easy to design good navigation. While that is true, it's still just an excuse for not implementing better navigation. Breadcrumbs don't cut the mustard, and the way you have organized the site is really harming its crawlability. So one of your issues could be that the pages are not supporting each other properly. They are all semi-orphaned because there is too little connectivity.
You have separated out the primary content of the reviews (the information about the vehicles) as sidebars. Your point of view may be that the site is a review site and therefore the primary content is the reviews -- but to someone who is looking for specific information ABOUT A CAR, they want immediate confirmation (in visual terms) that they are on the right page. So putting the car details in the right-hand margin is not user-friendly.
The home page is rather useless. It's just a list of keywords and that is not going to help you much at all. In fact, you could already have recovered from Panda and yet succumbed to Penguin. Google isn't just throwing 1 algorithm at your site -- it's pointing dozens, maybe hundreds of them at you. You cannot afford to focus on just one poorly understood algorithm.
And the fact that Panda is a learning algorithm means that a certain amount of conformity to stylistic choices is almost mandatory. In other words, Google gives the algorithm a pool of "good" sites to learn from and a pool of "bad" sites to learn from (this is probably also true of Penguin and most likely all major future algorithms from Google). If your Website is so differentiated from the norm that it doesn't much resemble a "good" Website the algorithm probably won't know what to do with it.
So while we are pretty sure that Panda is adding some sort of Positive/Negative value/score to each Website it processes, one possibility to consider in your case is that you're getting a 0 score -- neither good nor bad -- and so other sites that are less relevant to your targeted queries are being promoted above you because they have been scored higher.
There is no "main content" on your site -- nothing that anchors it topically, to which users can return. The home page/root URL doesn't act as a nerve center of information for the site so there really isn't any value in that page. And yet, all your pages link back to it. Hence, you're using the lowest quality page as the anchor for the rest of your content and that cannot be a good thing by any measure.
I can honestly say I have seen and worked with Websites that were organized far worse than yours -- their navigation was horrific and presented a terrible user experience. But the fact that we can find examples of worse design doesn't make your design "good".
At the end of the day you have to take the search engine out of the picture and ask your visitors if they found value on your site. If they cannot find anything then they won't report much value. It's good that you have a site search tool for visitors but the presentation of the information does not live up to the promise.
So I think your core issues come down to:
1) Bad navigation. You really do need to implement better connectivity.
2) Bad presentation. You need to communicate clearly to the visitor what they are looking at.
Headers and breadcrumbs don't solve these kinds of problems.
One thing you might consider doing is adding a blog to the site (anchor it on the root URL) where you write an article about each vehicle in your inventory (yes, it WILL take years to publish that much content). The articles should be real and sincere. Each article can link to the appropriate reviews.
Your navigational link anchors need to be richer, not just 1-keyword wonders. It looks like you are trying too hard to avoid intrapage duplicate content. The best approach is to spread your navigation out across more pages, not to reduce the number of words per link anchor.
There are many alternative classifications you can use to illustrate the value of your content. You can provide your visitors with guides to sports cars, family sedans, utility vehicles, trucks, etc. You can write about automobile brands and companies. You can share information about the engineering challenges and solutions that companies deal with. You can organize your content in ways such that microlists of your reviews BECOME content that is useful, informative, and helpful.
I have brought several sites back from Panda downgrades. I didn't invest my time in trying to identify and implement individual fixes. I rebuilt the sites from scratch and completely reorganized the presentation and structure. I changed the concepts. That may not be the only way to fix a Panda downgrade but so far it works pretty darned well.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Free2Write||11/14/12 8:50 AM|
The point was not about bounce rate per se. The point was that if the site has a poor design that hasn't been vetted by asking actual users how they react to it, and changing the site based on their feedback, then social networking can harm the site by generating more negative reactions from the user community. Users do not give positive feedback in any form (whether by staying rather than bouncing, linking organically, or otherwise) if the site is presented poorly or has ads placed where only an established site should consider placing them.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Michael Martinez||11/14/12 9:36 AM|
I understood your point about bounce rate. However, I operate very successful blogs that have nearly 100% bounce rates. People need to stop fussing over bounce rates because they don't mean anything.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Free2Write||11/14/12 9:48 AM|
There was no fussing about bounce rate. There was fussing about ad placement, site content and layout based on feedback from users, thin content, user-generated content, how users are reacting to the current site, and business plans. There needs to be some data to measure the before and after results; even if the data is the bottom-line.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||lostB||11/14/12 10:02 AM|
Bounce rate alone is not a very important factor, but bounce rate taken together with time on page is a very important factor.
Randomly saying you have a site with a 100% bounce rate, without taking into account other aspects like time on page, or where visitors bounce to, is just speaking nonsense.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Lysis||11/14/12 10:57 AM|
Something else to consider, and I speak as someone who uses these sites when I'm car shopping, is that I am really not interested in some 1000-word description of the interior of a car. I'm a sports car fan myself, and this is one of those sites where if you can't show me a good visual representation, I'ma bounce. I want specs, I want to know how the car compares to other cars in its class, and I want to see pics. People are car SHOPPING when they go to these sites, unless they are looking at their favorite unattainable car. Either way, they want pictures.
I haven't looked at the site, but I know Edmunds off the top of my head because I've always gotten what I want from the site.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 11:13 AM|
There are so many replies (thank you all so much) and so much discussion, that I'm going to try to answer as much as possible in one big post, rather than 7 or 8 separate replies.
If I have to choose between twisting the site to try to artificially fit Panda, or putting the needs of the users first, then the users will win every time. If that means Carsurvey.org never recovers, so be it. I've got other opportunities I can pursue, and Carsurvey.org will just revert to being a hobby.
I've noindexed quite a lot of pages already. But some of the thinner navigation pages actually make sense as landing pages. For example, if someone wants to read BMW 3 Series reviews, and doesn't specify a year, then landing on a simple page with a list of years makes sense from a usability perspective.
I don't think the ad layout is a problem. The ad load is lighter than most sites out there, and the site has been around for 15 years, so it's pretty established.
People want bounce rates and dwell times:
Whole site: 3.43 pages per visit, 2:35 visit duration, 65.95% bounce rate.
Homepage: Avg Time on Page: 30s, Bounce Rate 13.49%.
Manufacturer pages (e.g. BMW): Avg Time on Page: 15s, Bounce Rate 23.66%.
Model pages (e.g. BMW 3 Series): Avg Time on Page: 26s, Bounce Rate 59.86%.
Model year pages (e.g. 2001 BMW 3 Series): Avg Time on Page: 2:06, Bounce Rate 54.08%.
User comments pages: Avg Time on Page: 3:18, Bounce Rate 80.59%.
Reviews are mostly on the model year pages, though some are on the model pages. The comments page figures include no nav pages at all (whereas the model and model year page figures do include some).
So the thin pages have short time on page and low bounce rates, while the pages full of content have high bounce rates but long time on page; most visitors have probably found what they were looking for, read it, and left satisfied.
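Putting those figures side by side makes the pattern easier to see. A throwaway script (numbers copied from the Analytics figures above; the time parser is just a convenience):

```python
def to_seconds(t):
    """Convert '2:06' or '30s' style durations to seconds."""
    t = t.rstrip("s")
    if ":" in t:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(t)

# (avg time on page in seconds, bounce rate %) per page type.
page_stats = {
    "homepage":      (to_seconds("30s"),  13.49),
    "manufacturer":  (to_seconds("15s"),  23.66),
    "model":         (to_seconds("26s"),  59.86),
    "model year":    (to_seconds("2:06"), 54.08),
    "user comments": (to_seconds("3:18"), 80.59),
}

# Rank page types by how long visitors stay: the content-heavy
# pages hold people longest despite having the highest bounce rates.
by_engagement = sorted(page_stats, key=lambda k: -page_stats[k][0])
```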
I get very good feedback from visitors. The only major complaint is about the constant domestic vs foreign cars arguments that rage in the comments among North American visitors.
The site was a hobby from 1997 until the early 2000s, then became a business, with me working on the site full-time since summer 2005. The business plan is simple: run a good site, keep the users happy, and hopefully make enough from advertising to pay the bills and a decent salary for me. No advertising has been done to promote the site.
Adding content has much to recommend it, but I agree that it's got to be unique, good quality, and relevant. It would also have to be kept up to date. To put some numbers on it, there are 185 manufacturers on the site, and over 2000 models. Producing that much content without it being bland or repetitive is a tall order for one person. If Google sent me a message via Webmaster Tools, telling me to do that, it would be worth the effort, but otherwise, I could spend years doing that, and still not be certain of fixing the problem.
I haven't changed the CMS completely, as it's a custom piece of software written by me over 12 years. I did, however, make extensive modifications to it, such as an entirely new pagination system designed to be Panda friendly.
Neither have I tried to tweak each page title individually; with over 40,000 pages, it's not practical.
I'm guilty as charged when it comes to being rather mechanical in my approach. I'm a coder at heart, not a content person. I had thought that, because Google is full of coders, I would be on their wavelength, and I'd have a fighting chance of figuring out why Panda was penalising my site. It turns out that my approach hasn't really worked very well.
I think Michael Martinez does a great job of summing up what others have been saying, while adding some additional insights. I can see the point about the navigation and additional content, but that solution is a very different site, one that would need a massive investment of time to achieve. I'll be honest and say that I'm not sure I'm the right person to build that site, nor do I probably have the resources to do so. Still, it's something for me to think seriously about. Perhaps I can find some inspiration from some of the sites I read regularly. Otherwise, it's maybe time to just accept that my idea of a useful site and Google's are no longer the same.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/14/12 11:30 AM|
Lysis, that's a fair comment. You're describing Edmunds perfectly, and they do a good job of filling that role. I'm not aiming to compete with the whole of Edmunds' site, just their consumer reviews, which are very similar in nature to the content of Carsurvey.org. People are often shopping when they visit my site, but they're also often trying to investigate problems they have with their current car, and there are a lot of helpful reviews and comments on the site for that group of people too.
Also of note is the fact that my site is global. There are many countries that are less well served by car sites than the USA and related countries, and in those countries where English is widely spoken as a second language, such as Malaysia and the Philippines, Carsurvey.org seems to be very popular.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Free2Write||11/14/12 12:02 PM|
The layout and design was much better back when Google featured the site.
Google measures the usefulness of a site based on content and in large part on how users behave and respond, not age.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Ashley||11/14/12 12:12 PM|
No one is asking you to 'twist' the site to fit Panda in a way that isn't good for users. You've got lots of people here saying they'd all want to see more - more text, better content, better navigation, etc. We're not bots. You're not looking at two separate paths here. I think Michael is right (Gasp! I know!) when he says you're focusing far too much on the technical aspects.
The site looks dated.
The content is lean in lots of places, low quality in lots of other places.
What the site has to offer is lean (nothing beyond consumer reviews - no added value).
The navigation is terrible, so user-friendliness is an issue.
One thing I notice is you keep talking about how unfeasible it is to make all 40K pages better/stronger. Then don't. Get rid of a lot of those pages. Only keep the ones that are the cream of the crop. Focus on what you can do really, really well - better than anyone else. There's no need to go big/low quality. I really don't understand why you're noindexing low-quality content - why don't you just get rid of it? (Also, noindexing doesn't mean that Google is ignoring it; they can still see it, they're just not indexing it.)
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Michael Martinez||11/14/12 3:45 PM|
"Bounce Rate alone is a not a very important factor, but bounce rate taken with time on page is a very important factor"
That is absolute nonsense. Every time someone says something about bounce rate in a discussion of what may be wrong with a Pandalyzed Website they are just adding a non sequitur to the conversation.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Michael Martinez||11/14/12 3:59 PM|
Someone like me -- who is not a car buff -- would quickly disagree with you. Usability is not a universal standard; what is usable to Person A is not usable to Person B. Nonetheless, you're being defensive about the page content and structure, and that is typical of someone who sees their site "from the inside". This is a very common response; most of us don't spend much time being the naive consumer in any industry where we invest our time and passion. We become knowledgeable and comfortable with the insider jargon and assumptions.
That's an advantage when you're dealing with other insiders but not such a great advantage when you're trying to reach out to people who are only more casually familiar with the topic than you. Searchers NEED reassurance that they are on the right page. That requires more than what you are providing them.
Their requests may be well-intended but there are hundreds if not thousands of discussions in Webmaster forums where bounce rates and times-on-site are discussed and no one has recovered from Panda. Google doesn't have this information and it wouldn't produce the kind of signal these folks want to believe it should produce.
You're receiving fewer visitors because you have already been downgraded. Whatever your current visitors are doing (which is beyond the search engine's knowledge) won't change anything.
Google doesn't care how much content is on the page. What they care about is how the content you provide is presented to the visitor. Is it doing anything for visitors that isn't being done elsewhere, or is it doing something better than other sites are doing?
Right now your presentation style isn't doing as well. If your reviews are indeed unique then you have the unique value you need. You need to reorganize your presentation, structure, and navigation.
There is no such thing as a "Panda-friendly" anything. If we could honestly designate Panda-friendly Website design no one would be struggling to recover from a Panda-downgrade.
Actually, there are people who can do that. I certainly can and I know I am not alone. You should at least give it some thought before writing it off. Don't tell yourself it cannot be done. But don't bang your head against the wall if you cannot immediately think of how to do it.
NO ONE has figured out Panda, and I include the Google engineers who work with it. They may know how it arrives at its choices but they don't know in advance what choices it is likely to make. It's looking at an immense amount of data.
Trying to reverse-engineer Panda is a fool's errand. The criteria for the upgrades/downgrades are not in the algorithm but in the choices the algorithm makes, and those criteria appear to change from time to time. So every time the mix of signals change or the weightings given to signals change, whatever you might have accomplished toward reverse-engineering the process just goes right out the door.
You don't have to do it all at once. You just have to do it. SEO is not an all-or-nothing proposition. Assume you have a ten-year timeframe in which to work. You have already done that once. There is no reason why you cannot do it again unless you are just ready to move on to a new challenge.
You really do have an advantage over many other people who have been downgraded by Panda that you don't appreciate, and that is that you HAVE invested a lot of time in a Website. You should not be nearly as daunted by that prospect as many people who have only been doing this for 2-3 years.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Steve -||11/14/12 11:43 PM|
The amount of work you've done is inspiring, I'm sorry you've not reached your goals yet.
A couple of simple observations:
If content is king in the post-Panda world, how can it be possible to review/maintain 40,000 pages? I don't intend this as a criticism, rather as a discussion point. If you are crawling your own pages looking for problems, will you ever see what Googlebot sees? The summary of the advice I've gleaned on these pages, and that I've filtered from the Google communication channels and SEO experts, is that well-written, unique content is the single biggest ranking factor by far. If this is true, moving towards more quality and less quantity might also future-proof the site?
I'm new to these pages and not technically very literate, but I've encountered time and again the theme of the relationship between advertising and content in the post-Panda world. The number, location, and size of the advertisements all seem to matter, on both a subtle and a more gross level. I can only imagine that pages with a banner ad above links and thin content might not rank well.
Good luck in every case.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/15/12 2:57 AM|
The message about the extra content, usability and navigation is getting through. I haven't decided exactly what to do about it yet, but I do appreciate that lots of smart people are saying very similar things, and it would be a mistake to ignore that feedback.
Getting rid of even more content doesn't feel right to me. Visitors very kindly spent the time to write those reviews, with the expectation that they'd be published, and for me to cherry-pick only a fraction of them for publication seems wrong.
I am aware that Google does look at noindexed pages, but I'd be amazed if noindexed pages counted towards Panda. Lots of sites effectively have an infinite amount of noindexed thin content via their search results pages, so it would be a very poor signal to use for rating a site's quality.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/15/12 3:27 AM|
I didn't intend to reverse engineer Panda itself, but rather figure out some candidate signals of Google's 200+ that might be important to Panda. I thought that if I improved enough of them, Panda would see the difference, without me having to know the details of its internals. At one point it did actually cross my mind to build my own machine learning system, feed it a data set of sites that had been hit by Panda and a control set, to see what patterns it would come up with. Then I came to my senses, and realised that it almost certainly wouldn't work, and even if it did, Google wouldn't be impressed, and would just change the algorithm again. You're right that focussing on the users is the best approach, and hopefully Panda will eventually recognise any improvements.
What I need to do now is spend some time examining the design and usability of other sites that are similar to Carsurvey.org, to find some fresh ideas on how to improve the usability and visitor experience. After 15 years, I am far too close to the site and its subject matter, and you're right that I need to force myself to look at the site from the point of view of a new visitor.
I do have some other challenges that I want to pursue, but the site means a lot to me, and I'm not quite ready to give up on it. I'm probably going to split my time between some new projects and trying to improve the site further, with the deciding factor being the quality of my ideas.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/15/12 4:23 AM|
My solution was to write my own bot to judge quality. It probably helped, but it clearly wasn't effective enough.
For a new site, quality is definitely preferable to quantity. The problem is that my site's strength historically was in the variety of content on it, and though Panda has hurt the site badly, the amount of content does at least ensure the site gets reasonable long tail traffic, for queries where no other site has anything suitable. I did remove the content that I detected as being poor, but can't bring myself to take things further, and remove all but the very best content from the site.
I had assumed that a single ad on a navigation page would be fine, but I am now questioning that assumption. The nav page ads aren't very effective anyway (86% of the ad revenue comes from the main content pages), so I'm probably going to remove ads from all the navigation pages, in case the ads plus links mix you mention is a factor.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Seppo Puusa||11/15/12 5:54 AM|
I feel for your pain, with all that effort and nothing to show for it. I agree with you that there is room for sites like yours in the shopping experience. It's nice to read slick-looking and detailed reviews from big brand sites, but I can never bring myself to really trust them. I would always also like to see user submitted reviews. So if I was shopping for a car I would probably spend quite a bit of time on your site.
With that said, I can see problems from an SEO perspective. I'm just going to mention some things that immediately jumped out at me; there's no point in repeating what others have said.
Take a look at this page as an example: http://www.carsurvey.org/reviews/subaru/forester/2001/single-page/
What you see is a long list of links that more or less say the same thing. I can see how Google would see that as a strong spam signal. I have a website in the acne niche, and 95% of the sites are either scraper sites, autoblogs, or have cheaply outsourced, extremely poor quality content. And it struck me just how much your site resembled what I see on those sites (from this link perspective). Basically they have tons of keyword optimized anchor text links, like acne diet, acne milk, acne sugar, cure acne, with little surrounding text explaining the relevance of the link.
In the case of your site, having every link say Forester [something] is not very helpful, especially since you only show the summary of the review under the link. Often the summary is not very helpful either.
I would get rid of all those 'keyword rich' anchor text links immediately. I do think the content on your site is both desirable and helpful to users, it's the presentation that's below par.
Perhaps try showing a bit more of the review on the page, say 200 words or so, like a normal blog post excerpt, and then have a "read more" link. I might also use the summary blurb as a bolded 'title' for each review.
One more reason to get rid of those keyword rich links: keyword stuffing. Open the page in Firefox, search for Forester on the page and click 'highlight all'. Can you see the problem? You have Forester ALL over the page. The search found 61 instances on the page. With a total word count of around 740, your keyword density is close to 10%. Massive spam signal, and so 1995 :)
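The "highlight all" check can be approximated in a few lines. A rough sketch in Python (the sample text and figures below are purely illustrative, not taken from the site):

```python
# Rough keyword-density check, like the Firefox "highlight all" test:
# count occurrences of a term against the page's total word count.
import re

def keyword_density(text, term):
    words = re.findall(r"[A-Za-z']+", text.lower())
    hits = sum(1 for w in words if w == term.lower())
    return hits, len(words), hits / len(words) if words else 0.0

sample = "Forester review. The Forester is reliable. Forester owners agree."
hits, total, density = keyword_density(sample, "forester")
print(hits, total, round(density, 2))   # → 3 9 0.33
```

Run against a full page's visible text, anything approaching 10% for a single term is the kind of density being flagged here.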
Again, I think your site can be very helpful to the users, and it's my humble opinion that it SHOULD rank. I hope these observations can help you to get back there.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Honest_Stan||11/15/12 7:37 AM|
It is quite depressing to see (as usual) a site being analysed under a microscope by the wise owls on this forum.
You have been caught up in Google's broad brush of spam removal. The site is great and should be placed and ranked accordingly.
Hopefully at some time in the future the algorithm will be a little more accurate and will not hack down good feature filled and content rich sites.
Sadly Google points those who are losing their livelihood to this forum for help - this is a kick in the head. The solution is not changing the layout of the page or adding images to improve the user experience.
I'd consider a rebuild of the site on a different domain if you can face it. But what may work is to partition off sections of the site into sub domains thereby giving each sub domain more relevance and hopefully better ranking.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/15/12 7:42 AM|
Seppo, thanks for your feedback, and for your very kind words.
That page you mention, and all similar pages on the site, are noindexed. I actually removed those pages from the site at one point last year, but some of my regular visitors were really upset by their loss, so I restored them to the site, but made sure they weren't in Google's index. Still, it's possible that the keyword density and links you highlight may be causing a problem, despite the noindex tags.
Given the Matt Cutts video about duplicate content I mentioned in a previous comment, I'm a bit more confident about using excerpts than I was. The lack of details from Google often creates a climate of fear, but smallish excerpts sound like they're safe, especially if they're marked up in a blockquote element with a source link.
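A rough sketch of the excerpt markup being discussed, generated in Python (the helper name, URL, and wording are hypothetical, not the site's actual code):

```python
# Hypothetical helper: render a review excerpt as a <blockquote> with a
# source link, roughly the "safe excerpt" markup discussed above.
import html

def excerpt_html(text, source_url, max_words=200):
    body = " ".join(text.split()[:max_words])
    return (f'<blockquote cite="{source_url}">{html.escape(body)}&hellip; '
            f'<a href="{source_url}">Read the full review</a></blockquote>')

print(excerpt_html("Great car, very reliable in winter.",
                   "http://www.carsurvey.org/reviews/subaru/forester/2001/"))
```

The cite attribute and the visible source link both point back at the full review, which is the kind of attribution the video suggests keeps excerpts on the safe side.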
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Steven Lockey||11/15/12 8:33 AM|
"Honest Stan", what an ironic name.
Try living up to your name and being honest. The people commenting here know a LOT more about the way Google works than you do. Don't give people faulty information when you clearly have no clue what you are talking about.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Ashley||11/15/12 8:40 AM|
Honest Stan says:
Why would putting the site on a different domain change the ranking if the primary issues are the content? What a waste of time!
Why do you think subdomains are a good idea? I'd only ever separate content if it is TRULY separate. The entire site is focused on car reviews and it'd be foolish to separate it into different subdomains, in my mind.
I think it's important to have different opinions, but you have to respect them. I respect yours when you say the site is "good feature filled and content rich", but the majority of people looking at the site would say the opposite - the content, although plentiful, is often low quality, and the site is lacking the extra features that would make me, as a user, go to other sites (like I linked to above).
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/15/12 9:25 AM|
Stan, thanks for your kind words and feedback.
Some of the replies to your comment were unnecessarily harsh, but I agree with their central point, that moving the site onto a new domain, or splitting it across subdomains, is unlikely to be a long term solution to the problem, and even if it provided a short term boost, it would be very confusing to my regular visitors. There was an airline flights section of the site due various accidents of history, and I did split that off onto its own domain, but I think it makes sense for the car reviews to stay on the Carsurvey.org domain, and for my efforts to be focussed on the navigation and content issues mentioned in previous comments.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||eyepaq||11/15/12 1:31 PM|
I think most of the messages here are off topic.
The issue is not how to improve the site overall (basically it is but it's out of scope here) but how to get out of the filter (Panda).
I would say you need to focus on the ones that are on top in the SERPs now for your area - you just need to be either better than them or different. You don't need to be perfect.
There are a lot of sites that are way worse than yours, but in different areas or niches, and those survive, or recovered if they got hit.
I've seen several sites recover from Panda just by approaching everything differently, and those got back since they had something different to bring to the table, not the same as the ones that were there on the first page.
On the other hand, I've seen sites recover just by getting above, just a notch, the ones that were up there - and by above I mean just a little better, but far from perfect.
Of course improving your site will help - and that is a good thing, but that doesn't mean you will recover.
Or maybe I am out of scope here :)
|Re: How I failed to save my 15 year old site from Google's Panda (long)||RalphSlate||11/17/12 10:38 PM|
Here is what struck me: you have a hierarchical navigation design to get to your "money" pages, but you're including all the intermediate step pages in Google's index.
A page like this doesn't really add much to Google's index:
There's really nothing on it except some linked years. You probably have dozens of pages that look very similar. I understand how the page fits your hierarchical architecture, and I understand that your intention is that someone arriving at this page from Google would then go to the next level, but the crawler doesn't know or care about the next level - it cares about the page it is returning, and it's just not going to return that page ever. This page is like a "search results" page with no content that anyone would want on the specific page. I'd noindex it - the way I noindexed the search result pages on my own site. You can still keep the navigation, but don't clutter Google's index with it.
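The rule being suggested here can be sketched mechanically: treat link-heavy, text-light pages like search results pages and keep them out of the index. A minimal sketch in Python (the threshold and function name are invented for illustration):

```python
# Sketch: noindex navigation-only pages whose body is little more than a
# list of links, while keeping them crawlable ("follow") for navigation.
def robots_meta(word_count, link_count):
    # Fewer than ~10 words of text per link suggests a pure nav page;
    # the threshold here is made up and would need tuning per site.
    if link_count > 0 and word_count / link_count < 10:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'

print(robots_meta(word_count=80, link_count=40))   # thin nav page → noindex
```

The "money" pages (actual reviews) would fall well above any such threshold and stay indexed.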
Now let's say that someone is searching for Dodge Avenger reviews without a year. There are 10 pages on your site that apply - so maybe they get the 2008 Avenger page in Google, but they really wanted a 2007 Avenger. So make the 2007 link available on the 2008 page as a "here are some other years of Dodge Avengers".
What might this do? I'm no "link juice" expert, but from what I understand, it focuses the link juice onto the pages that are most important and away from pages that won't satisfy a searcher.
You have 46,000 pages listed in Google. If you have 185 manufacturers and 2000 models, 46,000 pages seems a bit much, even considering that each model can exist for a number of years. There's not much consolidation you can do with your money pages (the model/year pages), so maybe the key is to only index the pages that are useful to searchers, and deindex the other pages.
I did a search for [1999 Oldsmobile Alero reviews]. I got your page as the #6 result on page #1. I then searched for [1995 Volkswagen Passat reviews]. Again, #6. That seems pretty good to me, since you're in with the heavy hitters like Edmunds and KBB. Is it possible that maybe you were getting incidental traffic that was not related to your site before Panda hit, and that the Panda drop is legitimate? Have you analyzed your logs to try and figure out where the drop occurred?
Is it possible that in the past, you were getting hits when someone searched for "2002 Volkswagen Jetta" instead of "2002 Volkswagen Jetta Reviews"? I notice that when I search for the former phrase, I get mostly "for sale" results. Maybe Google simply decided that when someone searches for generic car brands, they are usually interested in buying them, so they no longer return review sites, or return less than they used to, and your site no longer makes the cut because it's the 6th best review site and they're returning just 3 reviews.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||StevieD_Web||11/17/12 11:13 PM|
+2 to Ralph
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/18/12 7:57 AM|
Thanks for the great feedback Ralph,
Your comparison to a search results page makes sense. There are over 1,000 pages with a list of model year links in Google's index. That's not a huge percentage, and they perform a useful function for visitors who start at the site's homepage, but as a landing page for someone unfamiliar with the site, they're probably disappointing and confusing.
Many of the navigation pages are already noindexed, but I clearly need to be more aggressive about noindexing things.
I'm still considering what navigation and content improvements can be made to the site, but as a shorter term improvement, I'm almost certainly going to noindex all of those year lists, and will probably also remove all ads from navigation pages, including the homepage. Maybe even the single leaderboard ad on those pages was too much, considering the navigational nature of those pages.
I'm still convinced I was hit by Panda, as the traffic drops and dates match perfectly. There is however considerable variation between different searches. The site does better with older models, and tends to do better in the USA than elsewhere. I'm in the UK, and when I look at google.co.uk results, the Panda effect seems to be larger than I see when I force a US based search on google.com.
The drop was more noticeable on less specific searches like [oldsmobile alero reviews] than [1999 oldsmobile alero reviews]. Basically, the more generic the query, the harder the site was hit, though most of the queries are either review focussed, or long tail obscure queries. The exceptions are rare cars, where the site will often rank for just an [oldsmobile alero] type query, but I think that's just a lack of relevant pages. The site used to rank number 1 for example for the query [subaru reviews], and now it's result 11 (so top of page 2). Still not a bad result for a one man operation, and I'm certainly not claiming that I should be back at number one, but those examples are the best case scenarios, and there are many others where the site is much further down the ranking, behind often completely irrelevant results (especially here in the UK).
That said, I do suspect that while Panda is the biggest factor in the site's recent drops, there have been other changes to Google's main ranking algorithms that probably did not favour Carsurvey.org, including the increased importance of freshness, which sometimes leads to some odd results, but overall, is probably an improvement (though I wish there was an option to disable it in the Google UI).
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||wcspaulding||11/18/12 1:19 PM|
The way the Panda algorithm works in regards to copied content is that it divides all websites into what I call Panda-trusted websites and what Google refers to as the un-trusted sites. Panda-trusted websites are the websites of governments, educational institutions, major businesses, other major sites, and Google Books. In order for the Panda algorithm to work, Google had to decide who to punish. Since Google did not know who the authors of the content were, it made an assumption: that the Panda-trusted websites are the originators of their content, and that all other websites with matching content have copied that content. However, in order for this matching-content algorithm to work, it cannot be applied to the Panda-trusted websites, since many of them do have matching content with other Panda-trusted websites, since, in fact, they are not the originators of all of their content. For instance, many of the Panda-trusted websites have public domain material that would match over many sites, so without this division, the Panda algorithm wouldn't know which site to demote. But because of this division, if an un-trusted website has public domain material that is also on a Panda-trusted website, then the un-trusted website will be penalized, even though neither site is the author of that content. This is why so-called branded sites are not affected by the Panda algorithm, and they tend to rise because other websites are penalized by the Panda algorithm.
A fundamental problem that I see with your site, as with most other forum or user-generated content sites, is that since most of your material is written by the users of your site, there is a good chance that they have also uploaded that same material to one or more Panda-trusted sites, in which case, the Panda algorithm will flag your site as having copied content, even though that is not actually the case. I don't know how you would be able to prevent this, so it may be a problem that you cannot solve, especially since you would have to verify that each submission by your users was not also placed on a Panda-trusted website, and you would have to check for this continuously since they could upload the same material later, after placing it on your site.
Another thing that you should be aware of is that the Panda penalty seems to be a timed penalty. For instance, my own site was hit on April 11, 2011 because I had reformatted IRS articles in a special folder; 2 days later I deleted the entire directory. I have read everything I could about the Panda algorithm, and just about everything else that Google publishes in regard to quality guidelines, so I am very certain that my website was demoted only because of the IRS articles. However, after more than a year and a half, I have yet to recover. This also comports with the way they do manual penalties, which generally last for a specific amount of time. Amit Singhal also basically said this in a Wired interview that I reference in my link below, in which he said that a website "could eventually" recover if the problem is fixed.
During the time that I have been penalized, I have gathered as much evidence and information about the Panda penalty as I can, and I have placed it on this page: http://goo.gl/b3bv5
I hope this info helps you.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||ShopSafe||11/18/12 2:40 PM|
I don't know why anyone would -1 eyepaq's comment. It's not the only good thing in this brilliant thread but to this dummy it's the most concise and looks likely to be helpful.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||RalphSlate||11/18/12 5:08 PM|
I think the drop from #1 to #11 ranking is a sign of a penalty. I had the same exact problem. Once the penalty went away, I went back to #1. I have no idea why I was penalized, or what I did to get un-penalized. As someone else mentioned, penalties are time-based, so you may have already solved your problem even though you don't know it. I was penalized from April 24 2012 to October 11 2012. Then, like a switch was flipped, the traffic came back.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||devcfc||11/18/12 5:19 PM|
Msg to Ralph,
April 24th is the date Penguin was unleashed
Oct 5th was a penguin refresh, when you may have recovered.
Could you kindly post your url and mention any changes however minor, you made during April to October as a lot of people would be interested.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/19/12 6:49 AM|
Thanks for the feedback wcspaulding.
I'm sure copied content is part of Panda in the way you suggest, but I think it's not quite as dominant as you imply. I suspect what happens is that Google's manual quality raters are asked to rate a large selection of sites. That data, together with Google's 200+ signals, is then used to train a machine learning system (like a bayesian classifier or a neural net), such that it can get as close as possible to the results produced by the manual raters. The trained up machine learning system is then let loose on a snapshot of Google's search index, and each site is rated and given a score. Sites that don't score high enough with the machine learning system are then assigned a penalty in Google's search results. This is very similar to how spam filters work, except the training is done behind closed doors, instead of by the users in the case of spam filters. A system like this is capable of detecting non-obvious patterns that a simple scoring system would miss, and it also means that sites can be caught by Panda for entirely unrelated reasons. Such a system will also adapt automatically to changes on the web, as long as Google makes the effort to retrain it with manual raters. Unfortunately, like a spam filter, there can be legitimate false positives, and if machine learning is involved, it's not easy to give a definitive reason why a site was hit by Panda, as a complex interaction of tens of signals is likely responsible, rather than one or two simple reasons.
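The theory above can be made concrete with a toy example. This is purely illustrative (it is NOT Google's actual system, and the signal names and labels are invented): a naive Bayes classifier trained on a handful of hand-rated sites, then used to score an unseen one.

```python
# Toy naive Bayes classifier over binary quality signals, standing in for
# the "train on manual rater labels, then score every site" idea above.
from collections import defaultdict
import math

def train(samples):
    """samples: list of (signals, label) pairs; signals is a dict of booleans."""
    counts = {"good": defaultdict(int), "bad": defaultdict(int)}
    totals = {"good": 0, "bad": 0}
    for signals, label in samples:
        totals[label] += 1
        for name, present in signals.items():
            if present:
                counts[label][name] += 1
    return counts, totals

def classify(model, signals):
    counts, totals = model
    scores = {}
    for label in ("good", "bad"):
        # log prior + Laplace-smoothed log-likelihood of each signal
        lp = math.log(totals[label] / sum(totals.values()))
        for name, present in signals.items():
            p = (counts[label][name] + 1) / (totals[label] + 2)
            lp += math.log(p if present else 1 - p)
        scores[label] = lp
    return max(scores, key=scores.get)

rated = [
    ({"thin_content": True,  "heavy_ads": True},  "bad"),
    ({"thin_content": True,  "heavy_ads": False}, "bad"),
    ({"thin_content": False, "heavy_ads": False}, "good"),
    ({"thin_content": False, "heavy_ads": True},  "good"),
]
model = train(rated)
print(classify(model, {"thin_content": True, "heavy_ads": True}))   # → bad
```

Even in this toy version you can see the point made above: the verdict comes from the interaction of all the signals at once, so there is no single "reason" a given site was classified one way or the other.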
If my theory is the case, the solution would be for Google to add false positives to the training set for the algorithm, which wouldn't guarantee a recovery, but would make it much more likely. Unfortunately no such process has been acknowledged publicly as far as I'm aware.
Regarding your navigation suggestion: the site is too large for me to put all the links on one page, and I also suspect that too many links on a page may be a signal that contributes to being hit by Panda.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/19/12 6:56 AM|
It may well be a penalty, but when I did a reconsideration request, I got the stock message saying no manual penalties were in effect. So whatever it is, it's algorithmic. I'm convinced there's a time based component though, which may well be semi-random, to make it harder to determine cause and effect. I'm sure that if the site ever recovers, like your penalty, I'll never know why for sure, unless Google decides to be a lot more open about Panda than they've been so far.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||KiwiSi||11/19/12 3:04 PM|
s'funny... when I type in "car surveys" it comes up number 1...
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/20/12 2:28 AM|
I guess that's because Google sees that as a navigational query. It's not number one for me here in the UK though; it's number 2, behind whatcar.com. However, the site is still number one for "car survey", without the final s.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Alex_in_HK||11/20/12 3:29 AM|
Nice post actually, you have done plenty.
I do want to point out that while you may have unlinked text URLs on your pages, Googlebot now seems to consider those as links anyway, and will look at where they point to. It's probably better to actually let them be links, but put nofollow tags on them, or to come up with a display method that does not allow Googlebot to see them as URLs.
As for ads, Matt Cutts and others have made it pretty clear that the amount and location of advertising can have an effect on the overall perceived quality of the site. If the ad layout over the whole site is similar (ads left and right, content in the middle), you may also get dinged for having too much template compared to too little actual content. You may want to consider using a much thinner layout, especially on post pages, as the ratio of content to template might end up getting you dinged.
Duplicate content is probably one of the areas where Googlebot is weakest right now, especially when it comes to user submitted content that may appear on more than one site. If for whatever reasons your site isn't considered the best, most authoritative, most linked, relevant, or whatever metric they use, you can end up being the one dinged for the duplicate content. It is pretty hard to get around this on a user contributed site, and for the moment at least, this appears to be a very powerful ranking item for Google.
Looking at your site, I would suggest a couple of things (as it appears today):
1) Assign each new review a definite URL, a "permalink" where it can always be referenced. It might be worth changing your pagination system to link from one permalink to another, rather than trying to sequence the reviews on pages, as every time someone adds a new review, all of the other ones "move".
2) On pages such as this: http://www.carsurvey.org/reviews/peugeot/ , you are likely dinged because your important term Peugeot only appears once or twice, but the word "reviews" repeats to the point of looking like spam. You may want to consider a layout change that doesn't have any single word repeating that often.
3) Also, the pages between the index and the first page in (like the example above) are very, very light on actual content, and heavy on internal links. I am not 100% sure on this, but it seems to be a lack of content that is really getting you hit hard. You may want to consider putting some quotes from the newest reviews, or excerpts that link to the content directly, rather than just dry links.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||RalphSlate||11/20/12 4:32 AM|
Steven, I think there are algorithmic penalties as well. Nothing else explains my sudden drop in traffic on April 24, and sudden return on Oct 11. Google may not call them penalties per se, but at the very least I was seeing a "-10" on dozens and dozens of pages that is now gone.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/20/12 6:21 AM|
Thanks for the feedback Alex,
All the plain text links have been checked and converted to real HTML links, with sites I'm unfamiliar with being nofollowed. Any broken links were fixed or removed, so hopefully the links are no longer an issue.
For ads, I'm now careful to keep the ad to content ratio sensible.
The site originally had one review per page when Panda hit, but as many of the reviews are not long, it created a thin content issue unless I noindexed almost all the reviews. The redirects from the old review URLs effectively work as permalinks, but this is not exposed in the user interface. By generating a daily sitemap file full of lastmod dates, and using rel="next" and rel="prev", Googlebot should be able to quickly respond to pagination changes. Whether it does this effectively enough is debatable, but going back to over 100,000 pages of single reviews, many of which would be quite thin, seems unlikely to help.
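The sitemap-plus-pagination plumbing described here can be sketched briefly. A rough Python sketch (the URLs and helper names are hypothetical, not the site's actual code):

```python
# Sketch: emit a sitemap <url> entry with a lastmod date, plus the
# rel="next"/rel="prev" link tags for a paginated review listing.
from datetime import date

def sitemap_entry(loc, lastmod):
    return (f"<url><loc>{loc}</loc>"
            f"<lastmod>{lastmod.isoformat()}</lastmod></url>")

def pagination_links(base, page, last_page):
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{base}?page={page - 1}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{base}?page={page + 1}">')
    return tags

base = "http://www.carsurvey.org/reviews/subaru/forester/"
print(sitemap_entry(base, date(2012, 11, 20)))
for tag in pagination_links(base, 2, 5):
    print(tag)
```

Regenerating the lastmod dates daily is what lets Googlebot notice when reviews shuffle between pages, which is the behaviour being relied on above.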
Your point about repeated use of the word reviews is well made. It's there for good usability reasons, but I may have to switch to the word reviews as a table header. It's sad that fear of Google's algorithms necessitates such choices.
Yesterday I noindexed all the model year pages (lists of year links for models) that hadn't already been noindexed, and also noindexed some of the smaller manufacturer pages, although not ones as popular as your Peugeot example. Hopefully that should make a useful difference to Google's perception of the site. I'm still considering other measures to improve those navigation pages (such as more content), but this is a quick fix that might help in the meantime.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||11/20/12 6:22 AM|
That's interesting. I shall keep my fingers crossed for a similar recovery.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||mark munroe||12/7/12 10:58 AM|
Sorry that you have had issues with Panda. I have some ideas (and you can read more about my Panda at my blog www.afterpanda.com).
My thoughts are that your attempts (and most recommendations about Panda) are really shooting blind - you are taking a machine gun approach instead of really understanding the problem. IMO, Panda is not a duplicate content algorithm, it is not a thin content algorithm, it is not a too-many-ads algorithm, it is not an ugly site algorithm, or a site performance/speed algorithm. And yet all those things could have caused the Panda hit.
It is a behaviour based algorithm, built on metrics that Google collects - I have speculated about what those are on my blog, but what it really comes down to is users not getting their questions answered. Any problem or issue with your site that prevents users from getting their questions answered provides the possibility of a Panda hit.
With everything you did to your site, I didn't see anything about analyzing why your visitors came to your site in the first place. You need to understand why they are going to your site and how you can satisfy their needs, so they don't go back to Google and give Google whatever signal indicates they did not get what they wanted (most likely a click to an alternate site, among others). Understanding your customer - really understanding your customer, and not projecting what you think your customer wants - and pushing that percentage up is the way to battle Panda. Use a survey tool to understand your users. I have used SurveyMonkey, but I think I will use Qualaroo in the future because it is less intrusive and you can also segment to only survey visitors from search engines. Suppose 25% of your visitors just want to see a picture. You are guaranteed not to satisfy that 25%. But don't guess; learn what is important and work the most important issues as you learn them.
This is not to say there aren't some global things that make sense, and you have done some of those things. Really thin content is of course bad, because it likely cannot satisfy your users - but it probably didn't get that much traffic anyway, so it wasn't dominating the statistics that Google collected. Site design and navigation really matter and affect all your users (and thus all the metrics that Google collects) - and your site looks quite dated (if users don't trust a site, they will abandon it and continue their quest elsewhere).
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Alex_in_HK||12/9/12 4:12 AM|
Mark, I think that to some extent, you may be missing the boat by looking at the symptoms and thinking that is the disease.
The real roots of the issue with Google is a basic decision as to the type of site and site product they consider "good", and what they consider bad. That combined with a somewhat odd dependence on social media to guide them appears to be leading them down a very narrow path of reflecting the public's opinion back to them. What this means is that the highest ranked sites for many terms are not the best sites, not the sites with the deepest content, not the sites run by people enthusiastic about the subject, but rather sites that have managed to gain a certain amount of links via social media. So the people talk about something, which moves it up in importance, which in turn gives it more exposure in search, where more people are likely to talk about it - feedback loop.
In some ways, Google is reinforcing viral.
The problem is that rather than having the best search results, Google becomes more and more of a social mirror, giving people MORE of what they already know.
From what I can see, poor sites with lots of viral / social links are doing well, good sites without it are dead in the water.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||12/9/12 6:21 AM|
Original poster and site owner here again.
Thanks for the great feedback Mark.
Your points about behaviour are well made and valid, but I do suspect that Panda is looking at basically every signal Google has about a site, whether it's behaviour based or not.
Focussing on the users makes a lot of sense, and I have used Feedback Army in the past to poll potential visitors, as opposed to the existing visitors already on the site.
As a more general update, the site experienced a partial recovery from Panda on the November 21st run. Traffic from Google is up about 40%, and whereas the site would often languish on page 20 or 30 of the search results for some queries, it now seems to be more like page 3, at the tail end of the higher quality sites. So if anyone wasn't 100% sure that it was Panda that had hit the site, this should help convince you.
The rise in traffic doesn't take the site anywhere near its old traffic levels, but a lot has changed in the past two years, and I don't expect the site to ever reach those heady heights again. I'm just going to continue to work on improving the site, and will also be working on some unrelated projects, as despite the improvement, I feel that working full time on the site is a risky career choice over the long term.
Just so people are aware, I did make some changes that were discussed elsewhere in the thread just before the last Panda run, but it was literally a few days before, and given the size of the site, I doubt they would have been baked into Google's index quickly enough to have made a difference. I've had no communication with anyone from Google, so no idea if this thread had any direct impact. Maybe something was tweaked on Google's side, maybe the last minute changes worked, or maybe all that work over the last few years took a long time to have the desired effect?
The lift on November 21st has persisted, although traffic is sliding a little, probably due to seasonal effects. It looks like I'll have to wait for the next Panda run before I see if this is the start of something bigger, or if this is as good a recovery as I'm going to get.
Anyway, whilst things aren't back to the way they were, it's at least nice to be moving in a positive direction, and to be ranking as a reputable site.
Thanks for all the helpful advice on this thread, and if any Googlers took the time to investigate things internally, it's very much appreciated.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||12/9/12 6:41 AM|
I agree that Google seems to be caught in a feedback loop. I'm sure they're aware of the dangers of this, and they've talked about trying to improve the diversity of their results, but I think they need to do better in this area, otherwise it's going to stifle innovation on the web.
The web used to be the most meritocratic place in the world to showcase your efforts, but I suspect that the various app stores (including Google Play) are now more fertile ground for new work. Both are very competitive markets, but the app stores don't seem to inherently favour the big brands, and a small upstart with some good ideas can compete effectively against an established developer, despite the massive disparity in resources. I hope the web moves back in that direction, but I'm not holding my breath.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||Steve -||12/9/12 9:19 AM|
I think this is excellent news. Panda recovery often comes in fits and starts, sometimes over two, three, or even four refreshes, so there may be more to come.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||12/10/12 10:19 AM|
Yes, I've often seen reports of Panda recoveries being staged, so I'm keeping my fingers crossed for further improvements.
|Re: How I failed to save my 15 year old site from Google's Panda (long)||TimKoster||12/12/12 9:54 AM|
I can't tell you enough how bad I feel for you. Google's Panda punishment is a nightmare you can never seem to wake from. We've been the #1 public records site from the time that Google started -- but we got hit hard by the first and second Panda releases and lost 80% of our traffic. Like you, we put everything we had into finding and fixing the problems: canonicalization, site revisions, better navigation, getting rid of duplicate content, adding helpful guides (written by us), and a host of other site fixes and improvements.

Then came the day last month when our Panda curse was lifted and all our rankings went back to where they'd originally been. It was a glorious day -- it meant we could keep our doors open, keep our home, pay some of our bills, maybe hire some people back -- but most of all, really work on making the site better, with improved search features and navigation, plus an interface for mobile devices. Just as we were putting all our plans in place, yesterday Google hit us again and we're crushed all over again.

I for one am sick to death of the punishment without a crime, the sleepless nights, a wife in tears, and the constant threat of going under. If we were doing something wrong or spammy I'd understand, but we offer a free service that has been copied by countless sites. We're used for research by journalists, universities, banks, and stock exchanges, we're recommended by the Library of Congress, and we're used for training purposes by Homeland Security's Law Enforcement Training Center.
So you're not alone. There are many others who know too well exactly what you're going through.
Tim Koster http://publicrecords.searchsystems.net/
|Re: How I failed to save my 15 year old site from Google's Panda (long)||distantparts||12/12/12 12:45 PM|
Thanks for your kind words. Your story does sound very similar to mine.
In my view, Google needs to offer better help to sites hit by Panda. They have been improving their feedback and appeals procedures elsewhere, but even compared to the more recent algorithms such as Penguin and Top Heavy, Panda is the most complex and mysterious, and the lack of any formal feedback or appeals process makes it very, very difficult for legitimate sites to escape its clutches, even with the best of intentions and an incredible amount of hard work.
I've just had a quick look at your site, and I can see that it's a useful resource, especially for the users you mentioned, and it has a similar hierarchical structure to my site. It does, though, also make very heavy use of repeated phrases. For example, if I click on the Births link on the front page, the next page mentions the phrase "Birth Records" 58 times, and that phrase is also in the URL, page title, meta keywords and meta description. Perhaps use a few alternate phrases, and remove the phrase "birth records" from each of the links in the big links table? I have a bad feeling that such heavy repetition of key words and phrases, even though I can see the usability argument for it, may be causing your site to attract Panda's attention. I had similar issues on my site, and tackled them by merging some pages, noindexing others, and then, following advice in this thread, I also dealt with a few keyword over-usage issues that I had overlooked.
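If anyone wants to audit this kind of repetition on their own pages, a simple phrase count over the page text is enough to spot it. Here's a minimal sketch in Python (the function name and the example text are just illustrations, not anything from either site):

```python
import re

def phrase_count(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase in some text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Example: three variants of the same phrase all count as repeats
sample = "Birth Records and birth records and BIRTH RECORDS"
print(phrase_count(sample, "birth records"))  # 3
```

You could run this over the extracted text of each page and flag anything where one phrase dominates; what threshold actually matters to Panda is anyone's guess.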
I don't know if changing any of that will help, but maybe it's worth trying?
I hope you have some better luck in the next few months.
Best Wishes, Steven
|Re: How I failed to save my 15 year old site from Google's Panda (long)||TimKoster||12/12/12 5:32 PM|
Many thanks for the feedback. We are aware of those duplications and are working on them now. We had two items we changed in November that did make a difference-- we had a link called "Report a broken link" after each resource. We recognized that would be seen as duplicate text and changed it so that the "report" phrase only appears at the top and bottom of each page (we're going to take that a step further and change it to an image). We also had over 5,000 links to Code of Ordinance databases. Since there are only about eight companies that formulate those databases for public agencies, they all work the same-- so we had essentially the same description for each one. Since they're scattered throughout the site and most were added more than 10 years ago, we just didn't think about them. But when I realized that they'd be taken as duplicate text, I suspended all of them from appearing and three days later our traffic went back to Pre-panda levels. (only to be hit again even harder yesterday).
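For anyone else hunting for this sort of thing on a large site, grouping page or resource descriptions by their normalized text is a quick way to surface repeated boilerplate before Google finds it. A rough sketch (the example descriptions are made up, and there's no suggestion this mirrors what Panda measures):

```python
from collections import Counter

def find_boilerplate(descriptions, threshold=2):
    """Return normalized description texts that repeat at least `threshold` times."""
    # Normalize case and collapse whitespace so trivial variants group together
    counts = Counter(" ".join(d.lower().split()) for d in descriptions)
    return {text: n for text, n in counts.items() if n >= threshold}

descs = [
    "Searchable Code of Ordinances database.",
    "Searchable code of   ordinances database.",  # same text, different case/spacing
    "County birth records index, updated monthly.",
]
print(find_boilerplate(descs))  # {'searchable code of ordinances database.': 2}
```

With thousands of entries, sorting the result by count would quickly surface the worst offenders, like the 5,000 near-identical Code of Ordinance descriptions mentioned above.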
As to why Google isn't more helpful to Panda victims, my theory is that: 1) They don't see Panda as a penalty; they see it as a reward for sites that are specifically built with Panda in mind. 2) Google uses a tautological definition for Panda and "bad" sites, and many people in this forum parrot it: if you are bad you will be hit by Panda, and if you were hit by Panda you must be bad. It seems to be inconceivable to Google's engineers that they might have made a mistake. 3) Google does not provide assistance to sites that have been penalized, partly because of "2" above, but also because if they provided feedback to "good" sites, then the "bad" sites would use that information to outsmart Panda. And since only "bad" sites get penalized (according to Google), and there aren't any penalized good sites to worry about, they feel they're in the clear.
At least, that's my theory. The real problem is that we're dealing with an algorithm, a mathematical model built on Google employees' assumptions, seemingly without any thought as to whether some of those assumptions are correct.