AngryPenguin: looking for Penguin advise (links or otherwise)

AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 9:18 AM
Hi AngryPenguin --

I started another thread at your suggestion so as to not hijack the other thread any more.

My site is http://www.hockeydb.com. It has been online since 1998, and is a very respected reference site for hockey statistics. It is used by every hockey league out there, including the NHL. 

As I mentioned, I have never participated in any link exchanges, nor have I bought or sold any links. I have a lot of links because people really like my site and they have linked to it.

On April 24 my site in general was seriously deranked by Google. I have attached a picture below. 

People come to my site primarily to view hockey player profiles. I'll give you an example of where I think I should come up in a search - if you search for [Jack Hillen], I think that my site should be returned in at least the first couple of pages. Here's my page:


Hillen was just signed to a free agent contract, so he is in the news today. Most of Google's results right now are the same AP article across many different sites. They all have the exact same headline. That's a separate issue, I think. However, I'm up to page 17 and still can't find my page.  Here's the odd thing though: when you type in Jack Hillen to the search bar, the fourth "suggestion" is "Jack Hillen Hockeydb". Additionally, at the bottom of the first page of search results, a "related search" is "Jack Hillen Hockeydb". So enough people are actually searching for that phrase so that Google ranks it - but Google won't return my site in its results. 

Another example which isn't clouded by recent news is [Craig Janney]. Again, I'm not even in the mix of sites, although I recognize this search term is relatively competitive, with many relevant pages returned. What troubles me is that I don't even appear after the non-relevant pages start showing. I'm nowhere to be found. Here is the page I'd expect:


Let's try a more obscure - and thus less competitive - player. [Pete Pleban]. My site comes up as the first result on page #2. I see that position more than I should, which leads me to believe that there is a penalty being assessed. 

Now let's try a more recent obscure player. [Dominic Plenzich]. First result on page #2 again. See what I mean?

Here's a really odd case. [Matt Generous]. My site is explicitly referenced 2 times in the "related links" - but never comes up in the general search. 

I am not deranked for every search. There are plenty of players where, if you type their name, my site comes up on the first page. However, in many cases my site is no longer returned.

I am trying to capture two primary audiences: people who search for hockey players (usually current), and people who Google themselves or their friends. I'm not doing well in either category. The latter category only works well when the name is less common -- I don't expect to be returned when someone searches for Doug Smith or Bill Gates, but when someone searches for Mike Gerstanbuhler or Josh Gianotsos, I think that my site should be there. 

Likewise, when someone searches specifically for hockey players by typing in [Daniel Gentzler hockey] or [Daniel Gentzler stats], I'd expect my site to be returned. It certainly used to be, and still is for some players.

The advice I have received so far, although well-intentioned and certainly relevant, just isn't helpful. People say "include a bunch of text with the player profile". Sure, that might work - but that isn't what my site is about, and I am not being outranked by sites that do this. I'm being outranked by junk sites, or sites that cloned Wikipedia. Or they give the more generic "your content is too thin". Sure, I don't have a 500 word essay on each player, but neither do other sites. Or they say "you have above the fold advertising on your site" - but so does everyone else. 

I have made some changes since April 24. 
  • I put rel=nofollow on the 10-12 links on my links page. I originally had put those links there organically, but this is the advice that people gave me.
  • I solved a bunch of duplicate title/description problems, such as when two players have the same name.
  • I included a canonical tag on all my player pages so that a slightly different query string would not make it appear as though there were two pages.
  • I completely blocked a development server which had some (but not many) pages indexed in Google - duplicating the page on my main server.
  • I removed some ads on player pages where the player doesn't have much of a career. Those pages have just 1 ad on them now.
  • I have put "noindex" on some pages that were automatically generated, such as a team that has no player information on it, etc.
  • I have asked some forum sites where my site is linked like a blogroll to remove the links. That had resulted in profiles such as 25,000 links to a single page on my site.
  • I have a widget-type javascript tool where someone can embed stats from my site on their own site, with links to my site. I changed all of those links to rel=nofollow, even though Danny Sullivan himself doesn't do this on his widgets. The tool was never meant to capture pagerank; the links are there to get people to click them and visit my own site.
Nothing seems to have worked though, and no one has been able to offer anything more than generic suggestions such as "remove the ads" or "beef up the content".

Thanks for any advice you can supply.

Ralph
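
The on-page changes listed above (canonical tag, "noindex", rel=nofollow) can be spot-checked with a short script. This is only a minimal sketch using Python's standard `html.parser`, not the site's actual tooling: the `PageAudit` class, the `audit` helper and the `SAMPLE_PAGE` markup are made up for illustration.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Records the canonical URL, robots meta directives, and link counts."""

    def __init__(self):
        super().__init__()
        self.canonical = None      # href of <link rel="canonical">, if any
        self.robots = []           # directives from <meta name="robots">
        self.followed_links = 0    # <a href> without rel=nofollow
        self.nofollow_links = 0    # <a href> with rel=nofollow

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots = [d.strip().lower() for d in content.split(",")]
        elif tag == "a" and "href" in attr:
            if "nofollow" in rel:
                self.nofollow_links += 1
            else:
                self.followed_links += 1


def audit(html):
    """Parse an HTML string and return the populated PageAudit."""
    parser = PageAudit()
    parser.feed(html)
    return parser


# Hypothetical markup: it combines all three signals only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<link rel="canonical" href="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=2505">
<meta name="robots" content="noindex, follow">
</head><body>
<a href="http://example.com/" rel="nofollow">outbound</a>
<a href="/ihdb/stats/pdisplay.php?pid=73773">internal</a>
</body></html>"""

result = audit(SAMPLE_PAGE)
print(result.canonical, result.robots, result.nofollow_links, result.followed_links)
```

On a live page you would fetch the HTML first and pass it to `audit`; a real player page would normally carry either a canonical tag or "noindex", not both.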

Re: AngryPenguin: looking for Penguin advise (links or otherwise) symbiotic 7/4/12 9:34 AM
Urgent: your page has no DOCTYPE. The DOCTYPE declaration should always be the first line of the page source. Am I right or wrong?

<html>
<head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">



The DOCTYPE Declaration (DTD or Document Type Declaration) does a couple of things:

  1. When performing HTML validation testing on a web page it tells the HTML (HyperText Markup Language) validator which version of (X)HTML standard the web page coding is supposed to comply with. When you validate your web page the HTML validator checks the coding against the applicable standard then reports which portions of the coding do not pass HTML validation (are not compliant).
  2. It tells the browser how to render the page in standards compliant mode.

What Happens If the DOCTYPE Declaration (DTD) is Not Included or is Incorrect?

If the web page coding does not include a DOCTYPE Declaration (DTD or Document Type Declaration) or it is done incorrectly:

  1. You will not be able to use a HTML (HyperText Markup Language) Validator to check the page coding. HTML validation requires the DOCTYPE declaration.
  2. The browser rendering the web page will process the coding in Quirks Mode.
  3. The stylesheet may not be implemented as planned.
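
A quick way to test for the problem described above is to check whether the page source begins with a doctype. The sketch below is an assumption-laden simplification: `has_doctype` is a hypothetical helper, and it only tolerates a byte-order mark and whitespace before the declaration (it does not handle comments placed ahead of the doctype).

```python
import re

# Matches "<!DOCTYPE html" at the very start, ignoring case and leading whitespace.
DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\s+html\b", re.IGNORECASE)


def has_doctype(html: str) -> bool:
    """Return True if the document starts with an HTML doctype declaration."""
    # Strip a leading byte-order mark, which some editors prepend.
    return DOCTYPE_RE.match(html.lstrip("\ufeff")) is not None


# The opening lines of the homepage as quoted in this post: no doctype.
homepage_start = (
    "<html>\n"
    '<head> <meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1">'
)

print(has_doctype(homepage_start))                # False
print(has_doctype("<!DOCTYPE html>\n<html>"))     # True
```

This only tells you whether a declaration is present; whether the rest of the markup complies with that declared standard is still a job for a validator.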
Re: AngryPenguin: looking for Penguin advise (links or otherwise) symbiotic 7/4/12 9:38 AM
http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=2505 this does have a doctype
http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=73773 this does have a doctype

http://www.hockeydb.com/ this does not have a doctype; I take it it's HTML5?

Anyway, I thought I would just mention that. I have to go, so I'll leave it to the experts here to look into it.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Phil Payne 7/4/12 9:39 AM
"When I look in WMT, I see 3,711,671 links. These were all given to my site naturally ..."

Yeah, riiiight.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 9:41 AM
Phil, I would like you to retract that libelous statement.

I will swear an affidavit that I have never bought, sold, or traded links. Are you willing to do the same stating that I have?

Ralph
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Phil Payne 7/4/12 9:44 AM
http://members.multimania.co.uk/nozafoce/single-women-in-st-bernard-de-dorchester-quebec.html

<td width="100%" height="19">
<p align="center"><font color="#FF7F00">Looking for a ltr with a great man!, single women in st bernard de dorchester quebec <a href=http://www.hockeydb.com>everly</a>, k-ar and rb-sr dating on prec</font></td>
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 9:47 AM
I did not put that link on that site, and I would both swear an affidavit to that effect and swear on my grandmother's soul. 

Ralph
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 9:49 AM
P.S. With 3 million links, including links from the Toronto Star, the Albany Times-Union, the Springfield Republican, the Calgary Herald, the Central Hockey League, the Professional Hockey Player's Association, and hundreds of other reputable sites, does it even make sense for me to drop links to my homepage on that site? 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Phil Payne 7/4/12 10:21 AM
I find the '3 million' figure so implausible that I believe it CANNOT possibly be real.  It's more than CNN has.  My own site has been up as long as yours and I think I have about 600.

I agree that most of the links I found were pretty relevant and on hockey-based or at least sports-based sites.  But 3 million is just so far off the scale ...
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 10:48 AM
Here is an image of my WMT screen. I'll explain at least the first 20 links.

1) Proboards.com. This is a general purpose forum site with sub-domains for more specific forums. On one of the sub-domains, for the Halifax Mooseheads, in the header of every forum post they had a dropdown with links to each season of the Halifax Mooseheads. After numerous pleas, they finally took them down last week.

2) Rangerland.net. They have a blogroll-type link on every forum post. I have asked them to remove it, but they haven't responded.

3) Jerseydatabase.com. They used my "embed statistics" widget to put a player's statistics on every jersey page they have in their database. Each widget has a link to a number of hockey teams. I made these rel=nofollow right after Penguin (I didn't know about that attribute before that since I don't closely follow SEO issues). On each page, they also have another link without rel=nofollow.

4) webleedblue.net. Another forum site that has a link to my site on every forum page. They took it down about a month ago, but it still shows in WMT.

5) Wikipedia.org. I am the genesis of much of the hockey-related information on the internet. Wikipedia has referenced my site for years.

6) blogspot.com. A number of blogs have me in their blogroll, so I appear on every post of those blogs.

7) 216.92.0.161. That one is actually the IP address of my web server. It is a relatively recent phenomenon; most of the links are within my vBulletin software. It seems to have happened since I upgraded about a month ago.

8) outsidethegarden.com. Another forum site that links to me like a blogroll, one link per forum post. I asked them to take it down, not sure if they have yet.

9) sensagent.com. This site has copied a lot of hockey content from Wikipedia, so they also copied my links.

10) timesunion.com. A blog there has linked to me in a blogroll.

11) ask.com. They clone Wikipedia information, which carries my links over with it.

12) ebay.com. Many people selling hockey stuff on eBay link to my site. 

14) tfode.com. Wikipedia clone

15) milanosiamonoi.com: Italian hockey database site that has linked to me like a blogroll on every page.

16) philipperiverin.com: some kind of French Canadian hockey site that linked to me like a blogroll on every page.

17) pipl.com. Spammy site that scraped my content and created "profiles" of all the people they found on my database. I understand they are willing to remove links, but will they remove 32,573 links at once, or do I need to ask individually?

18) NFHL.com. A hockey pool site that used my Embed widget on every player in their hockey pool. On a side note, the embed feature was meant to prevent people from simply copying my content - I figured that if I gave sites a way to put a widget on their own site with up-to-date info on a player, they'd stop copying. From that perspective, it worked.

19) northjersey.com. A blogroll link.

20) Coloradoeagles.com. They link to my player pages when they do a press release, so that their readers can click on the players they sign or trade to get their stats.

So these links are all relevant, and were all placed by the owners of those sites. They link to my site because I am a trusted resource and because they find my information valuable and relevant to their sites.

I'm sure I can get some of these sites to remove the links, but some simply don't respond. 


Re: AngryPenguin: looking for Penguin advise (links or otherwise) KCle 7/4/12 11:09 AM
It all looks natural to me.   Clearly this guy isn't spamming.  He doesn't need to.

I think you're on the right track in getting some of those sitewides removed.  I would continue doing that and wait for the next data refresh which should be any day now.  
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Splugged 7/4/12 11:14 AM
I don't see spam tactics.
Sorry for the intrusion but I checked RalphSlate's site in many backlinks checker tools (SeoMoz, Ahref, Majestic etc...).
It has a respectable link profile, better than other websites in its niche.

I would look in other directions...
Re: AngryPenguin: looking for Penguin advise (links or otherwise) dyoc 7/4/12 11:40 AM
I'm in the same boat. Online since '99, never bought a link (except tons of AdWords), no ads, whitest possible hat, and Penguin decimated me. I've done the same kinds of tweaks you have and so far no luck.

Have you tried the Penguin feedback form? I have a feeling the solution is going to be waiting for the next update and hoping for more accurate results. In the meantime, did you know that cat food isn't bad fried?

Re: AngryPenguin: looking for Penguin advise (links or otherwise) Geminineil 7/4/12 11:43 AM
Klark0... why do you think the next Penguin update will be in a 'day or so' ?


Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 7/4/12 12:15 PM
>but will they remove 32,573 links at once, or do I need to ask individually? 

Individually is always best.  I don't remember my specific conversation with pipl.com, but I have always provided a detailed list of links found/reported by Google with each of my takedown requests.  I always request that the site remove any links that I may have missed.  I suspect the site's removal process is scalable, so once a single link is being removed the others follow right behind.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) paintingoilart.com 7/4/12 12:28 PM
http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=2505 this does have doc type
http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=73773 this does have doc type

You should check the code, and I'd advise using some tags for your files.
You can use the code and tags at http://www.paintingoilart.com as a reference; that site uses product tags for its oil paintings.
 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 2:31 PM
What other factors would have caused a Penguin hit? Is my situation somewhat unusual so that I'm being hit by the algorithm even though my links are legit?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 7/4/12 5:13 PM
Stop and wait.

Let's say for a moment that duplicate content is an issue.......  You "completely blocked a development server which had some (but not many) pages indexed in Google - duplicating the page on my main server."...... If this site created duplicate content issues, then you need to WAIT until Google has had a chance to discover that the blocked site has indeed been removed, that it is not likely to come back, and that your real site is the original AND desired source of the content.

This takes time.  More so because Google is not as eager to visit penalized/demoted sites as it is to visit CNN.

And then there is the time for the penalty/demotion to expire.

If there is a Panda or Penguin component to the penalty/demotion then you may need to wait until the next run of P or P ......plus wait out any time-out penalty attached to P or P.

And all of this waiting is based upon the date when you performed this task or that task or those tasks together.... You might have started April 24, but the XYZ task was not finished until just last week..


I would NOT make any major changes to the site for the next 30-90 days.  Let Google discover what you have done and let the site recover...if it is going to recover based upon the tasks already performed.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 6:44 PM
Seriously? Up to a 90 day feedback window? And what if the changes I make weren't it? Wait another 90 days? That is absolutely horrendous. 

Google did immediately deindex my development server, and they have crawled my site repeatedly since April 24, picking up nearly all my title changes, etc. I'd be a little surprised if that was the problem because there were only a few hundred duplicate pages indexed. 

I understand that Google doesn't want to give up too much information to prevent its algorithms from being gamed, but since they are behaving like FICO and assigning us an internal score, I think that they need to step up to the plate and let us know the information they are using, and let us correct/dispute it with a much better turnaround time than 90 days. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) ShopSafe 7/4/12 7:35 PM
Hello Ralph Slate, :)

Following on from the excellent advice above, I have noticed something odd. This may be something or it may be nothing, but when I view your pages with the SEMrush toolbar, it causes most of the links on the toolbar to disappear, i.e. there is something in your css files that is causing this. 

The links appear on mouseover but become transparent when not in focus. I don't see that behaviour with links on your site, so maybe there is a workaround in your css that googlebot is reading as "hidden links". I do not see this behaviour on the SEMrush toolbar when viewing any other site.

(Others have already mentioned it but why no doctype on the homepage of such a good site?)
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 7/4/12 7:56 PM
>Seriously? Up to a 90 day feedback window? And what if the changes I make weren't it? Wait another 90 days? That is absolutely horrendous. 

Yes.  Seriously.

Take a step back and think for a second.....what have you done for the past 2 months?  Made change after change and banged your head against a hard wall in the process.

Maybe change XYZ you made back on May 15th was the cat's meow and will cure all that ails you.

All is good.

But you keep making changes.  And on June 14th, 'cause the May 15th change hadn't fixed your problem, you undid XYZ.


Personally, I don't think you have "undone" your prior good, but you have spent two months making fixes.  You are frustrated and need a break.

So take a step back, take a break and let Google evaluate your site.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 8:11 PM
Shopsafe, I had a design firm redo my site a couple of years ago. Although their visual design was good, their coding wasn't. They coded the site with GIF spacers and the like. Although I was easily able to fix the rest of the site and redesign with CSS (bringing the size of a pageview from something like 200k to 20k in the process), I never tackled the front page because it was harder to fix. It might be time to tackle this.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/4/12 8:36 PM
Stevie, given everything that I have posted, what is your opinion of my link profile? Is there anything obvious there that I should tackle? 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 7/4/12 11:51 PM
>Is there anything obvious there that I should tackle?  

Blogs. Check each blog to make sure it is more or less appropriate (ie no low quality pharmaceutical blogs with a link to your site to balance out their spammy activity).  Anything you don't like do a removal request.

Beyond that.... what are you going to do, tell Wikipedia to stop linking to you via nofollow links?  

Seriously, the crap that you can't control (Wikipedia scrapers for example) is just that.... crap.  Google is going to ignore it... besides the links are nofollow.

As you are chasing links (which I do out of general principle) .... don't worry about common crap linking to everybody such as pipl.com, updowner etc.  These sites are linking to EVERYBODY which means the number of links to your site is no different than the number (proportionally) of links to Wikipedia, NYT, Harvard etc.

Sure, I do a removal request for those sites (again, general principle), but I don't lose sleep if the crap site won't remove the links.


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I would slow down a redo of the front page.  Changing CSS elements, eliminating a table or whatever... that is fine.  Don't change text or navigational components.  Let Google absorb the changes you have made. 




 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 7:58 AM
I'd like to float this up one more time, in the hopes that Angry Penguin would like to comment on it.

Although I can see that StevieD_Web's advice makes sense, he has basically said "You've made some minor tweaks - now wait 90 days to see if they work". No disrespect to him, but that is really not an acceptable solution.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) symbiotic 7/6/12 10:07 AM
I see you got your DOCTYPE sorted:
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
I checked w3c validation

Congratulations

This document was successfully checked as XHTML 1.0 Transitional!


The results are here. Perfect score, well done:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.hockeydb.com%2F&charset=%28detect+automatically%29&doctype=Inline&group=0
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 7/6/12 11:15 AM
No disrespect taken.

A)  I think you have made more than minor changes and  B)  up to 90 days (30 more likely).

Look at the issue this way... would you change physicians because the penicillin he prescribed to you didn't clear up your STD in 24 hours or would you wait out the 72 hours the physician said the penicillin would take to clear out your pipes?

Granted your site didn't have the drip, but some things do take time to fix.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 11:23 AM
Let me offer an alternative example. If you suddenly lost 40% of your body weight and couldn't do anything to get it back up, and you lost your strength and could barely stand up, couldn't work, couldn't pay the bills, and you went to a doctor, and that doctor told you that he heard of someone who lost weight because they didn't eat 10 grams of dairy each day, and told you to do this, but wait 30 to 90 days to see if things got better, what would you do?

I'd try to find another doctor who could offer a more definitive answer.

When your site, traffic, and revenue are on the line, waiting it out on a hunch isn't something that people can accept.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Geminineil 7/6/12 11:28 AM
People with analogies shouldn't be going to see a doctor!
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 11:34 AM
Yes, I did fix the Doctype -- although there is now a minor rendering glitch on Internet Explorer (I converted the layout from table-based to DIV-based).

There does not seem to be any consensus on if I should work on cleaning up my link profile - which is most likely the reason for my Penguin dropoff. I find this to be frustrating, because the signals from Google are mixed. Although the rule is "paid/traded links are bad", Google can only guess if a link is paid. When even a human being (Phil Payne) believes that my many links are the result of payments, and not natural links (none are paid or traded), I question whether an algorithm can make the right call.

Google keeps saying "links should be natural, and link without nofollow when you 'vouch' for a site", but their algorithm appears to act otherwise. Using the WPMU.org example, WPMU.org was penalized for having links to their site on templates that they distributed. If someone installs a WordPress template and leaves it installed, then isn't that vouching that they trust the site, not spam?

This particular issue isn't quite so black and white. Everyone wants links, both to get traffic from other websites and yes, because Google has said that links are a sign of a good site and rewards such sites with high SERP placements. But let's list out some ways to "get" links.

1) Make your site link-friendly - for example, returning pages with GET instead of POST, so that the URL can be copied/pasted. Or short URLs. Or not using Flash to build your site. Is that gaming? I doubt anyone says that this is, but the fact is the site is doing something to make linking easier.

2) Adding a button on your site that allows the page to be linked somewhere else - like a "Digg this" button. Again, the site is doing something to make linking easier - is this unnatural linking? Probably not, though Google has changed their minds before.

3) Webrings/link exchanges. Google now says some (if not all) of these are unnatural, even though such links were around before Google existed.

4) What about adding a helpful link on a message forum or in your signature? I've been surfing the internet since 1993 (Usenet) and signatures were always used to put out information about you and your interests. When I launched the first incarnation of my site, back around 1995, I publicized it by posting on Usenet and posting the information to some related mailing lists that I participated in. Was this spam? No, because I was a contributor to those forums and this was an acceptable practice. It is now seen as spam primarily because spammers have abused it.

5) What about Twitter or Facebook? Is posting your link to either of those mediums considered spam? Maybe not now, but what about the future?

6) I have a tool that is designed to let a writer run their article through an encoder so that any hockey player names become links to the player profile on my site. Phil Payne suggested that this was spam - but someone else said that it was a clever way to get both traffic and links. Is it spam, or just a way to make linking easier? Keep in mind that the user chooses to use this tool specifically because they think the links are a benefit to the users - the links are the sole benefit.

7) How about widgets that are designed to be put on other websites? For example, Danny Sullivan has an infographic on his site that is the Periodic Elements of SEO. He makes a script available for you to put it on your own site. The script contains a link back to his own site, and it is not Nofollowed. Is this spam? Keep in mind that the site owner chooses to put the script on his site and knows about the links. Should they be nofollowed, which effectively says "I do not vouch for this website"? If you don't vouch for it, then why are you using their widget?

8) Link trading. Is it really spam? I don't think it always is. It depends on how whore-ish the trading partners are. If I have a list of 10-15 websites that I am willing to put my seal of approval on, then why is it considered bad that we link to each other (for the record, I don't do this)? On the other hand, if I have 500 websites, that shows that I have no standards and thus my opinion really doesn't matter. Can Google get this right with an algorithm though? I'm not convinced.

That's the theoretical discussion - which is likely completely divorced from Google's algorithm, because their rules are based on intent, and an algorithm can't determine intent 100% of the time. That is where Google is falling down here, and where they are abusing their power, because they should really have a white list or a human reviewing false positives - but they're not, and that is what is the most frustrating.




Re: AngryPenguin: looking for Penguin advise (links or otherwise) Free2Write 7/6/12 11:45 AM
The analogy only applies to the waiting period not the diagnosis. Hunches and a form of differential diagnosis are what's available. Maybe you're looking for better hunches. But better hunches still require time and a waiting period with uncertain results. A definitive diagnosis can only come from Gregory House or someone inside Google. Both of which are unlikely in the real world.

Pure statistics are a good source of traffic away from a site. Try adding a more well-rounded or pop-culture attraction other than the statistics. Or be satisfied that other people are using your good works.  The "other sites are doing it" argument is almost never worth the effort of the thought unless everything internal to Google and the entire history of the other site is known. Local newspaper, TV, and radio exposure can be free and are often worth the effort.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) Grandmaster Flash 7/6/12 12:06 PM
Ralph,

When did you disallow the /stats/ directory in robots.txt? 
Disallow: /stats/
There's a lot of content indexed in that directory
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 12:10 PM
Angry Penguin, the /stats/ directory is an old (like 2000) instance of Webalizer. My content is in a directory /ihdb/stats/.

I only did that a couple of weeks ago, I did it because I was getting a lot of bad links showing up in WMT, and they were coming from that directory.

Did I get that wrong? I don't think so, since I see thousands of Google crawlings in the /ihdb/stats/ directory each day.

Ralph
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Grandmaster Flash 7/6/12 12:24 PM
I'm not positive but it looks like the user agent name for ConveraCrawler may be incorrect.

Suggest you dive into Webmaster Tools and verify whether any of your good pages are blocked accidentally.
I used a free online checker which indicated you were blocking googlebot but I can't vouch for the accuracy of the tool.

Most flaws I'm finding, such as indexed content under mail dot hockeydb.com and non-www, are relatively insignificant.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 12:25 PM
Free2Write, I understand what you're saying, but my site's attraction is, and always has been, the statistics. This is what people want, it is what people like, and is what people come back for. Yes, there are other sites out there that have the text blurb, etc., but if you look at those sites, their statistics are basically incomplete. There are other sites that have "user generated content" along with their player profiles (one has buttons such as "Love Him/Hate Him"). That is junk, in my opinion, and, also in my opinion, this takes away the credibility of the information presented. Imagine if Wikipedia put user comments directly on its articles - it would devalue the site.

I also appreciate what you're saying about "other sites are doing it" argument, however there is a point where you just can't ignore the other sites because those other sites are beating you. Yes, I get that it is like speeding, but there is a difference between occasionally being the one who gets the ticket and being harassed by a cop who doesn't like you and pulls just you over every day.

Plus, it's an algorithm making the call -- not an understaffed police force. There's a higher expectation for fair treatment, and if there is no appeal process, then the algorithm is expected to be right 100% of the time. If the algorithm says "keyword stuffing gets a penalty", then I should not be beaten by a keyword-stuffing competitor. If the algorithm says "quality original content is king", then I should not be beaten by Wikipedia clones or pages that are empty stubs. If Google says that excessive advertising is bad, then I should not be beaten by sites that have 10 ads per page. I know that the algorithm is more complex than that, but if I have quality pages being beaten by junk pages or even irrelevant pages this tells me that pages on my site are being penalized or that Google is ranking something that it is not telling us.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Grandmaster Flash 7/6/12 1:44 PM
Ralph,

The considerable sitelinks I see your site generating suggest that your site is viewed favorably. 
I'm seeing branded queries as a key traffic driver as well which is a strong quality signal.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 1:50 PM
Let me ask a more focused question, specifically about Danny Sullivan's Periodic Elements of SEO. Here is his "embed" page:

http://searchengineland.com/seotable/download-periodic-table-of-seo

He provides code to put the graphic on your own site or blog. The links in the code are not nofollow.

He says that if you can't use his code, "However, we ask that you link somewhere from the page that uses the image to http://searchengineland.com/seotable. You can copy the link below with suggested text below:"

Here's my question: Is that spamming or behavior that would get an algorithmic penalty?

The reason I ask is that I have a similar widget, and I changed a lot of links to nofollow a couple of weeks after I heard that Penguin was penalizing sites that did not have nofollow links. I have a widget that is being used on a number of websites right now - so I basically turned off a lot of links.

My question is, should I have done that?

You can see the widget in action here:

http://leafs4life.net/luke-schenn/

The entire stats portion of that page is delivered by the site owner putting a javascript script on their page that calls my site. The links go to my site. Since that widget is generated on the fly, I was able to change the links to nofollow. But was that a mistake, or did their original inclusion earn me a penalty?

The widget isn't heavily used because I don't really promote it much on my site. However, if Google doesn't discount the signal of someone trusting my site enough to put a javascript widget on their own blog, I'd like to take advantage of it.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Grandmaster Flash 7/6/12 2:06 PM
I know some will disagree with me, but I see no issue with your widget at all. I could care less whether the link is follow or nofollow. 
Widgets (IMO) are a legitimate tactic as long as their placement is not compensated. Matt Cutts has even advocated their use at times although I don't have a link to document that statement.

Unless there are some wacky implementations somewhere, I would say it is highly unlikely this is your issue.
I have seen sites that built widgets that hid links in widgets get whacked but that is not the case here.





Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/6/12 2:54 PM
I think that maybe a compromise is the best answer. My widget contained a number of links - because it emulated my site. So, for example, the widget links to my homepage (stats provided by Hockeydb.com), the player's profile, the NHL draft he was selected in, and a link for each team/season that he played with. Internally and contextually, those links make sense. Externally, they may not - because if a player played for the Boston Bruins for 10 straight seasons, on my site, the phrase "Boston Bruins" is linked 10 times - because each row in the table is for a separate season. Externally, that may appear like link spam.

The compromise I'm going to go with is to make most of the links nofollow - except to the player profile. When a site uses a lot of player widgets, I was seeing some strange WMT link profiles - because if you link to 1000 players, that amounts to maybe 1000 links to 1 home page, 1000 links to just 20 draft pages, 1000 links to just 500 team/season pages, and 1000 links to 1000 player pages. That probably trips something somewhere. Instead, the most logical follow link is to the player profile - the rest get nofollow.

Since the widget is delivered via Javascript, I've already made the changes. That seems almost too powerful - I checked, and yesterday I had about 600 different active sites using the widget. In theory, I could put some kind of spam link on all those sites in less than a minute if I wanted to - but those sites trust me not to do this, and that is why they chose to include such code on their own site.
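
As a rough illustration of the rule described above (a sketch only, not the widget's actual code), the link rendering could look something like this. The link "kinds" and the example URLs are hypothetical names invented for this example.

```python
from html import escape

# Only the player-profile link stays followed; these kind names are
# hypothetical labels, not identifiers from the real widget.
FOLLOWED_KINDS = {"player"}


def widget_link(kind: str, href: str, text: str) -> str:
    """Render one widget anchor, adding rel="nofollow" unless the kind is followed."""
    rel = "" if kind in FOLLOWED_KINDS else ' rel="nofollow"'
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


# Hypothetical links for a single row of a player's widget.
links = [
    ("home", "http://www.hockeydb.com/", "Hockeydb.com"),
    ("player", "http://www.hockeydb.com/ihdb/stats/example-player", "Example Player"),
    ("team_season", "http://www.hockeydb.com/ihdb/stats/example-team-season", "Boston Bruins"),
]
for kind, href, text in links:
    print(widget_link(kind, href, text))
```

Keeping the policy in one set means that changing which link types are followed is a one-line change on the server, and every embedding site picks it up on the next page load.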
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/8/12 7:11 AM
The more I look at things, the more I believe that my deranking is due to a Penguin/Panda one-two punch.

When I pick the rosters of NHL teams and run them through Google, the results are pretty good - I'm often on the first or second page. When I add "hockey" to the search, I'm often on the first page.

I think that my traffic dropoff was in the fringe players - players who didn't have a very long career and who played in low minor leagues or junior leagues. This is where my site really excels - I have hockey players from as far back as 1910. I used to get a lot of traffic from people Googling their name, finding their information, and then going on to explore my site as they looked for their teammates, etc., or people Googling obscure players which Google does not yet recognize as hockey players by adding "hockey" to the name. 

A good example is a player named Mike Vitale. Here is my page:


If you search for [Mike Vitale hockey], I don't come up. However, another page comes up as #2:


My page is definitely light - but there's not much I can do about that. The player's career is just 5 seasons. There is no way for me to add original textual information about this player since that just isn't the focus of my site. Anything I do is going to have to be within the confines of the information that is already on the page. This is actually a decent amount of info on the player - often I don't even know when and where they were born, or what position they played, and sometimes the player has just 1 or 2 seasons. 

The USCHO page has just about the same amount of basic information as me, but they rank #2. Most of the rest of the results after the #1 spot are barely relevant, basically junk. 

What I need to figure out is why my page is no longer showing up as of April 24. Thin page implies Panda, not Penguin. But I didn't see a Panda penalty and my dropoff was April 24. There is likely some cross-play taking place here.

Is the USCHO site making it over the bar because of the sheer volume of words on their site? If you look at my page in terms of words, there are maybe under 20 words on the page - the rest are numbers. On the USCHO site, they have sidebars such as "Tufts Men's Hockey Team Page", "Tufts Men's Hockey Statistics", etc., and then they have a section titled "Latest Tufts and NESCAC News" which just contains links to other articles. So they maybe have 200+ words on their page, even though the words aren't that related to the search term.

For thinner players, might it make sense to do something like that simply to get more words on the page? I could certainly have links such as "view information on Tufts" or "View NESCAC history". I could put up links like "View all players from Berwyn, IL". 

I don't feel like I'm entitled to the #1 spot, or even to beat USCHO.com - however, I need to understand why my on-topic page does not show up at all in the SERPS. This kind of stuff isn't high traffic, but with 150,000 players in my database, I was always a long-tail site, and that tail has been chopped off. 



Re: AngryPenguin: looking for Penguin advise (links or otherwise) JohnMu 7/11/12 6:05 AM
Hi Ralph

Thanks for posting all of these details. Looking at your site, I don't see any specific technical issues, or general issues with the links to your site. I can, however, imagine that our algorithms might have some trouble understanding the unique value of your website in comparison to other, similar sites (especially considering that the content is primarily aggregated statistics). My general recommendation would be to continue working on your website, making it the best site of its kind. There's no single change that you'd need to make, so I'd really look at your site overall and see where you could make improvements on a general level -- you mentioned that you might have some thin pages, perhaps that's a place to start (or at least, to try things out with A/B tests, etc). 

Cheers
John
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/11/12 11:54 AM
Hi John --

Thanks for replying.

I feel like I need to start out by offering some background on my site. Feel free to skip this next section on my site's pedigree if you're already convinced of it.

My site is a well-respected reference site, the first of its type on the web for hockey (starting 1998) and quite possibly the first sports statistics database on the web. I have seen more competition recently (unfortunately some of it from sites scraping data from my site), but hockeydb.com is known in hockey circles as the go-to site for historical hockey information because it is so comprehensive. I have created complete statistical profiles of virtually everyone who has played at nearly every level of organized hockey since 1910, compiled from almost 20 years of research - of course, there is always more research to be done, but my site is very comprehensive, the most comprehensive on the web, and people in the hockey world know this. Although ultimately you can say the content is aggregated statistics, that's a little like describing Wikipedia as just aggregated information. My site doesn't just regurgitate Elias Sports Bureau feeds - the information is compiled from years and years of research. I'm not some brand new site which just scraped some other sites to create what I have. Much of what you see on the web about hockey history - including what is on Wikipedia - and also most historical hockey logos, and most hockey card checklists - originated on my site.

 I'm not telling you this to suggest that I am entitled to high search placement; I only offer it to add perspective to the discussion, to show that I am not "just an aggregator". You mentioned making my site the best of its kind; although I have some competitors, my site is far better than most of them (like Yahoo or ESPN), and I believe it is better than my closer competitors (hockey-reference.com). I think that many others believe this too, evident from my backlinks and from my site's great reputation in the industry (the Southern Professional Hockey League even has my site written into its bylaws as the way to determine how many professional hockey games a player has played, to determine their veteran status).

What puzzles me is that many of my pages are simply no longer returned even with very targeted searches. I did read a recent interview with Matt Cutts and I understand that Google strives to present a basket of varied results in the hopes of fulfilling a potentially ambiguous query, and that it may choose one site from a handful of similar sites and throw away the rest. I can appreciate that for competitive niches where there is just too much good content from which to choose.

I am very surprised that this would be happening for non-competitive niches. Using the [Mike Vitale hockey] search above, I have no real problem with USCHO being the third result; they are a go-to site for college hockey news and articles (even if their stats are incomplete). What does trouble me is that my page, which has more broad, but somewhat less deep information, is not returned at all. Irrelevant results show up at the #3 position because this is a long tail search - I'm talking about results where the word "Mike" and the word "Vitale" are nowhere near each other - proximity of terms is Search Engine 101. I believe that my page is somehow being explicitly eliminated in the search results, and I'd like to know why and how to fix this.

My Google referrals dropped dramatically on April 24, as you can see from the graph linked above, which implies that my problem is at least somewhat related to backlinks. I have not bought, spammed or traded for any of my backlinks, the only gray area is my player stats widget which is designed to be placed on another site - this does contain links back to my site because the links are a vital part of the player's profile and also because I offered the widget to be an advertisement for my own site. I think it's safe to say that if someone puts a stats widget on their site, they trust and respect the site that is the source of that widget.

I kind-of wish you did find a problem with my backlinks because at least then I would have something to fix.

I will continue to add more content, and will make some tweaks, but it is very frustrating for so many pages to be eliminated from Google searches and not have a direction to follow other than "keep working".
Re: AngryPenguin: looking for Penguin advise (links or otherwise) gary- 7/12/12 7:11 AM
I don't think it's the sitewides or the rest of the suggestions people have made. I think you're assuming it's something you've done wrong / that's in your control when in plain fact Google borked their algo update, killing good sites like yours. The solution is to build your email list and not rely on google. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Grandmaster Flash 7/12/12 8:13 AM
Ralph,
Just a thought. Your site does not perform especially well on mobile devices and traffic from them is increasing dramatically on my sites. Considering what I perceive is your demographic, I wonder if your engagement levels versus the competition have degraded over time?

If I've learned anything from Panda, it's that user engagement, time on site, repeat visits etc. are critical for success. With a fairly sound technical background in SEO, my peers and I have had to have a crash course in more traditional marketing methods to maintain our competitive advantages. I'm currently fighting an uphill battle against a site that is arguably weaker technically but has superior marketing skills.

The last thing you want to hear is a suggestion to pour more money into the site but I think you'd really benefit from a responsive design, (one that scales to browser size).  

I wish you luck, and just liked you on Facebook:)
  
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/17/12 12:11 PM
Master Shifu, I'd like to continue this discussion a bit. To answer your last question, I don't think my engagement levels have degraded over time. Before Penguin I was seeing an average of a 10% increase in unique visitors compared to last year and last year was 10% more than the year before. Following Penguin, I am seeing a 20% decrease compared to last year. Pages-per-unique (a sign of engagement) were steady for the past 2 years. They have increased since Penguin, which I attribute to having fewer casual users directed by Google.

Mobile is on my radar, but scaling a tabular site is very difficult because a row of data takes up more space than a mobile device can offer.

I'd like to focus back on the Panda vs. Penguin thing. My dropoff came on April 24, there is no doubt about that. That's a Penguin impact. John from Google said that he didn't see any problems with my backlinks, but doesn't Penguin virtually guarantee a backlink issue?

I'd also like to talk about competitors. Although pointing to competitors is a bit like complaining to the cops that everyone else is speeding, one competitor is bugging me a bit because his site seems to fly in the face of all the advice given to me. That site is called dropyourgloves.com, and I strongly believe that he created his site based on scraping my site about 3-5 years ago. His site is coming up very frequently in the top results at a time when mine is lower or nowhere to be found. It is as if the algorithm is doing this on purpose.

Let me throw a few queries out there so you can see what I mean.

[Ray Winterstein hockey]. His site is #1, my site is #11. My profile contains 3 more seasons worth of data than his does.

[Glen Seperich hockey]. His site is #1, my site is #6. My profile contains the vital stats of this player, his does not.

[Gord Redden hockey]. His site is #3, my site is #17. Both pages contain equivalent information.

[Dale Yutsyk hockey]. His site is #2, my site does not rank. Both pages contain equivalent information.

[Jim Maertz hockey]. His site is #3, my site is #6. I have one more season of data compared to him.

[Doug Keeler hockey]. His site is #2, my site is #14. My profile contains his vitals and draft status, his does not.

[Steve Letzgus hockey]. His site is #2, mine is #3. Not much of a difference, but mine has a photo of the player; his does not. My profile also has several seasons more information.

[Jack Byerly hockey]. This one is particularly annoying. I have 7 seasons worth of information on this player and am ranked #6. He has 1 season on this player and is ranked #1.

The reason this puzzles me is that when you compare my page with his, although his pages contain similar information, they are never more complete than mine and are frequently less complete. His page is god-awful ugly, with a huge palette of jarring colors, 1990s design, etc. My meta-description is more informative than his.

When you compare backlinks between his domain and mine, my domain (not the individual page in question though) has orders of magnitude higher levels of backlinks. On Majestic, his site shows 7,187 backlinks from 396 domains. Mine shows 529,767 from 5,287 domains. He gets similarly low authority rankings on Open Site Explorer.
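
To put numbers on that comparison, here is a quick calculation using only the Majestic figures quoted above. It is plain arithmetic on those four numbers, not a measure of how Google weighs links.

```python
# Majestic figures as quoted in this post.
profiles = {
    "dropyourgloves.com": {"backlinks": 7_187, "domains": 396},
    "hockeydb.com": {"backlinks": 529_767, "domains": 5_287},
}


def links_per_domain(profile: dict) -> float:
    """Average number of backlinks contributed by each referring domain."""
    return profile["backlinks"] / profile["domains"]


mine = profiles["hockeydb.com"]
his = profiles["dropyourgloves.com"]

backlink_ratio = mine["backlinks"] / his["backlinks"]
domain_ratio = mine["domains"] / his["domains"]

print(f"backlink ratio: {backlink_ratio:.1f}x")
print(f"referring-domain ratio: {domain_ratio:.1f}x")
for name, profile in profiles.items():
    print(f"{name}: {links_per_domain(profile):.1f} links per domain")
```

That works out to roughly 74 times the backlinks but about 13 times the referring domains, or about 100 links per linking domain for my site versus about 18 for his.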

When given the choice, people choose my "brand", so to speak, because it has a sterling silver reputation for being the most complete and informative. It has been the authority for years, and continues to be outside of Google. Why is Google putting its thumb on the scale by burying my results?

Dropyourgloves.com is nowhere near the acknowledged reference site that mine is, but he is at the top of the search results. How is this possible?  Why is his page being promoted to the top and mine being pushed back? I used to be in the position that he was in. I no longer am, and I think this is a substantial reason for the dropoff in my Google referrers. I think that my long-tail searches have been cut off by this effect. If I can answer this question, I think I'd be back where I was.

Remember, this happened on April 24, so that is contrary to the Penguin versus Panda conventional wisdom.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/19/12 1:14 PM
Advanced User "squibble" has deemed that it is not appropriate for me to ask questions about how to determine if a link is good or bad, since JohnMu has stated "I don't see any specific technical issues, or general issues with the links to your site." I guess I'm banned from asking about link quality.

So with that in mind, can someone explain to me how the Penguin algorithm, run on April 24, was not responsible for a 50% to 80% loss in Google referrers to my site precisely on that date, and that instead it was Panda? Please refer to the above graph to see what I'm talking about.

This doesn't add up, and no one is willing to talk about it. Instead, I'm being forced to accept the opinion of several people here that my trusted, popular and well-respected hockey statistics site is worthless, and instead I should provide users with 1000-word articles on the players instead because giving complete statistical history of each player is "thin content".

Is it true that the Panda algorithm would push relevant content below spam links and irrelevant content? Can someone explain to me why a search for [Lakeland Ice Warriors hockey cards] would result in the #1 link being a content-less page with those words on it, results 2-10 being either spam or scrapings of my site (pipl.com), followed by my page on page 2 (hey, it's up from page #3 where it was yesterday!)?

My page is the definitive page on this topic, but it is ranked lower than spam. On other search engines it is #1 because it is the only page on this topic on the internet.

This isn't an isolated case either, just one I'm highlighting. Another example is [Baltimore Clippers Hockey cards], which does not bring up the page on my site, the definitive page on that subject, at all.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Alex813 7/19/12 2:21 PM
Ralph, have you even considered

Removing Your Links Page (Or At least not having it in the menu)?
You have it in your menu, and the footer. But it does not seem that valuable.
A valuable editorial link that YOU give should be in the context of the topic you are writing about.
Why not actually write some content articles where you can reference these links?


Remove the 2nd ad in your header?
Or change it to an image ad at least.
Google image ads can often make a site look less "boring".

You have 8K people on your facebook.
Harness that people power by writing some "kick hockey" content, and let the people know about it.

Give them something to click about on your frontpage, and keep coming back.
"Did you know xxx player was originally born in xxx, click for more"

Actually write something about the player on their page.

RE:"All Time Record Book"
When I go to "All Time Record Book"
And Click "goals" and "search", WITHOUT choosing a league,
The page is blank.

When I go to "All time records" and search, and actually get relevant results, there is no page title.

Are these results ALSO, NOT available through a static url that google would be more than happy to return to people searching for them?

Improve your site

Re: AngryPenguin: looking for Penguin advise (links or otherwise) dyoc 7/19/12 2:52 PM
Ralph, I feel for you. I'm in the same boat -- a huge drop with Penguin despite no SEO -- and it's baffling to me that the general response here is "you need a better site." What happened on April 25th that made my site suddenly less competitive?

We know that Penguin is primarily about "bad" backlinks. Bad in the sense that Google suspects they're unnatural, though not to the degree that they'd send you a "you have unnatural links" message.

So the position you and I are in is having to guess which inbound links might be bothering Google and ask (and possibly pay) to have them removed. And if we get it wrong, we've lost even more ranking. Fantastic!

Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/20/12 4:30 AM
Alex, many of those are good suggestions - any website can be improved. Have you tried to access eBay with an iPad? The "My eBay" page is impossible to read - and this has been going on for over a year. They're a billion dollar company.

I don't think a few minor glitches on a few marginal pages explains why I had such a large dropoff on April 24. I am confused because everyone here is trying to explain a Penguin dropoff with Panda suggestions, and that doesn't add up. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Alex813 7/20/12 4:56 AM
Ralph,
You have to give google content that can rank. They have to be able to decipher what the page is about.
On page content is worth more than ever now.
If you pickup your panda pages, you will pickup your penguin pages.

For example: if you actually had "clickable" links for individual record stats, people would share them, and link to them. As of right now, if they "think" they are sharing them, they will just send the user to a dead end.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 7/20/12 5:03 AM
I'm not necessarily concentrating on those records pages right now - I know that if I used GET instead of POST I'd have a shareable URL. It's a very minor area of things.
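
For what it's worth, the GET version is mostly a matter of moving the form fields into the query string. A minimal sketch follows, where the base URL and the `stat` and `league` parameter names are placeholders I made up for illustration, not the real form fields:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def record_book_url(base: str, **filters: str) -> str:
    """Build a shareable GET URL; sorting keeps the same search at one stable URL."""
    query = urlencode(sorted(filters.items()))
    return f"{base}?{query}"


# Hypothetical endpoint and parameter names.
url = record_book_url(
    "http://www.hockeydb.com/example-records", stat="goals", league="NHL"
)
print(url)

# The server side can recover the same filters from the query string.
print(parse_qs(urlsplit(url).query))
```

Sorting the parameters means the same search always produces the same URL, so links and shares do not get split across equivalent addresses.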

People do share my player pages, all the time in fact. That isn't helping them because there is some kind of Penguin penalty in effect, perhaps on top of a Panda penalty. Fact of the matter is that I will not be able to write 1000 word biographies of 150,000 different hockey players. That doesn't make the data any less valuable to people. Wikipedia is ranked #1 for most players - but people link to my site when discussing players, not Wikipedia (which gets its statistical information from my site). That has to be worth something in terms of authority.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 8:53 AM
Instead of starting a new thread, I'd like to resurrect this thread.


I do not think that John Mueller's assessment is quite accurate. Here is why: John says:

Looking at your site, I don't see any specific technical issues, or general issues with the links to your site. I can, however, imagine that our algorithms might have some trouble understanding the unique value of your website in comparison to other, similar sites (especially considering that the content is primarily aggregated statistics). My general recommendation would be to continue working on your website, making it the best site of its kind. There's no single change that you'd need to make, so I'd really look at your site overall and see where you could make improvements on a general level -- you mentioned that you might have some thin pages, perhaps that's a place to start (or at least, to try things out with A/B tests, etc)

John is correct in saying that I don't have any specific technical issues or general issues with links - I have not received any kind of WMT warnings from Google. However, the fact of the matter is that my Google referrals are down 50-80% starting on April 24. That signals Penguin to me.

I have spent many hours analyzing things to try and figure this out. I learned about the "-950 penalty" that people talk about - where Google puts your page 950 pages back in the results. I believe that Penguin has a "-1" penalty, which is capable of putting your site from page 1 to page 2 or even further back. I base that on the fact that for many search terms, where I likely used to rank #1, I am now ranking precisely at #11.

I also can not believe John's statement of "our algorithms might have some trouble understanding the unique value of your website in comparison to other, similar sites". Google should not have that much trouble making that judgement call - it has always made the call based on backlinks. If people are organically linking to my site very heavily and people are not linking to a "similar" site at all, if Google is going to pick a site to show (instead of showing both) then Google should pick my site.

I don't believe the theory that Google would filter out sites it deems "similar" for long-tail searches though, where there are maybe 2-3 sites that fulfill the query. I think the Penguin penalty is the problem here.

Here are some great long-tail examples where Google is picking a lower quality/popularity site over mine (a site that likely copied data from my site to boot!):

[Pat Carli hockey]


[Rob Gador hockey]

[Terry Madson hockey]


[Kelly Szautner hockey]


Pay close attention to that last example: My site has 5x as much information as the site that Google returns, but my site is #11 when it should be #1.

It is very apparent to me that this page should rank #1, and Google would normally think that it should be #1, but there is obviously a penalty assessed against my site. That means that no matter what I do, I cannot get higher than #11 for that page.

I know that people are giving me Panda-based advice such as "make your site better", but with a penalty such as this, I can't rank for many queries. There is a thumb on the scale - a thumb which is not being applied to that competing site, a site that is not as complete, not as well-designed visually, and not nearly as popular as mine.

I believe that the Penguin penalty can be overcome by organic backlinks, but for long-tail searches, organic backlinks are not a very good quality signal - how many people are going to link to a page about Kelly Szautner? Likely zero. So Google can't use page authority as a signal, so it should default back to domain authority - my domain should be clearly more of an authority than dropyourgloves.com - but that domain signal is being tampered with by Penguin somehow. My domain authority has been set to a penalty for some reason. So that is why dropyourgloves.com is showing on page #1 and I'm either on page 2 or lower.

That is what I believe the problem is here. And while yes, good advice is to get organic backlinks for pages that you want to overcome your domain authority penalty, if you can't get those, your page will be beaten out by spam, irrelevant links, and scrapers.

So let's move on to my problem: why is Penguin penalizing my domain's authority? It must be based on backlinks, and something in my link profile is triggering it. But what? I'm not buying, selling, or trading links.

My only guess is that the Wikipedia clones are killing me. See, my site is such an authority that Wikipedia has been scraping it and using it to fill in their hockey-related information. They leave a backlink - nofollowed - but then other sites scrape Wikipedia and they replicate that backlink. These scrapers are often spammy sites, using Wikipedia information to try and rank.

I took a look at how many links were added to my site on 7/17/2012. There were 450 links added. About 60% of the links were Wikipedia clones. This pattern is happening every single day.
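
For anyone who wants to run the same kind of tally on their own link export, this is roughly how it can be done. It is a sketch under assumptions: the clone list has to be built by hand, and the domains below are invented placeholders, not real sites from my backlink report.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased hostname with any leading "www." stripped."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def clone_share(backlink_urls: list, clone_domains: set) -> float:
    """Fraction of backlinks whose host is in the hand-made clone list."""
    if not backlink_urls:
        return 0.0
    hits = sum(1 for url in backlink_urls if host_of(url) in clone_domains)
    return hits / len(backlink_urls)


# Invented sample data, only to show the shape of the input.
clones = {"wiki-clone-a.example", "wiki-clone-b.example"}
new_links = [
    "http://www.wiki-clone-a.example/wiki/Some_Player",
    "http://wiki-clone-b.example/wiki/Some_Team",
    "http://wiki-clone-a.example/wiki/Another_Player",
    "http://fan-blog.example/roster-notes",
    "http://league-site.example/history",
]
print(f"{clone_share(new_links, clones):.0%} of sample links come from clones")

# Applied to the day quoted above: 60% of 450 new links.
print(round(450 * 0.60), "clone links out of 450")
```

On the figures above, that is about 270 clone links out of 450 for that one day.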

How can I possibly combat this? How can I overcome the Penguin penalty that is likely coming from this relatively unique situation?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Free2Write 8/1/12 7:53 PM
What is the point of this post?

Someone at Google has already indicated exactly how to combat the issue, "to continue working on your website, making it the best site of its kind."

Are you looking for some magic answer? Do exactly "this" and the website will return to some previous ranking? Is there some idea that inspecting poor quality sites will uncover some magic formula to overtake that specific result?

Trying to deduce Google's internal algorithms is a waste of time. Google could toss out what you discover tomorrow. Indeed, every time someone does deduce the algorithm the entire point of the next round of algorithmic changes is often to thwart those very specific deductions and changes that inevitably create spam by others not better websites.

Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 8:12 PM
Seriously? You consider that to be actionable advice? "Continue making your site better" Wow. I wish I had thought of that, it's like magic.

That was not advice at all. It was a tacit acknowledgement that he does not know why my results are tanked. He also said "There's no single change that you'd need to make". Why not fixate on that? 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Free2Write 8/1/12 8:17 PM
What is the point of the post?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 8:28 PM
The point of the post is that I have learned more in the past 2 weeks. I have learned that pages on my site with specific backlinks are still ranking just fine. I have learned that pages without backlinks are still buried, and that adding content to those pages does not allow them to rise above position #11 until another site links to them.

I have also learned, via the "recent backlink" tool, that 60% of my recent backlinks are due to wikipedia clones, and that even though I am registering hundreds of new organic backlinks per day, pages on my site without specific backlinks are not ranking well. 

I have also come into the belief that my site's authority has been penalized (not just reset to normal), and that this is why I am seeing the things I am seeing. I am speculating that the wikipedia clone backlinks are causing it. I have to believe that a site with so many organic backlinks should be considered an authority site, and that when put in a head-to-head competition with a site that has similar content, the authority should, at the very least, put me in close proximity to the other similar site, and should more properly put me above that site. However, using the examples above, I can show how a less authoritative site with less content and a poorer design is consistently outranking me on long tail queries. 


Re: AngryPenguin: looking for Penguin advise (links or otherwise) Free2Write 8/1/12 9:00 PM
If there's some insistence on correlating "no single change" to a Google algorithm then I believe that specific comment means, because your website is primarily statistics, the content must be improved so that Google's algorithms have more high-value signals that exposes the unique value of the site.  There is no one specific content improvement but A/B testing of the changes may uncover the content improvements that have the greatest impact.

I believe that if you continue to concentrate on technical issues and link details rather than focusing on how to expose the usefulness of the raw statistics by explaining their worth, then this discussion will have little impact on the site's results. Unless by happenstance Google changes their algorithm to favor statistical data or there is a sudden pop-culture rush to tweet hockey data from the site or all other hockey data sites disappear or some such similar external event. Even then, traffic that utilizes the statistics will far exceed traffic to the site or remaining on the site or any related ranking.

Or, buy a TV spot during the Olympics or Super Bowl.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 9:15 PM
I think you're ignoring a couple of vital facts.

1) My site used to rank well, and has done so since 1998.

2) My site is very popular, and yes, links to my site are tweeted about 15 times per day. They are also posted on facebook, message forums, blogs (by the author), newspaper sites, and official team sites. Wikipedia has heavily referenced my site, considering it to be an authority. These people aren't linking to the soft "here's a few generic paragraphs on the hockey player" pages because the users want hard data, not feel-good opinions.

3) I am being outranked by other sites which also offer only statistics, and in most cases, less complete statistics, with poorer design and less authority (i.e. fewer people linking to those sites).

4) Many of my pages rank very well. I have lost most long-tail traffic.

5) When I try and improve long-tail searches by adding content to the page, the page never rises above position #11, and this spot is where it rests most often.

Re: AngryPenguin: looking for Penguin advise (links or otherwise) Free2Write 8/1/12 9:23 PM
What do you want the answer to be?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 9:43 PM
I would like to share my observations here so that others can get some data points to aid their own analysis. Ultimately, I would like corroboration of a domain-based penalty either with specific advice as to how to get it lifted or acknowledgement from Google that spammers cloning Wikipedia and thus causing a penalty against sites referenced by Wikipedia is a hole in their algorithm that they will plug. At the very least, I'd like Google to take a look at the situation in more detail - more than just the cursory glance that John Mueller performed to tell me what I already knew, that my site has no manual actions taken against it.

Or maybe I'd like someone from Google to step in and say "the Wikipedia clone stuff isn't hurting you, it's the 565,595 dofollow blogroll links from proboards.com [one per forum post] that caused your site to have its domain authority penalized."

My issue is that I have followed the rules, and I have built a great site that many people link to organically. Not to brag, but the Southern Professional Hockey League has written my website into its bylaws, stating that my site is the final arbiter as to how many professional games a player has played when determining veteran status. The Central Hockey League links to my site when players get signed. The American Hockey League checked with me before they switched companies that compile their stats, to make sure that it wouldn't cause a disruption in their stats being shown. My site is an authority site that has been knocked down a notch or two below "non-authority" status. I'd like that problem corrected. 

Google is operating in the same way as a credit rating organization such as Fair Isaac & Co., passing judgments on websites and marking some as "spammers" to remove them from the search results. Personally, I think that they should have a customer service department that can investigate issues such as mine to make a determination. I mean, seriously - Google is a multi-billion dollar operation, they have just 2-3 employees offering limited service here plus a handful of special-status volunteers who don't have access to enough information to do specific investigations. That's pretty weak. They are actually offering less customer service than I do to the hockey players and fans that write me, and they earn at least 100 million times the revenue that I do. 


Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/1/12 10:17 PM
On Thursday, August 2, 2012 1:43:41 AM UTC-3, RalphSlate wrote:
Google is a multi-billion dollar operation, they have just 2-3 employees offering limited service here plus a handful of special-status volunteers who don't have access to enough information to do specific investigations. That's pretty weak. They are actually offering less customer service than I do to the hockey players and fans that write me, and they earn at least 100 million times the revenue that I do. 
 
Ya free isn't worth very much. But then again... you got free advice on a free forum, free access to WMT, free access to Analytics, free access to a toolbar, free access to Google Website Optimizer for A/B Testing... that all wasn't free to develop exclusively for you... but it is all "still free".
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 8/1/12 10:38 PM
>Or maybe I'd like someone from Google to step in and say "the Wikipedia clone stuff isn't hurting you, it's the 565,595 dofollow blogroll links from proboards.com [one per forum post] that caused your site to have its domain authority penalized. 


I would like a date with Scarlet Johansson but my wants and desires have about as good of a chance of happening as your wants and desires.  But what the heck, we all gotta have our dreams.

BTW, Google doesn't normally provide 1=1 information to website owners.  Maybe, once in blue moon, John Mu will provide some general guidance of check your links.  Or you might get Matt Cutts at a conference to wave his magic wand in the direction of some footer links (not that you have that issue)  But 1=1 advice is pretty darn rare.  Why?  To keep us (the collective us) honest, to prevent us (the collective us) from trying to game the system with a specific piece of knowledge.


So can/does the Wikipedia cloning stuff harm websites?  Yes.  But unless your content is a direct copy of the Wikipedia content, the negative effect is directed at Wikipedia not your site.  Sure, sure Wikipedia is linking to you (with a nofollow link) and the wikipedia clone is now linking to you.  Who cares.  The effect has got to be minor because Google is going to discount the Wikipedia clone site (and any links coming from it) pretty darn quickly.

Could you have been hurt because the links from the clones are suddenly devalued?  Maybe.  But then the value of those links was false to begin with and your demotion (if that is what you want to call it) really is just a resetting of the proper position of your site.


I realize none of what I just said is helping you.  At this point I don't think much of what anybody is saying is going to help you.  Not that we are wrong, just that your mind is unable or unwilling to accept anything less than 1=1 causation. 

You keep blaming Penguin.  OK, so it is Penguin.  Links.  Penguin is about links.  So let's say it is about links.  You expect 1=1 causation when the causation may be far more complex.   A links to B.  B links to C.  C links to D.  And so forth. Q links to Ralph.  And Google just found out that A is a kissing cousin to D and M is paying L for those links.  Suddenly your whole backlink profile, sites you never heard of, sites with little or no relation to you etc has been devalued.

How unfair?  Not really.  For the last month/year/decade you received an extra boost of traffic from all those invalid links.  Today, your site responds in the proper order of websites based upon a weaker backlink profile.... issues with your site, be they minor or major.... are harder to overcome because your backlink profile has been demoted.

It is not where you want to be.  Great.  Let's do something about it.  Some of the suggestions have been ad density, ad placement, thin internal pages, weak landing pages, poor navigation structure, lack of mobile browser adaptability, duplicate content etc etc.  Some make sense.  Some I agree with, some I don't.  We all have opinions.  I think you disagree with all the suggestions because you have lost sight of the forest because of all the trees standing in the way.

Are you going to recover?  Not for me to decide.  At this point I suspect you are too frustrated to think straight and flailing out at everybody and every possible causation is not helping the matter. 






Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/1/12 11:23 PM
Stevie, I'm rational enough to evaluate what people say and see if it makes sense to me. I agree that Google is complex, and I agree that the A->B->C->D->Me devaluation scenario is of course possible. If I had 10 backlinks, that would be a great place to search. I have way more than 10 backlinks. I'm pretty heavily linked in the hockey world. Unless everyone gets devalued, it isn't likely that I'm specifically getting devalued because sites that link to me are devalued.

Sure, no one thinks that they got "extra" traffic from Google, and I don't either. But I really don't think I have a false sense of the authority of my website in the hockey world. I'm still getting 20k unique visitors per day in the offseason. Sure, without Penguin I would be getting about 35k, but 20k isn't anything to scoff at. I'm set back about 2 years in traffic. I just don't want a thumb on my scale so that I have to work 10x the amount of others to be equal to them.

I understand that no one here can give 1:1 advice. I think that's a lousy policy but that's the way it is. Why am I posting here? Many reasons. Venting. Trying to gain converts. Trying to get the attention of Google. Trying to learn things - and I have learned a lot, by the way, to the point where I think that once my site penalty is lifted, I will be crushing for many keywords. A lot of people think that Google can do no wrong. I used to think that. I don't anymore, because I was hit by an algorithm that targeted webspam, and I have no webspam. 


Re: AngryPenguin: looking for Penguin advise (links or otherwise) Brian Ussery 8/2/12 2:37 AM
RalphSlate,

Back on Jul 10 I pointed out that utility and thin content are issues for this site.  I mentioned also that Ad layout is probably an issue as well but things still seem the same?

When it comes to search quality, page utility based on the query is critical.  When someone searches for sports players they want the most accurate info about that player that they can get in a page.

Now let's compare the URLs again that were previously asked about:
1. http://www.uscho.com/stats/player/mid,15442/mike-vitale/
2. http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=146971

- URL 1. lists the correct hometown whereas URL 2. does not according to http://ase.tufts.edu/athletics/old/menIceHockey/profiles/year/2009-10/vitale.htm. (Panda keys on factual errors)

- URL 1. provides every 2010-2011 match in a single landing page whereas URL 2. does not. (I have not been able to find it on the site either.  Yes, I'm aware uscho.com includes only college stats.  That is because it is the "US College Hockey Online" website.  Including non-college stats in the US College Hockey website would make it less relevant in terms of content scope.)

- URL 1. contains rich markup that search engines can use for rich snippets in search results to increase CTR but URL 2 does not appear to include the same.

- URL 2 includes more ad real estate above the fold which can be a real problem.


I recommend focusing on the real issues and resolving those not thinking about backlinks.


-Brian




Re: AngryPenguin: looking for Penguin advise (links or otherwise) Suzanneh 8/2/12 3:19 AM
I give up.  You're going to believe what you want  to believe despite what anybody says.  Okay, you believe you've got some -1 page penalty. You could be right; you could be wrong. Nobody here can change that, and we certainly can't give you any advice because people don't seem to have heard about it.  There hasn't even been a Penguin refresh in awhile, so you can't see if any of the changes you made helped (if you made any Penguin-related changes).

I hate to say it, but the phrase "stop wasting people's time" is now coming to mind.

Suzanne
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/2/12 4:40 AM
Hi Brian --

0.9) I have removed a small adsense text unit from every page on the site, so I did reduce the advertising.

1) The correct birthplace (not hometown - they are different) of Mike Vitale is Berwyn, IL. That was reported when he played in the EJHL with the Harbour Wolves. The EJHL reports birthplace, the NCAA reports hometown.

2) You're right that USCHO provides detailed 2010-11 stats and I do not. You're also right that my site provides stats from other leagues and USCHO does not. Google can't determine which is better, only users can.

3) I was not aware of rich snippets, that is something I will look into. I am not aware that they affect search results.

4) I'm not sure what you see, but USCHO has a Leaderboard and a 300x250 above the fold; my site has those ad units as well, plus a 125x100 box which occasionally shows an ad but usually points people to my Facebook page. That is not excessive advertising. USCHO has 3 more 300x250 ads plus another leaderboard plus an adsense unit as you scroll down the page. My site does not.

So what do we have here when someone searches for Mike Vitale? We have a virtual tie. Two similarly credible sites, both with similar amounts of information, and both with similar amounts of advertising above the fold. What should Google do? Would you expect that it would serve up one page near the top and then not show the other page at all, even when it starts showing irrelevant links? Should it show a page from NHL.com where the last name "Vitale" is on the page, and also the first name "Mike", but nowhere near each other? Should it show four pages from mikevitale.com? Should it show duplicate pages from pointstreak.com ( http://easternjhl.stats.pointstreak.com and http://ejhl.stats.pointstreak.com which have duplicate content on it)?

Because that's not what I'd expect. 

So what again are the real issues? I should remove all advertising, ditch all the statistics, and write a couple of paragraphs about this player in order to rank?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Brian Ussery 8/2/12 10:54 AM
Ralph Slate,

1.) When in doubt go with the way that the 100 year old, recognized, authoritative source for information about college sports does it.  A few weeks ago, Google reported that Michael Jordan went to law school because of an issue just like this one. Algorithmically, appearing to disagree with the most respected sources for this information could make you appear to be a less authoritative source. 

Always specify what is being displayed.  How is an engine to "know" what you mean?  Regardless of what others do, for college sites hometown is more relevant than birthplace because that is usually where athletes play before college.

2.) Google makes assumptions based on user actions like long clicks and short clicks.  Because there is less visible content on your site your short click count is probably higher than the other site.

3.) Yes implement rich data so engines can return rich snippets to users.  Rich snippets help users and increase CTR.

4.) Go to http://browsersize.googlelabs.com/ compare both pages at 95% and watch what happens.  The other site provides visible content whereas your site shows nothing but ads.


Focus on providing the best authoritative content and most utility for each individual page based on the perceived intent of the user's query.



Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/2/12 11:48 AM
Hi Brian --

1) Yes, you're in many ways right about the birthplace versus hometown issue. This is a thorny issue; I try and show birthplace whenever available because a player can have just one birthplace whereas a hometown can (and frequently does) change. I leave it explicitly vague because I do have a hole in my process: I don't always know if the field that any given league reports is a birthplace or hometown. That is a very complex data cleansing issue, made complicated by the fact that if you look at the season-based source data for many players, you are apt to come up with up to six different places associated with that player. (By the way, my site has the player's birthdate; USCHO does not, so score one for me!)

2) Google may or may not use "long clicks" versus "short clicks". I feel that this data would be easy to misinterpret because a short click could mean that the data isn't relevant or it could mean that the query was satisfied.

3) I have added rich text snippets to the player pages, the standard "author" schema object plus the "person" schema object to mark up the biographical information. USCHO only has the "author" snippet, which is really pretty meaningless in that context since it is the same on every page. We'll see how that works out.

4) I'm not sure what you're seeing, but I have uploaded a screen shot of the 95% area of each site trimmed out. Although I will concede that my site has about twice as much space devoted to ads in that view (20% versus 10%), I am most definitely not showing "nothing but ads", and as you can see from the screenshots, the same basic information is available in that view on both sites - the vital stats. USCHO spends more real estate promoting other sections of their site by having 2 menu bars and more blank area.
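
For readers unfamiliar with the "person" markup mentioned in point 3, here is a minimal sketch of what such microdata might look like when generated by a script. The function name and the example values are hypothetical and are not taken from hockeydb.com; `name`, `birthDate`, and `birthPlace` are schema.org Person properties, and `birthPlace` is simplified here to plain text rather than a nested Place item.

```python
from html import escape


def person_microdata(name, birth_date, birth_place):
    """Render a schema.org Person as an HTML microdata fragment.

    birth_date is expected as an ISO 8601 date string (YYYY-MM-DD).
    All values are HTML-escaped before being placed in the markup.
    """
    name, birth_date, birth_place = (
        escape(name), escape(birth_date), escape(birth_place)
    )
    return (
        '<div itemscope itemtype="http://schema.org/Person">'
        f'<span itemprop="name">{name}</span>, born '
        f'<time itemprop="birthDate" datetime="{birth_date}">{birth_date}</time> in '
        f'<span itemprop="birthPlace">{birth_place}</span>'
        '</div>'
    )


# Hypothetical values, for illustration only.
print(person_microdata("Example Player", "1990-01-01", "Example City"))
```

Labeling the field explicitly as a birthplace in the markup also addresses the earlier point about telling the engine what is being displayed.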

Now I realize that no one knows what Google uses for an algorithm, but in your opinion do you think that those differences, which I consider to be relatively minor, are likely to completely remove my site from the Google SERPs? I know that anything is possible, but do you think that is the likely answer?

If you'd be willing, could you look at one of the examples I showed above? http://www.dropyourgloves.com/Players/Player.aspx?Player=48337 versus http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=7587? It might be interesting to pretend that the tables are turned and that dropyourgloves was the site complaining about not being ranked, because I think you could have a field day with that site.

Have you ever seen a "#11" penalty before? I am seeing many pages that are not so thin but have no direct backlinks ranking at #11. I feel like there is something penalizing my site to cause this effect. If I knew what the problem was, I would easily solve it, but because it could be either Panda or Penguin, and the solutions are so different, I don't really know the direction in which to turn.

The advice given to me by John Mueller of "make your site the best of its kind" is too vague to act on, because in my opinion "the best of its kind" happens to mean "include information about as many players as possible" whereas Panda may say "Bzzzt. By adding more players, that is increasing your thin content, so we're going to hit you even harder". It would be nice to not have to care about this and go about building the site in the way I feel is the best, but financially I just can't ignore a drop of 15,000 visitors per day and, to boot, so many people not being shown my site to give them the chance to like it.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) Pelagic 8/3/12 2:02 AM

Hi Ralph, the answer to your issue has already been provided to you very succinctly by a Googler (more than once), not to forget quite a bit of really good advice from other regular members here as well, yet you still seem to totally ignore it and instead are convinced it's some form of collateral damage due to recent algo updates. Trust me, it's not. Your site has huge amounts of issues, which fortunately for you are not insurmountable ;) 

Thin Content, duplicated content, yep you have a ton of that, I'll provide some examples for you >

Calendar Pages
e.g


Yearly Attendance Graph Pages
e.g.


Logo Pages
e.g.


Real-Time Script Pages
e.g.

These are now blocked by robots.txt, previously they weren't

The same applies to these two types of pages>



Duplicate indexed (printable) versions of your forum pages>
e.g.
Also note that it's indexed with and without WWW, and with various parameters ;(


Duplicated Sub-Domain>
I note you have very recently 301 redirected it, cool ;)


Totally Empty Pages> 
(<html><head></head><body BGCOLOR="#CCC08F"><div><!-- GB H9 44 --></div></body></html>)
e.g.


Totally Empty Pages with Ads
e.g
<html><head></head><body BGCOLOR="#CCC08F"><div><!-- GB H9 45 --><!-- Frequent --><script>document.write('<scr' + 'ipt src=http://ad.doubleclick.net/adj/hockeydb.fsv/ros;sect=ros;fantasy=no;game=no;tile=1;dcopt=ist;sz=728x90;ord=11969?></scr' + 'ipt>'); </script> </div></body></html>

<html><head></head><body BGCOLOR="#CCC08F"><div><!-- GB H9 48 --><!-- Frequent --><script>document.write('<scr' + 'ipt src=http://ad.doubleclick.net/adj/hockeydb.fsv/ros;sect=ros;fantasy=no;game=no;tile=1;dcopt=ist;sz=728x90;ord=12542?></scr' + 'ipt>');</script></div></body></html>

<html><head></head><body BGCOLOR="#CCC08F"><div><!-- GB H9 24 --><!-- FASTCLICK.COM 468x60 v1.4 for hockeydb -->
<script language="Javascript"> ... ...> <!-- FASTCLICK.COM 468x60 v1.4 for hockeydb --> </div></body></html>

Somehow I guess these were accidental ? I'll refer to the page titles > "Stop Stealing Content with Site Suckers"


Then there are the Banner Link Pages which link to internally 302 redirected pages
e.g.

<html><head></head><body BGCOLOR="#CCC08F"><div><!-- GB H9 49 --><!-- Frequent --><center><a border="0" target="_new" href="/ihdb/banners/marker.html"><img border="0" width="728" height="90" src="/ihdb/banners/marker.gif"></a></center></div></body></html>



And a few more indexed pages that might surprise you !

"Please select a search option" > doesn't work !  

"The were a hockey team based in , playing in the from -1 to ."

Old version of your homepage

Old version of your homepage retrieved and republished from the Internet Archive (check the nav links ;)

More crap >


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/3/12 5:12 AM
Pelagic, this is the best post I have read - because I can actually do something about those problems. I like it a lot better than the post telling me I should stop being a hockey data site that focuses on obscure players and turn into a hockey article site that focuses on the top players :-)

I hope you can help me one more time, to discuss what method to use on each of those problems. Yes, they are caused by old code and sloppiness - my site has been on this particular server for 12 years, a lot of stuff builds up.

1) In general, how did you find all those pages? Did you just search the Google index for them? I'd like to try and find more.

2) Vbulletin. It is more headaches lately than it is worth, particularly with all the spam links it used to attract. Do you think it is wise to just remove it from the index? I can't possibly figure out all the little stupid pages that product is capable of generating (like printable versions).

3) Yes, the yearly attendance graphs are thin when a team has played for just one year. The graph is just too big to put on another page though; I feel like the best user experience is for it to be standalone. What is the best way to handle that situation? I would like someone who is searching for "Jacksonville Barracudas attendance" to get to my site because I am actually the only site who has compiled all that attendance data. And when I do search for that phrase, that page (plus the 2 other incarnations of the barracudas) rank #3, #4, and #5. That's pretty much what I want, isn't it? I can certainly put more text on the page, perhaps some internal links to give the users somewhere else to go.

4) I think I may have asked this question about the logo pages before; the question is, what is the best way to handle a situation where you have a gallery page showing thumbnails, but you want a separate page for a full-size image? Having the logos on a standalone page makes the most sense because they are too big (and some teams have 5-6 of them). So what should I do with them? Again, I want someone searching for "Jacksonville Barracudas logo" to get to my site because I'm offering that logo.

5) Regarding the real-time script pages, yes, they are blocked by robots (I think they have been for a while). What more should I do about them? Should I put in a removal request to Google for that particular URL? Will I need to do one for each of the pages, i.e. hockeydb.com/ihdb/stats/embed.php?pid=1, hockeydb.com/ihdb/stats/embed.php?pid=2, etc.?

6) I am pretty sure I have had the "viewastext.php" script blocked with robots.txt for a long time. Those URLs are gibberish because they contain a code designed to prevent someone from spidering the text versions of the site - the code is a hash of the date and IP. Again, what is the best way to get them removed from the index?

7) The 302 redirect is how I track clicks when someone buys an ad on my site. I put the ad link to a non-existent page which I then redirect to the client's site. I can then track how many clicks the ad got. Those are all leftover. I actually searched the Google index for those pages the other day but didn't find them. Is that page in the index? How did you find it if it is not?

8) Since my site is database generated, I need to handle situations where people put in bogus querystrings. What is the best approach there? Should I give a friendly 404 with a suggestion where the user can go? Or should I 301 to a somewhat relevant page? Google seems to discourage the latter, saying that it could confuse their crawler and also confuse users.

So I guess most of my questions can be summed up as:

A) What is the best way to remove "crap infrastructure" pages from the index? Putting a meta "noindex" on the page? (Blocking with robots doesn't remove them). 301 them to another page? 

B) Is Vbulletin more trouble than it is worth? Is it possible to deindex all those little calendar pages, etc.

C) Should I 301 when someone hits a bad URL or should I 404? 

D) How should I handle content that Google may think is thin but which I think is still relevant, content that can't logically be combined onto another page? 

Thanks!
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Pelagic 8/3/12 5:56 AM
"my site has been on this particular server for 12 years, a lot of stuff builds up", understandable, but now is the time to get rid of all the dead wood that has accumulated over the years ;)

A.  If the old/crap pages are removed then 404 or 410 them

With ultra-thin pages that you don't want indexed, personally I wouldn't block crawling, instead I would use meta or x-robots noindex

B. Should be very easy, check on the VB forums

C. 404 (custom/friendly ;)

D. Same as A, noindex

With the Banner Link Pages, apply noindex and nofollow, better still get rid of them and link directly to the 302, which incidentally you should have blocked from crawling in the first place. ps personally I prefer direct external links (with nofollow when appropriate ;)

No need for removal requests, just unblock and noindex them ;)

I don't know why anyone would enter bogus querystrings, it's irrelevant anyway unless they actually link to those bogus urls, I would simply 404 them.
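
The advice above (404 or 410 for removed pages, a friendly 404 for bogus querystrings, noindex for ultra-thin pages that stay live) can be sketched as a small decision function. This is only an illustration of the rules as stated in this thread; the function and parameter names are hypothetical, and how a given site decides that a page is "thin" is left open.

```python
def response_policy(page_exists, permanently_removed=False, thin=False):
    """Return (status_code, extra_headers) for a requested URL.

    page_exists         -- False for bogus querystrings or unknown IDs
    permanently_removed -- True for old pages that were deliberately deleted
    thin                -- True for live pages that should stay out of the index
    """
    if permanently_removed:
        # Deliberately deleted content: 410 Gone (a 404 is also acceptable).
        return 410, {}
    if not page_exists:
        # Bogus or unknown URL: serve the custom/friendly 404 page body.
        return 404, {}
    if thin:
        # Page stays crawlable, but asks engines not to index it.
        return 200, {"X-Robots-Tag": "noindex"}
    return 200, {}


print(response_policy(page_exists=False))             # (404, {})
print(response_policy(page_exists=True, thin=True))   # (200, {'X-Robots-Tag': 'noindex'})
```

Note that the noindex branch only works if the URL is not blocked in robots.txt, since a crawler has to fetch the page to see the header.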

Some help articles>

Parameter Handling

Canonical Link Element


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/3/12 7:34 AM
PS, to whoever is trying to use Xenu on my site, you got hit with my governor (you hit the page limit). I have unblocked your IP, so if you want to try again, please feel free to do so.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/3/12 12:43 PM
OK Pelagic, I have managed to use apache to put a noindex into the http header for the irrelevant vBulletin pages. I used the same technique for the "embed.php" page. I think that the vbulletin fix will remove about 85,000 pages from Google's index, and the embed.php will remove about 22,000 pages. There is some debate from other forum watchers as to the benefit I will see, but I'm willing to take a shot here.

I have a stickier situation with the "viewastext.php" page. Here's why:

First, Google has indexed around 8,500 instances of that page. However, when I originally designed that page, I did something non-standard in an attempt to prevent scraping the page (since it is a text version of my player stats page, it would be easier to scrape - keep in mind I did this 8 years ago).

The URL to that page is formed like this:

http://www.hockeydb.com/ihdb/stats/viewastext.php?ff9ae37c=f756a676&pid=117583

the "ff9ae37c=f756a676" is like a passcode - it is generated with a hash using your IP address and the hour of the day. If you click on the link from my site, you will get the text stats page. If you send the link to someone else, paste it in a forum, etc., you'll be 302 redirected to the non-text main player page. The link is meant for one unique person to click it, limited time only.
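
For anyone trying to picture the scheme, here is a minimal Python sketch of that kind of passcode. It is not the site's actual code: the SHA-256 hash, the `SECRET` value, and the function names `make_token`, `is_valid` and `build_link` are all assumptions made for illustration. Only the idea (hash of IP address plus hour of day, split into a key=value pair) comes from the description above.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical server-side secret; not something from the real site.
SECRET = "example-secret"


def make_token(ip, hour):
    """Derive an 8+8 hex character key/value pair from the visitor IP and the hour."""
    digest = hashlib.sha256(f"{SECRET}|{ip}|{hour}".encode()).hexdigest()
    return digest[:8], digest[8:16]


def is_valid(ip, hour, key, value):
    """A link is honored only for the same IP within the same hour."""
    return (key, value) == make_token(ip, hour)


def build_link(ip, pid, now=None):
    """Build the text-stats URL for one visitor at one point in time."""
    now = now or datetime.now(timezone.utc)
    key, value = make_token(ip, now.hour)
    return f"/ihdb/stats/viewastext.php?{key}={value}&pid={pid}"
```

A request whose key/value pair fails `is_valid` would get the 302 to the main player page instead of the text page.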

Here's the problem: If I tell Apache to serve the "Noindex" directive when someone requests the viewastext.php page, this is what they get for a header:

Status: HTTP/1.1 302 Found
Location: /ihdb/stats/pdisplay.php?pid=117583 [the redirect to the main stats page]
X-Robots-Tag: noindex, nofollow

Here's my worry -- is the X-Robots-Tag of noindex nofollow going to be applied to the viewastext.php page, or the main stats page that you're being 302 redirected to? I can't have any margin of error here, because I really don't want to remove the main stats page from the index.

Any thoughts on that arcane issue?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) KCle 8/3/12 1:06 PM
Nothing to add really... other than a question. How much money has this site made over the years? Why not hire a professional to take care of the mess that it is? If the site was profitable and worthwhile, I would have been getting experienced people in on it, rather than taking suggestions from people on the outside looking in and making deep stabs in the dark.


Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/4/12 8:14 PM
Pelagic, here is something I discovered in the cleanup, it may be relevant, perhaps crucial to my situation.

A while back, possibly Feb-March, I changed one of my pages, a page that showed the standings for a hockey league for a particular season. I added a link on this page so that someone could easily move to the prior season or the following season. The boundaries should have been the first or last season that the league existed; the html file incorporates the season, so for example, one file is named wha19731973.html, and a file exists going up to wha19731979.html. To break that last one down, the league ID is WHA1973 and the season is 1979. 

Apparently my code wasn't tight enough, and I think that Google somehow broke through and started requesting files in that format for leagues that did not exist, for example, requesting wha19721973.html. My code still returned a 200 page for that request, and unfortunately, since the league didn't exist, there was no limit on the first or last season. It was basically infinite. 

As part of my cleanup that I've been performing, 2 days ago I set things so that if a file is requested for a league/season combo that doesn't exist, it returns a 410 error. And lo and behold, today I get a ton of "not found" errors in WMT for files such as wha19721691.html - which translates to an unknown league for the year 1691. The page has all my headers on it, but no data.
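
Roughly, the check now works like the Python sketch below. This is an illustration and not my production code: the `LEAGUE_SEASONS` table, the regular expression, and the function names are assumptions; only the file naming (league ID followed by season) and the WHA1973 range of 1973 to 1979 come from the description above.

```python
import re

# Hypothetical lookup of real leagues: league ID -> (first season, last season).
LEAGUE_SEASONS = {"wha1973": (1973, 1979)}

# File names look like <league ID><season>.html, e.g. wha19731979.html
FILENAME = re.compile(r"^([a-z]+\d{4})(\d{4})\.html$")


def status_for(filename):
    """Return the HTTP status a standings request should get."""
    match = FILENAME.match(filename.lower())
    if not match:
        return 404  # not a standings file name at all
    league, season = match.group(1), int(match.group(2))
    bounds = LEAGUE_SEASONS.get(league)
    if bounds is None or not bounds[0] <= season <= bounds[1]:
        return 410  # unknown league, or a season outside the league's lifetime
    return 200


def neighbor_links(league, season):
    """Prev/next season links, stopping at the league's first and last season."""
    first, last = LEAGUE_SEASONS[league]
    prev_link = f"{league}{season - 1}.html" if season > first else None
    next_link = f"{league}{season + 1}.html" if season < last else None
    return prev_link, next_link
```

With bounds like these, a request such as wha19721691.html gets a 410, and the first and last seasons of a real league no longer link outside the league's lifetime.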

So basically, I exposed thousands, maybe tens of thousands, maybe even more pages of emptiness to Google. Nothing on those pages except 2 links - one to the year before, one to the year after. Like an infinite loop. 

I can't find any of those pages in the index, but I wonder if this is why I saw a sudden Google referrer dropoff - maybe all those pages triggered something? 

Now I'm 410'ing them, hopefully at some point Google will clear itself of those pages. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/4/12 8:29 PM
It's neither here nor there.
 
Duplicate content is simply ignored... it doesn't get penalized or devalued; the first page is kept and the rest are ignored.
 
The only time you would ever notice a problem is when you are attempting to actually rank such pages, or had ranked pages that later got ignored.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/4/12 9:15 PM
What is the definition of "attempting to rank such pages"? I'm a bit puzzled - I hear that Panda penalizes sites for having too much thin content, but if you're asserting that Google ignores thin content pages, then how did so many people get nailed by Panda, claiming that even their good pages got hit? It seems like you don't believe that there is such a thing as a site-wide penalty. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/4/12 9:29 PM
By the way, what do you make of this query? Search for [Gaylord Grizzlies] on Google. My site occupies the top 3 spots. That is out of character with how Google has been ranking other searches for teams. The only difference I see is that Wikipedia doesn't have a page on the Grizzlies. 

Same thing when I search for [Saginaw Gears NAHL]. Again, I'm on the top of the charts. Again, no Wikipedia entry.

Yet when I search for [Lansing Lancers hockey], my team history page is on page #5, despite my page having more content than the Wikipedia entry. My team scoring page is on page #2 though, but still, I'd figure the team history page, after I did research to expand the writeup, and after I consolidated the season records onto that page, would beat at least a couple of the page 1 items, such as the 2 empty pages from sportslogos.net, or the other sites that just cloned Wikipedia. 

It would be troubling if Google is putting me head-to-head against Wikipedia in a death match, and if Wikipedia has a page on the search, my site gets buried. I know about the "search for frogs" thing, where they want to diversify the results, and JohnMu did call my site "a stats aggregation site" or something - I wonder if they have classified sites into buckets and their classification algorithm can't tell the difference between a site like ESPN, which simply regurgitates current stats from a feed, and my site, which compiles historical stats which aren't available in those feeds.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/4/12 10:09 PM
These ARE NOT PENALTIES!
Let's start small... you have a blog and you add 1 post... that posted content is the homepage, category page, archive page, and post page... one original and 3 copies. If it's there long enough Google will simply ignore three pages because it does not need 4 pages of the same thing to represent your domain... so the homepage probably has the most link juice (higher PageRank) so it keeps that and ignores the rest.
You add another post in a month and now the 2 post pages are different from all other pages, so it now indexes both. The second post was added to the same category, which means the homepage and the category pages are identical, so it continues to ignore the category page, and the archive pages are the same as the post pages so those are ignored.
...but you still have rankable pages for all phrases you can rank for... you didn't lose a thing.
In your current case you were not trying to rank those 10,000 pages with only 2 links on them, and Google agrees that those are duplicates, so it just ignores them as it normally would.
Your PANDA problem is not the same thing. You don't have duplicated content; you have large swaths of similar content that most likely were in Google's good graces prior to PENGUIN because you had more link juice to support a Google belief that such pages were useful for Google users. But after PENGUIN (purely theory though) a volume of links to you lost link juice because other domains in your backpath (farther up the PageRank river) dropped due to PENGUIN, thus providing you less overall, and many of your similarly styled data pages didn't seem as useful to Google anymore, so along comes PANDA to devalue them.
You're confusing yourself with this stuff because you classify what occurred to you as a penalty... it is not... Google simply believes other webpages represent your past (lost) queries better than your pages, so it promoted other domains' pages over yours. At work... if a colleague gets promoted... that isn't a negative reflection on you, it is a positive reflection on them.
Lastly, in your noindex, follow approach to the forum... you changed nothing because you still have 85,000 pages sharing & passing PageRank even though they won't appear in the index anymore... meaning all those pages with queries you lost traffic on (the ones now at #11) that needed more of a lion's share to have a chance of recovering won't.
If you made it the meta noindex, nofollow... you might have a chance at recovery, although in my wealth of experience once pages (the ones at #11) get devalued it is next to impossible to recover without substantially editing the pages themselves... e.g. getting caught shoplifting - they don't allow you to keep the goods, but since the goods are back in the store technically the store never got anything shoplifted... but they don't see your crime that way.
Google's PANDA doesn't let anyone off the hook without upping your game first. IMHO!
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/4/12 10:20 PM

On Sunday, August 5, 2012 1:29:21 AM UTC-3, RalphSlate wrote:
By the way, what do you make of this query? Search for [Gaylord Grizzlies] on Google. My site occupies the top 3 spots. That is out of character with how Google has been ranking other searches for teams. The only difference I see is that Wikipedia doesn't have a page on the Grizzlies. 

No idea... it is what it is... be happy that it is. Of course there's an opportunity as well! You should offer your stats to Wikipedia: what if you researched something for Wikipedia and got another link?
Same thing when I search for [Saginaw Gears NAHL]. Again, I'm on the top of the charts. Again, no Wikipedia entry.
 
There's an opportunity! Same as above. Sure they are nofollow but others research through wiki and dofollow links to other resources.
Yet when I search for [Lansing Lancers hockey], my team history page is on page #5, despite my page having more content than the Wikipedia entry. My team scoring page is on page #2 though, but still, I'd figure the team history page, after I did research to expand the writeup, and after I consolidated the season records onto that page, would beat at least a couple of the page 1 items, such as the 2 empty pages from sportslogos.net, or the other sites that just cloned Wikipedia. 

It would be troubling if Google is putting me head-to-head against Wikipedia in a death match, and if Wikipedia has a page on the search, my site gets buried. I know about the "search for frogs" thing, where they want to diversify the results, and JohnMu did call my site "a stats aggregation site" or something - I wonder if they have classified sites into buckets and their classification algorithm can't tell the difference between a site like ESPN, which simply regurgitates current stats from a feed, and my site, which compiles historical stats which aren't available in those feeds.
 
Wikipedia is not the best resource around it has editorial flaws but it is a powerful tool to use if you leverage it correctly. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/6/12 7:59 PM
Fathom, here's something I still don't understand.

You speak of link juice. Although I understand the term, I don't understand the concept as well as you do - but from what I gather, you get link juice by other sites linking to you. The more reputable the site, the more juice you get, right? 

On my "recent links to your site", I am getting about 2,000 links per day from other sites. I have 55,000 links listed as gained in the past 2 months. They come from approximately 2,800 different domains. Now certainly a lot are not from top-notch sites, but I do have links coming in from the Toronto Globe and Mail, Montreal Gazette, Canada.com, Pittsburgh Post-Gazette, etc. I even have links from the official sites of the Montreal Canadiens, Minnesota Wild, and Vancouver Canucks, as well as links from dozens of official hockey team sites at lower levels of hockey.

Shouldn't I be getting a fair amount of link juice from this, perhaps enough to overcome Panda? I mean, that's a pretty good number, isn't it? Even if 60% of them are from people cloning Wikipedia, isn't 800 links per day a very good number of links to be receiving organically? 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) dyoc 8/6/12 8:47 PM
I think that may be a little out of date in two ways: first, nofollow still apparently does remove (perhaps less?) link juice from the source page, and it's fairly accepted now that "negative linkjuice" exists that actually penalizes the destination page, allowing negative SEO.

Good times.

Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/6/12 10:38 PM

On Monday, August 6, 2012 11:59:28 PM UTC-3, RalphSlate wrote:
Fathom, here's something I still don't understand.

You speak of link juice. Although I understand the term, I don't understand the concept as well as you do - but from what I gather, you get link juice by other sites linking to you. The more reputable the site, the more juice you get, right? 
 
Link juice commonly refers to a collection of merits provided by a link pattern (some are just synonyms for others as well)... PageRank, relevancy, relatedness, weight, trust, prominence, authority, etc., and the pattern(s) is the link anchor text and the actual link type (both of which Google seems to identify easily).
On my "recent links to your site", I am getting about 2,000 links per day from other sites. I have 55,000 links listed as gained in the past 2 months. They come from approximately 2,800 different domains. Now certainly a lot are not from top-notch sites, but I do have links coming in from the Toronto Globe and Mail, Montreal Gazette, Canada.com, Pittsburgh Post-Gazette, etc. I even have links from the official sites of the Montreal Canadiens, Minnesota Wild, and Vancouver Canucks, as well as links from dozens of official hockey team sites at lower levels of hockey.
 
The higher authorities among them will certainly offset the lack of trust in the others... which is why we all agreed you don't have a PENGUIN issue.

Shouldn't I be getting a fair amount of link juice from this, perhaps enough to overcome Panda? I mean, that's a pretty good number, isn't it? Even if 60% of them are from people cloning Wikipedia, isn't 800 links per day a very good number of links to be receiving organically? 
 
This is where in-depth data mining needs to be done, but if I may, higher authority domains point to NHL pages and also the premier players; thus you can infer they have the supporting link juice to keep PANDA at bay (even with limited contextual information on the page).
 
In contrast, the phrases where you seem to be a little devalued (#11) are for obscure clubs and players.
 
While getting more links to these would normally help, that action is "preventive maintenance" IMHO. I've never seen a PANDA recovery occur from more links without substantial page edits.
 
In theory, it may work if the NHL league site or the franchise club sites link to it... but I suspect that will never happen. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/6/12 10:50 PM

On Tuesday, August 7, 2012 12:47:31 AM UTC-3, dyoc wrote:

it's fairly accepted now that "negative linkjuice" exists that actually penalizes the destination page, allowing negative SEO.

 
That's just a bad interpretation IMHO.
 
Negative Linkjuice?
 
What happens if you took all links that are considered unnatural and added rel="nofollow" to them?
 
You lose all that link juice... correct? ... this is what PENGUIN does.
 
If you actually truly add rel="nofollow" to some of the links you drive your unnatural pattern below PENGUIN detection threshold thus recovery is possible without editing all.
 
That is not a negative thing.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/6/12 11:11 PM
You're right, I am primarily talking about pages that no one links to - and I don't expect people to link to those pages. Surely there is a small benefit of having tens of thousands of sites linking to my root domain on a monthly basis though? Wouldn't that create a substantial amount of link juice which would make my domain a more trusted source, thereby allowing the thin pages to at least be on par with similarly thin pages from other sites that don't have nearly as much domain link juice? 

There is one other site which provides less complete profiles of such players in question (I think because they seeded their database by scraping my site about 5 years ago and I have spent those 5 years adding and refining my data) but is consistently ranked at the top for those long-tail queries. You've had a field day criticizing my site, I'd be curious as to your opinion as to why a site like this isn't being hammered down as well:


versus:


I realize that Google's algorithm doesn't know which page is more comprehensive, but that site seems to have a lot less going for it than mine, especially in terms of domain-based backlinks. Yet I am consistently seeing them ranking for the types of queries that I want to rank for. Their long-tail pages have the same number of backlinks as me: zero. I realize the answer may be "Google just hasn't caught up with them yet", but why not? 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 12:45 AM

On Tuesday, August 7, 2012 3:11:39 AM UTC-3, RalphSlate wrote:
You're right, I am primarily talking about pages that no one links to - and I don't expect people to link to those pages. Surely there is a small benefit of having tens of thousands of sites linking to my root domain on a monthly basis though? Wouldn't that create a substantial amount of link juice which would make my domain a more trusted source, thereby allowing the thin pages to at least be on par with similarly thin pages from other sites that don't have nearly as much domain link juice? 

Well sure, and that would normally be the case, BUT you are missing an important piece of the puzzle. You lack information, and your pages that are about obscure topics are obscure in information and obscure in links... how can you expect returns if you offer nothing but a page with nothing much on it, supported by no links to fool Google that the page(s) are more important than what they are?
 
You got to have something... don't you think?
 
Contrarily, if you have all the other stuff... why in the world are these so important?  
There is one other site which provides less complete profiles of such players in question (I think because they seeded their database by scraping my site about 5 years ago and I have spent those 5 years adding and refining my data) but is consistently ranked at the top for those long-tail queries. You've had a field day criticizing my site, I'd be curious as to your opinion as to why a site like this isn't being hammered down as well:

versus:
 
Sure... but honestly, you want everything for yourself and you are not willing to up your game for it. I suspect you are getting volumes of traffic and a fair revenue stream... hire a couple of bio researchers/writers and increase these obscure pages... if they are not worth that... stop worrying about it.
I realize that Google's algorithm doesn't know which page is more comprehensive, but that site seems to have a lot less going for it than mine, especially in terms of domain-based backlinks. Yet I am consistently seeing them ranking for the types of queries that I want to rank for. Their long-tail pages have the same number of backlinks as me: zero. I realize the answer may be "Google just hasn't caught up with them yet", but why not? 
 
I'm sure they would love to have the ranks you have... why do you deserve everything if you are not willing to do everything to achieve what you want?
 
You can dance around the real issue all day, week, month and year, but until you up your value add per page you aren't going to get what you desire, because you aren't perfect either and you've got more than the next guy... and you state that yourself. 
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Pelagic 8/7/12 8:50 AM

Hi again Ralph, 116,000 pages of crap/empty/duplicated addressed already, wow that was quick, now slow down a bit, deal with each issue one step at a time ;)
 
Whilst many of those URLs should be canonicalised, some noindexed, some filtered from crawling via parameter handling in WMT and some actually removed, it's always far better to correct it at source rather than applying those band-aids, as otherwise at the very least it wastes your crawl budget, and recovery will inevitably take considerably longer.

As it's a 302, the X-Robots-Tag noindex should work regardless and should not apply to the destination URLs.
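
To make the distinction concrete, here is a minimal Python (WSGI) sketch. It is an illustration only, not the site's Apache setup: the `app` function is hypothetical and it assumes the visitor's passcode is stale, which is what a crawler would normally present. The point it shows is simply which response carries the header: the 302 for viewastext.php has the X-Robots-Tag, while the response for pdisplay.php has none.

```python
from urllib.parse import parse_qs


def app(environ, start_response):
    """Toy WSGI app: the noindex header travels only with the redirect response."""
    path = environ.get("PATH_INFO", "")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    pid = query.get("pid", [""])[0]

    if path.endswith("/viewastext.php"):
        # Assumption: the passcode is stale or belongs to someone else, so redirect.
        start_response("302 Found", [
            ("Location", f"/ihdb/stats/pdisplay.php?pid={pid}"),
            ("X-Robots-Tag", "noindex, nofollow"),
        ])
        return [b""]

    # Main stats page: served normally, with no X-Robots-Tag header at all.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>main stats page</html>"]
```

Fetching both URLs and comparing the response headers is a quick way to confirm the real server behaves the same way.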

1691 > Yep, small coding errors can create an infinite number of pages, typical calendar issue, and yes it's also very relevant to your site's issues amongst the other issues already mentioned ;) Assuming that you have fixed those internal links, don't worry about the crawl errors reported.

As Johnmu pointed out, you need to make your site better as a whole compared to similar primarily data aggregated sites ;)

Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/7/12 10:48 AM
I wonder if that is part of the problem here - if my site is being classified as a "data aggregated site" which implies that I am simply taking data from other sites and aggregating it. Although in some sense that is true - I didn't actually invent the numbers I present - on the other hand 80% of what I have did not come from online sources; it is the product of research, compilation, and data entry by me. Others (including Wikipedia) then grabbed the data to build their own sites, but people in the hockey world have always known that my site was the source (and Wikipedia at least credits me with 80k+ nofollowed links).

I wonder if I am being considered by Google to be a follower here rather than a pioneer in this field on the internet?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) dyoc 8/7/12 11:00 AM
This may have already been suggested, but have you looked into adding rel="author" links to a Google+ page?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 8/7/12 11:11 AM
>I wonder if that is part of the problem here - if my site is being classified as a "data aggregated site" which implies that I am simply taking data from other sites and aggregating it.  

OK, I will bite, where do you get your data?  

A)  Do you (or some sort of staff hired and controlled by you) watch each game either live (in person) or on TV and record/generate the data?

B)  Do the teams or leagues provide a data summary to you?

C)  Do you extract the data from newspapers and other written sources?





Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 11:27 AM
Yes absolutely...
 
You need to objectively look at your value to the web in general. Your problem isn't today... your real problem is some time in the future.
 
As much as you have tons of high quality links attempting to provide you authority, you have "nothing that is uniquely yours". How many original "COPYRIGHT PROTECTED" pages do you have? From where I sit, 263,000 pages of public domain info... and 0 pages that are uniquely owned by hockeydb.com
 
That is your #1 problem here.
 
As I said, you can fool PANDA with links from great websites but that is simply a placebo trick and it will not last. Take a lesson from ezine articles on the Farmer/PANDA update... they lost 48 million page views in a month down to 1 million (96% of their traffic)... that's you in the coming future. You may indeed be more specialized... but you aren't that specialized in an objective sense.
 
If you don't get real copyright owned pages in your website soon... PANDA 4.0 could be an absolute catastrophe for you. You could be safe until maybe PANDA 5.0 but the writing is on the wall.
 
You're obsolete already, and you won't see it coming until after it is too late ... and then it's all over but the crying.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/7/12 12:14 PM
I obviously don't watch the games and create my own data. The data comes from a lot of places - yes, aggregated, but not "point my browser at a website and suck down the data, slap my headers on it and just regurgitate it". Far from that. My sources are:

1) Printed publications dating back to 1900 - primarily guides for every league or team imaginable, and when no guide is available, game programs which may contain prior year's data. A lot of the stuff is very rare, few people have seen it or have access to it.

2) When a league didn't release a guide, they often released their information to the local newspaper, so the official information is retrieved from the microfilm (or more recently, from the digitized microfilm). Sometimes I have to go to 10 different papers because each paper only printed the local team information.

3) Sometimes the leagues prepared year-end releases for the press but the press didn't print them. I have a lot of those, purchased on the collector's market. In some cases I have had old statisticians pull stuff out of their basements and send me their original mimeograph copies, in some cases I may have the only surviving copies of the data.

4) In some instances, the player stats have to be recreated from boxscores published in newspapers. Other times the game results have to be compiled from newspapers. Sometimes only the rosters of the teams are available. I have done some of this work, others have done this as well and sent me the information. In many cases I am assembling data from a variety of sources, for example, a NCAA media guide may list the hometown of a player, a game program may list his height and weight, and a newspaper article may list his birthdate. Or sometimes the family contacts me and tells me where their grandfather was born. (I never take their word for more than vitals though, since oral history is unreliable).

5) For more recent information, probably 1992 forward, most leagues have used various third-party companies (more lately using rink-based software) to record and publish their stats. Those leagues have provided me with both the electronic copies, and in many cases, with special access to the software so I don't have to "scrape" the league websites (since the leagues find the "aggregation" to be a valuable service).

I wouldn't say that the data is exactly "public domain". Although US copyright law doesn't recognize "sweat of brow", international copyright law does have some protection against copying substantial portions of database, even if such databases are made available online. Additionally, I think there is some protection under a business tort of unfair competition, meaning someone profiting from the work of a direct competitor. I am very careful about not using the work of a direct competitor, which I define as another website or paper publication whose purpose is the same as mine - to publish hockey data. I don't consider an official league to be a competitor because 1) leagues have been open to me and 2) their primary business is not publishing statistics, it is running a hockey league. Yes, that could be the edge of a razor, but no one has complained yet.

So, for example, the website "dropyourgloves.com" has some data that I don't have - but I am not going to suck that data into my database for a couple of reasons: first, I am very proud of the trusted lineage of my data. I have had hundreds of players and agents try to supply me with false information, so they can further their careers (since my site is the trusted resource for this kind of stuff). I don't know what controls those sites have. But even if I trusted them, I don't think it is fair to use the work they have done to compete against them. Lastly, I am proud to have the reputation of a leader, not a site-copier.

Fathom, you have made your disdain for my site clear - but the only original site is a fiction site. In theory I could hire someone to write an article about Wayne Grotski, but how will he get the data for the article? He will go to newspapers, magazines, other websites, etc., and will take the facts (i.e. data) he learns, spin them into words, and put them into paragraph form. If he writes enough of such articles he will follow a basic template - just like most newspaper reporters do (I've read a lot of newspaper articles on hockey, the templates become obvious). Do you know that a company called Pointstreak has written software that will take a boxscore and spin it into a written article?

There is very little difference between data and prose when it comes to information. For you, or Google, to determine that my site is not worthy enough to even appear before the public, preventing them from choosing data over prose, is quite frankly the height of arrogance.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/7/12 12:29 PM
P.S. - here's a data point for you. Since my site went live in 1996, I have never, ever, ever had a visitor suggest to me "your site is pretty good, but instead of the numbers, I'd like to see some biographical information on these players instead". That is across probably 100,000+ different emails I have received.

I don't think your strategy is a good one, and I wish you'd stop assessing my site using the tunnelvision that the only good site is a prose site.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 12:50 PM
Ralph you're telling me truth... and I agree... and here is some more truth.
 
Since 1996 you have never, ever, ever had volumes of people saying "I want that obscure player or that obscure team or that obscure league"... they want what they want and you for the most part seem disinterested with that value you offer.
 
You haven't been here... dribbling about your losses on hockey in general or on star players... everyone got what they wanted... why are you so intent on providing what they don't want?
 
You're page one for everything that matters... and you're here complaining about things that don't.
 
Get all those great domains to link to all that obscure stuff, if that is what's important to you and you don't want to add biographical information.
 
Problem solved.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 12:58 PM
BTW... Your website is immaterial to my likes or dislikes.
 
You don't have a problem that is worthy of free advice from here IMHO.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/7/12 1:12 PM
Fathom, here's where you're wrong.

The primary theme of the emails sent to me is obscure players, teams, and leagues. I have never received an email about Bobby Orr. Not one. Why not? Because everyone knows who Bobby Orr is and what he did. Yet I've received over a dozen emails about the Decatur Storm of the obscure Continental Hockey League. Why? Because I'm the only site on the internet that cares about that team.

Did you read the other post, where I explained the impetus of my site? My passion is about the other players, not the famous ones. Prior to April 24, my site was returned by Google for those players, and the people coming loved it - plenty of praising emails such as "my kids never believed I played hockey...", etc.

The other players are what matter to me - not the NHL stuff, which everyone has and knows about. That's why I'm fighting this, because my site isn't an NHL player site. It's a historical hockey player site, 95% of whom have never played in the NHL (that's an accurate number too - 150,000 players in my database, 7,000 NHL players across history). People value my site because it has data on 150,000 players, not biographies of 7,000.
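
A quick check of the 95% figure, using only the two counts given above (150,000 players in the database, 7,000 of them NHL players). This is just the arithmetic, not anything pulled from the actual database:

```python
# Counts as stated in the post above; both are round figures.
total_players = 150_000
nhl_players = 7_000

# Share of players in the database who never played in the NHL.
non_nhl_share = (total_players - nhl_players) / total_players

print(f"{non_nhl_share:.1%}")  # prints 95.3%
```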
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 1:30 PM
You got emails from people that already know your website exists... GREAT JOB BRANDING!
 
You certainly don't get a single email from users that didn't know you existed. 
 
I realize you desire a 100% market share.
 
Up your game.
 
Reading your OP
 
I have made some changes since April 24.
  • I put rel=nofollow on the 10-12 links on my links page. I originally had put those links there organically, but this is the advice that people gave me.
  • I solved a bunch of duplicate title/description problems, such as when two players have the same name.
  • I included a canonical tag on all my player pages so that a slightly different query string would not make it appear as though there were two pages.
  • I completely blocked a development server which had some (but not many) pages indexed in Google - duplicating the page on my main server.
  • I removed some ads on player pages where the player doesn't have much of a career. Those pages have just 1 ad on them now.
  • I have put "noindex" on some pages that were automatically generated, such as a team that has no player information on it, etc.
  • I have asked some forum sites where my site is linked like a blogroll to remove the links. That had resulted in profiles such as 25,000 links to a single page on my site.
  • I have a widget-type javascript tool where someone can embed stats from my site on their own site - with links to my site - I changed all links to rel=nofollow, even though even Danny Sullivan himself doesn't do this on his widgets. The tool was never meant to try and capture pagerank, the links are there to get people to click them and visit my own site.
Where in the list did you up the ante for the obscurity in your website?
 
That's the problem you need to solve.
 
You don't want more text... no problem... how do you get others to link to these obscurities?
 
The answer IS NOT NO TEXT, VALUE ADD & NO LINKS!
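
For readers following the quoted list of changes, the canonical-tag item ("a slightly different query string would not make it appear as though there were two pages") can be sketched roughly as below. This is a minimal illustration, not the site's actual code: the `pid` parameter name, the `player.php` path, and the `example.com` host are all hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameter that identifies a player page.
KEEP_PARAMS = ("pid",)

def canonical_url(url, keep=KEEP_PARAMS):
    """Drop every query parameter except the identifying ones and
    return one stable URL for the page."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page head."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)

# Two query-string variants of the same (hypothetical) page map to one URL.
a = canonical_url("http://www.example.com/player.php?pid=123&sort=asc")
b = canonical_url("http://www.example.com/player.php?sort=desc&pid=123")
print(a == b)  # prints True
```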
Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 1:33 PM

On Tuesday, August 7, 2012 5:30:10 PM UTC-3, fathom wrote:

The other players are what matter to me - not the NHL stuff, which everyone has and knows about. That's why I'm fighting this, because my site isn't an NHL player site. It's a historical hockey player site, 95% of whom have never played in the NHL (that's an accurate number too - 150,000 players in my database, 7,000 NHL players across history). People value my site because it has data on 150,000 players, not biographies of 7,000.
 
Here's a disconnect.
 
You're not about NHL or NHL clubs or star players... but that's where your link juice is coming from.
 
So your authority status IS about NHL or NHL clubs or star players.
 
But you could always decline the links.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) RalphSlate 8/7/12 1:59 PM
Google used to rank me for the obscure players. People would Google their grandparents, their friends, themselves, and they would arrive at my site. They would be grateful that my site existed.  For those players, the entry on hockeydb.com with their obscure playing history (as opposed to no data available anywhere else) was the value add. It remains the value add.

When I analyzed my logs from 4/23, something like 60% of the referrals were to unique players. In other words, on 4/23, if I had 10,000 visitors from Google, 6,000 were to 6,000 different player pages on the long tail, and at the top of the tail, I maybe had 500 different visitors to a player who was in the news, 400 to the next most noteworthy player, etc. But the tail was very, very long. It remains just as long - but now just knocked down in scale by about 50% to 80%.
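
The kind of log analysis described above can be sketched as follows, assuming you have already pulled the landing page of each Google referral out of the server log into a list. The function name and the sample paths are hypothetical; it measures the share of visits that went to a page nobody else landed on that day.

```python
from collections import Counter

def long_tail_share(landing_pages):
    """Fraction of visits that landed on a page visited exactly once."""
    counts = Counter(landing_pages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each page with a count of 1 accounts for exactly one visit.
    single_visit_pages = sum(1 for c in counts.values() if c == 1)
    return single_visit_pages / total

# Hypothetical sample: two "in the news" pages plus five one-off pages.
visits = ["/p/a"] * 3 + ["/p/b"] * 2 + ["/p/c", "/p/d", "/p/e", "/p/f", "/p/g"]
print(long_tail_share(visits))  # prints 0.5
```

With the numbers in the post (10,000 visits, 6,000 of them to 6,000 different pages), the same calculation would give 0.6.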

People don't link to single obscure players. They mainly link to the main page. They describe the site as "amazing" - not because it contains the stats of NHL players, which has been done for years, but because it contains the stats of all notable players, which has never been done before.

You just don't understand long tail, do you?
Re: AngryPenguin: looking for Penguin advise (links or otherwise) Geminineil 8/7/12 2:33 PM
This may not be related to your problems but...
 
1. You use different 'spacers' in your Title tags... and you should try to be consistent...

<title>Hockeydb.com -- Internet Hockey Database - Statistics, Logos, and Trading Cards</title>

2. Also, using your domain name is in my mind not the best use of characters... virtually ANY search term with your domain name in it would find your website... maybe having your domain name might help with click-through rate... or not... have you done research on that? Most people would suggest using the title to represent the content of the webpage... and you could possibly increase your search net with other words in the title tag than the domain name. Just a thought.
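
Point 1 can be checked mechanically. Here is a small sketch that lists which "spacers" a title uses; the set of separators it looks for is an assumption, and it only counts separators that have whitespace on both sides.

```python
import re

# Candidate separators (an assumed list), each surrounded by whitespace.
SEPARATOR = re.compile(r"\s(--|-|\||::|:)\s")

def title_separators(title):
    """Return the set of distinct separators found in a title string."""
    return set(SEPARATOR.findall(title))

def has_mixed_separators(title):
    """True when a title uses more than one kind of separator."""
    return len(title_separators(title)) > 1

title = ("Hockeydb.com -- Internet Hockey Database - "
         "Statistics, Logos, and Trading Cards")
print(sorted(title_separators(title)))  # prints ['-', '--']
print(has_mixed_separators(title))      # prints True
```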
 
 
 

Re: AngryPenguin: looking for Penguin advise (links or otherwise) fathom 8/7/12 2:54 PM

On Tuesday, August 7, 2012 5:59:51 PM UTC-3, RalphSlate wrote:
You just don't understand long tail, do you?
 
But you understand perfectly...
 

On Tuesday, August 7, 2012 2:48:26 PM UTC-3, RalphSlate wrote:

I wonder if that is part of the problem here - if my site is being classified as a "data aggregated site"

I'm not sure what your idea of "data aggregated site" is but add the word "data" to your query... and guess what? ....you're #1. 
 
You're a data site... you're not a longtail information portal site and you clearly do not want to be an information portal.
 
While it's great to say "I used to rank for"... since that time PANDA 3.5, 3.6, 3.7, 3.8, 3.9 have come along and PENGUIN 1.0 & 1.1... so the "I used to..." is history.
Re: AngryPenguin: looking for Penguin advise (links or otherwise) StevieD_Web 8/7/12 3:23 PM
Based upon what you stated, your pre-1992 data is unique or at least represents significant effort to compile.


Post-1992, you are an aggregator.

So be it.  Others are too.

If you read this forum enough, you should realize there is a large group of content aggregators who are having similar issues.  Those sites have few if any ads, very seldom do they have any rich and compelling content and once in a blue moon the sites are even technically correct.  Realtors is their name and duplicate content is their poison simply because dozens upon dozens of sites in a single market share the same MLS content feeds.  In some markets each Realtor working for a local or regional firm will have their own individual sites.  Sometimes there are 1000 or more sites sharing and republishing the same content feed.

(somebody is going to be #1 and the rest are going to be very unhappy)

The duplicate content issues are going to weigh down anything good the individual site may do.  My solution for them is the same that I offer you..... Minimize the ads if they exist (for you they do), fix any technical errors that may exist and develop significant quantities of rich, compelling content to dilute the bad (duplicate content) issues of the site.

