Website categorized as pure spam

Website categorized as pure spam Susanna Siebert 10/28/13 1:27 PM
I've read the FAQs and searched the help center. 

We recently released microbialgenomics.org and I tried to get the website indexed by Google. Today I saw that it has been categorized as pure spam.

We created microbialgenomics.org as a website for our group to highlight our research work. We wanted it to be a comprehensive site where people can see the projects we work on, the publications we have published, and the microbial organisms we have sequenced. We wanted all of this content to be connected so that people can, for example, see what publications we have for a given organism or what organism is studied in a given project.

Regarding the pure spam categorization, I have read the guidelines, searched the internet and talked to a few people. We have some ideas about what might have triggered the pure spam designation.

1) We copied some content from Pubmed for the publications. We wanted the website to list all the publications that people in our group have been involved in. It's a substantial number of publications. For each publication we show the abstract and link to Pubmed for more information. Yes, we're duplicating content (the abstract) but the abstract is the abstract and can't be changed. Plus, we (the authors) originally wrote the abstracts so, technically, Pubmed is copying from us. We are adding additional value to the pages by linking each publication to the biographies of the authors on our site, as well as the related projects and organisms.

2) The project landing page (http://microbialgenomics.org/projects/) lists the diseases that we research. A colleague pointed out that this might get interpreted as spam. In addition, the pages that each disease links to could be categorized as shallow content, as each is basically a list of the projects that fit this disease (e.g., http://microbialgenomics.org/disease/acne/ lists the two projects that research acne).

3) Maybe our site got hacked. The site isn't showing any signs of hacking. I compared the files with a backup we made before it went live and I didn't see anything amiss. The access logs also don't show any suspicious activities. However, I'm not an expert in that matter so it's entirely possible that I'm missing something.

These are our guesses. I would greatly appreciate some pointers from people who are more experienced with Google's indexing mechanism as to what might have prompted Google to characterize this site as spam and what I can do to fix this issue. I've updated the robots.txt file and sitemap to exclude the publication pages from being indexed, in the hope that this helps with the first issue. I'm not sure whether that will help get the spam categorization lifted.
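One way to confirm which URLs a robots.txt change actually blocks is to test it offline. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. The rule set in `SAMPLE_ROBOTS` and the helper name `is_blocked` are assumptions for illustration, not the site's real file. Note that a Disallow rule only stops crawling; it is not a request to drop a page from the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real robots.txt of the site is not shown in this thread.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /publications/
"""


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may not crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://microbialgenomics.org/publications/some-paper/",
        "http://microbialgenomics.org/projects/",
    ):
        print(url, "blocked" if is_blocked(SAMPLE_ROBOTS, url) else "crawlable")
```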

This is a legitimate site. We're not trying to sell anything. There are no ads, referral links or similar on these pages. We're simply trying to present our research in a comprehensive manner. 

Thanks,
Susanna


Re: Website categorized as pure spam ets 10/28/13 1:36 PM
Did you buy a domain with a bad history? How long has it been operational? I see it registered on these dates:
Domain Name:MICROBIALGENOMICS.ORG
Created On:07-Mar-2013 16:01:42 UTC
Last Updated On:07-May-2013 03:46:50 UTC
Expiration Date:07-Mar-2015 16:01:42 UTC

However, if I look at the Wayback Machine, I see these previous incarnations of the domain:

If you're certain you've done nothing that would merit a "pure spam" designation, submit a reconsideration request and explain to Google about your new ownership. Explain fully who you are and detail your academic affiliation with "Washington University in St. Louis" so they get the full picture.

"In addition, if you recently purchased a domain that you think may have violated our guidelines before you owned it, you can use the reconsideration request form to let us know that you recently acquired the site and that it now adheres to the guidelines."
Re: Website categorized as pure spam Susanna Siebert 10/28/13 1:59 PM
Thank you so much. I was racking my brain trying to come up with a reason that our website was listed as spam. This must be it. I will put in a reconsideration request.
Re: Website categorized as pure spam ets 10/28/13 2:02 PM
It should take no more than a couple of weeks, maybe less. If they decline it, please come back and we'll take another look for you.
Re: Website categorized as pure spam Redleg x3 10/28/13 2:17 PM
hey Susanna, not a big deal but there are some URLs floating around like

www . microbialgenomics . org /tag/welsh-corgi
www . microbialgenomics . org /tag/satin-bedding

right now if I request any of those old URLs they first 301 redirect to the non-www version of the URL and then return a 404.  Generally Google would prefer just to see the 404 without the redirect for files that do not exist.  Again not a big issue but something you should look at at some point.

welsh-corgi, decorative-pillows, satin-bedding  Must have been an interesting site before you acquired it.
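The redirect-then-404 behaviour described above can be checked by following the hops by hand instead of letting the client follow redirects. This is a minimal Python sketch under that assumption. `fetch_chain` and `redirected_404` are hypothetical helper names, and the HEAD request may behave differently from GET on some servers.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_chain(url, max_hops=5):
    """Request `url` without auto-following redirects; return [(status, url), ...]."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection
                    if parts.scheme == "https" else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        try:
            target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
            conn.request("HEAD", target)
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        hops.append((status, url))
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return hops


def redirected_404(hops):
    """True when a 404 is only reached after at least one redirect."""
    return len(hops) > 1 and hops[-1][0] == 404


if __name__ == "__main__":
    chain = fetch_chain("http://www.microbialgenomics.org/tag/welsh-corgi")
    print(chain, "redirect before 404:", redirected_404(chain))
```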
Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:12 PM
Hi all,

I received a message that our site is still violating Google's quality guidelines. Unfortunately the message is totally unspecific as to what the problem is:

"We received a reconsideration request from a site owner for http://microbialgenomics.org/.

We've reviewed your site and we believe that http://microbialgenomics.org/ still violates our quality guidelines. In order to preserve the quality of our search engine, pages from http://microbialgenomics.org/ may not appear or may not rank as highly in Google's search results, or may otherwise be considered to be less trustworthy than sites which follow the quality guidelines.

For more specific information about the status of your site, visit the Manual Actions page in Webmaster Tools. From there, you may request reconsideration of your site again when you believe your site no longer violates the quality guidelines.

If you have additional questions about how to resolve this issue, please see our Webmaster Help Forum."


Redleg x3 already mentioned the 301 redirect. I'm actually not quite sure how to fix it, so if anybody has any pointers on how to do that in WordPress, that would be great.

It would be great if you guys could have another look at my site to determine what is wrong with it. I mention a few concerns in my original message so that would be a good place to start.


Thank you,

Susanna

Re: Website categorized as pure spam ets 11/4/13 12:33 PM
Did you do this:

> For more specific information about the status of your site, visit the Manual Actions page in Webmaster Tools. From there, you may request reconsideration of your site again when you believe your site no longer violates the quality guidelines.

Do you see anything further there... or just "pure spam"?








Re: Website categorized as pure spam Suzanneh 11/4/13 12:34 PM
>>3) Maybe our site got hacked.

Did you check Security Issues in Webmaster Tools?

Suzanne
Re: Website categorized as pure spam Ben Griffiths 11/4/13 12:34 PM
The fact Redleg had a look probably means you can rule out a hack, or they'd have picked up on it. But yeah new-and-improved WMT has a specific section for that.

I'd say it's autogenerated/scraped content, but that is a guess. Very harsh penalty if so, IMO; even if the site has little merit from Google's POV that's what the algorithm is for (certainly a lot of the site is very thin eg http://microbialgenomics.org/disease/nonalcoholic-fatty-liver-disease-nafld/ )




Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:43 PM
Just "pure spam"
Re: Website categorized as pure spam ets 11/4/13 12:43 PM
That's a fair guess. Let's investigate that:

Susanna....  How is the site assembled? Where does the content come from? Are pages like this...


...pulled in by some automatic process from things like pubmed?: 

My guess was maybe some connection between the medical content and, say, a Viagra/cialis/drug type spam hack - but with the site totally deindexed, it's hard to know. The bing cache is pretty empty.
Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:46 PM
Ben, do you think that removing these thin sites would make a difference?
Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:46 PM
Suzanne, I checked that and it doesn't pick up on anything.
Re: Website categorized as pure spam Ashley 11/4/13 12:50 PM
when you get it taken care of - look into latency. That nearly-empty page Ben linked to above took forever to load for me. 
And make the titles/images clickable so I can get in


Curious about why you're disallowing Google to crawl your people profiles? 


Definitely still an issue, and not just for Google. Check out what Bing has indexed

They do seem to be returning 404s properly now - so you may need some patience. But that doesn't totally explain why none of your content is otherwise indexed other than Google must see something pretty bad. 




Why would I go to your site instead of http://genome.wustl.edu/ which seems to be the official site?

Same content, and indexed well on the main domain.

Your strategy of another site needs some serious rethinking. 
Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:52 PM
ets, yes, Google thinking that we're the victim of a drug hack was our concern too. The publication information is pulled in through a semi-automated process. We pull down the information through the Pubmed API, put it in a spreadsheet, and upload it to our website via a CSV importer. So the information is not pulled from Pubmed in real time but rather stored separately in our database.
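For readers unfamiliar with that kind of pipeline, here is a rough Python sketch of the two offline steps: building a request URL and writing records to CSV. It assumes the NCBI E-utilities `efetch` endpoint is the "Pubmed API" meant here. The helper names and the CSV columns (`pmid`, `title`, `abstract`) are hypothetical and not taken from the site's actual importer.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint is the API being used.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an efetch URL that requests the given PubMed IDs as XML."""
    query = urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"})
    return EFETCH_BASE + "?" + query


def records_to_csv(records, columns=("pmid", "title", "abstract")):
    """Serialise already-parsed publication dicts to CSV text for an importer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns))
    writer.writeheader()
    for record in records:
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


if __name__ == "__main__":
    print(efetch_url(["12345", "67890"]))  # placeholder IDs
    print(records_to_csv([{"pmid": "12345", "title": "Example", "abstract": "..."}]))
```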
Re: Website categorized as pure spam Ben Griffiths 11/4/13 12:56 PM
> Ben, do you think that removing these thin sites would make a difference?

Not likely, possible though. NOINDEX would be my preferred route, but "Pure spam" is very very unlikely to be just that you have a bunch of thin pages.
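Whether a page really carries a NOINDEX directive can be verified from its HTML. This is a minimal Python sketch using the standard `html.parser` module, with `has_noindex` as a hypothetical helper name. It only looks at the robots meta tag, not at an `X-Robots-Tag` response header, and a crawler can only see the tag if robots.txt does not block the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            for token in (attr.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
    print(has_noindex(page))
```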

I looked for ExitJunction code, old spam URLs not returning 404s, all sorts, nothing. It could be that the old version got hit with a Spam penalty and whatever secret tools the Spam Team uses is picking up something that jibes with the original cause, but explicitly telling them you're the new owner should overcome that you'd think.

I'd imagine someone somewhere is going to bump this up the ladder if none of us can put our fingers on it - although without your credentials it would just be A N Other scraped site, so maybe not.
Re: Website categorized as pure spam Susanna Siebert 11/4/13 12:59 PM

On Monday, November 4, 2013 2:50:52 PM UTC-6, Ashley wrote:
> when you get it taken care of - look into latency. That nearly-empty page Ben linked to above took forever to load for me.
I know that latency, unfortunately, seems to be an issue for us but I'm not sure what causes it.

> And make the titles/images clickable so I can get in
Can you tell me which titles/images are not clickable for you?

> Curious about why you're disallowing Google to crawl your people profiles?
Unfortunately, this had to be done because Wordpress's custom post types work in weird ways. We wanted to be able to link our collaborators to our projects/publications etc but we didn't want to have individual posts for them. Unfortunately, you can't have the first without having the latter. So we made posts but we don't want people to actually go to them. It's certainly not ideal.

> Definitely still an issue, and not just for Google. Check out what Bing has indexed
How can I get rid of those? Disallow?

> They do seem to be returning 404s properly now - so you may need some patience. But that doesn't totally explain why none of your content is otherwise indexed other than Google must see something pretty bad.
>
> Why would I go to your site instead of http://genome.wustl.edu/ which seems to be the official site?
>
> Same content, and indexed well on the main domain.
>
> Your strategy of another site needs some serious rethinking.
The main site does not represent our work adequately. We needed our own website that focuses on microbial work only, to fulfill outside requirements.
Re: Website categorized as pure spam ets 11/4/13 1:05 PM
Can you please do a fetch as Google on your home page and paste the entire thing in here, headers and code? (gosh I am going to be popular) I'm just thinking "rule out IP-address based cloaking/hack").
Re: Website categorized as pure spam Susanna Siebert 11/4/13 1:10 PM


On Monday, November 4, 2013 3:05:07 PM UTC-6, ets wrote:
> Can you please do a fetch as Google on your home page and paste the entire thing in here, headers and code? (gosh I am going to be popular) I'm just thinking "rule out IP-address based cloaking/hack").

Fetch as Google

This is how Googlebot fetched the page.

URL: http://microbialgenomics.org/

Date: Monday, November 4, 2013 at 1:09:51 PM PST

Googlebot Type: Web

Download Time (in milliseconds): 1870

HTTP/1.1 200 OK
Date: Mon, 04 Nov 2013 21:09:52 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Pingback: http://microbialgenomics.org/xmlrpc.php
Set-Cookie: PHPSESSID=eighllrgjfjubbf4fdo0ih8kq2; path=/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 5212
Keep-Alive: timeout=10, max=30
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8



<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8" />
<title>Center for Infectious Disease Genomics | at Washington University in St. Louis</title>
<link rel="profile" href="http://gmpg.org/xfn/11" />
<link rel="stylesheet" type="text/css" media="all" href="http://microbialgenomics.org/wp-content/themes/twentyten_layout4/style.css" />
<link rel="pingback" href="http://microbialgenomics.org/xmlrpc.php" />
<link rel="alternate" type="application/rss+xml" title="Center for Infectious Disease Genomics &raquo; Feed" href="http://microbialgenomics.org/feed/" />
<link rel="alternate" type="application/rss+xml" title="Center for Infectious Disease Genomics &raquo; Comments Feed" href="http://microbialgenomics.org/comments/feed/" />
<link rel='stylesheet' id='cntctfrmStylesheet-css'  href='http://microbialgenomics.org/wp-content/plugins/contact-form-plugin/css/style.css?ver=3.5.1' type='text/css' media='all' />
<link rel='stylesheet' id='wp-lightbox-2.min.css-css'  href='http://microbialgenomics.org/wp-content/plugins/wp-lightbox-2/styles/lightbox.min.css?ver=1.3.4' type='text/css' media='all' />
<link rel='stylesheet' id='tablepress-default-css'  href='http://microbialgenomics.org/wp-content/plugins/tablepress/css/default.min.css?ver=1.0' type='text/css' media='all' />
<link rel='stylesheet' id='taxonomy-image-plugin-public-css'  href='http://microbialgenomics.org/wp-content/plugins/taxonomy-images/style.css?ver=0.7.3' type='text/css' media='screen' />
<script type='text/javascript' src='http://microbialgenomics.org/wp-includes/js/jquery/jquery.js?ver=1.8.3'></script>
<script type='text/javascript' src='http://microbialgenomics.org/wp-includes/js/comment-reply.min.js?ver=3.5.1'></script>
<script type='text/javascript' src='http://www.google.com/jsapi?ver=3.5.1'></script>
<script type='text/javascript' src='http://microbialgenomics.org/wp-content/themes/twentyten_layout4/oneSimpleTablePaging-1.0.js?ver=1.0'></script>
<link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://microbialgenomics.org/xmlrpc.php?rsd" />
<link rel="wlwmanifest" type="application/wlwmanifest+xml" href="http://microbialgenomics.org/wp-includes/wlwmanifest.xml" /> 
<link rel='prev' title='Sample Page' href='http://microbialgenomics.org/sample-page/' />
<link rel='next' title='Resources &amp; Environment' href='http://microbialgenomics.org/resources-environment/' />
<meta name="generator" content="WordPress 3.5.1" />
<link rel='canonical' href='http://microbialgenomics.org/' />
<style type="text/css" id="custom-background-css">
body.custom-background { background-color: #f1f1f1; }
</style>
</head>

<body class="home page page-id-1757 page-template page-template-main_page-php custom-background">
<div id="wrapper" class="hfeed">
	<div id="header">
		<div id="masthead">
			<div id="branding" role="banner">
								<h1 id="site-title"
          style="background: rgb(40, 40, 40);
                 width: 940px;
                 height: 95px;
                 margin: 0px;
                 ">
					<div style="position: relative; top: 15px; left: 20px; width: 600px">
						<a href="http://microbialgenomics.org/" title="Center for Infectious Disease Genomics" rel="home" style="color: rgb(141, 198, 63);">Center for Infectious Disease Genomics</a>
            <p style="font-size:12px; color: rgb(218,218,218); padding-top: 0px; margin-top: 0px">at Washington University in St. Louis</p>
          </div>
<a href="http://www.genome.wustl.edu" target="_blank"><img src="/wp-content/uploads/2013/04/tgi_logo_shadow1.png" style="border: none; position: relative; top: -85px; left: 720px; width:200px"></a>				</h1>
				

										<img src="http://microbialgenomics.org/wp-content/uploads/2013/04/cropped-header-940-x-98-Staphylococcus_on_catheter.png" width="940" height="100" alt="" />
								</div><!-- #branding -->

			<div id="access" role="navigation">
			  				<div class="skip-link screen-reader-text"><a href="#content" title="Skip to content">Skip to content</a></div>
								<div class="menu-header"><ul id="menu-header" class="menu"><li id="menu-item-2907" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2907"><a href="http://microbialgenomics.org/our-mission/">Our Mission</a>
<ul class="sub-menu">
	<li id="menu-item-2832" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2832"><a href="http://microbialgenomics.org/resources-environment/">Resources &#038; Environment</a></li>
	<li id="menu-item-2906" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2906"><a href="http://microbialgenomics.org/contact-us/">Contact Us</a></li>
</ul>
</li>
<li id="menu-item-14" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-14"><a href="/people">People</a>
<ul class="sub-menu">
	<li id="menu-item-91" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-91"><a href="/group/lead-investigator/">Lead Investigators</a></li>
	<li id="menu-item-90" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-90"><a href="/group/microbial-analysis/">Microbial Analysis</a></li>
	<li id="menu-item-93" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-93"><a href="/group/research-laboratory/">Research Laboratory</a></li>
	<li id="menu-item-1738" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1738"><a href="/group/co-investigator">Co-Investigators</a></li>
	<li id="menu-item-1739" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1739"><a href="/group/collaborator">Collaborators</a></li>
	<li id="menu-item-2823" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-2823"><a href="/group/strain-collaborator">Strain Collaborators</a></li>
	<li id="menu-item-1866" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1866"><a href="/group/alumni">Alumni</a></li>
</ul>
</li>
<li id="menu-item-17" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-17"><a href="/projects">Projects</a></li>
<li id="menu-item-16" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-16"><a href="/organisms">Microbial Genomes</a></li>
<li id="menu-item-15" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-15"><a href="/publications">Publications</a></li>
<li id="menu-item-2831" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2831"><a href="http://microbialgenomics.org/methods-protocols/">Methods &#038; Protocols</a></li>
<li id="menu-item-5440" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-5440"><a href="http://microbialgenomics.org/media/">Media</a>
<ul class="sub-menu">
	<li id="menu-item-1740" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1740"><a href="/news">In The News</a></li>
	<li id="menu-item-2835" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-2835"><a href="/press-releases">Press Releases</a></li>
	<li id="menu-item-2836" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-2836"><a target="_blank" href="http://www.youtube.com/user/cidgatwashu">Videos</a></li>
</ul>
</li>
<li id="menu-item-5427" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-5427"><a target="_blank" href="http://genome.wustl.edu/outreach/">Outreach</a></li>
</ul></div>			</div><!-- #access -->
		</div><!-- #masthead -->
	</div><!-- #header -->

	<div id="main">
<div id="container">
  <div id="content" role="main">
    <h1 style="border-bottom: 1px solid rgb(141, 198, 63);">Home</h1>
    <div class="entry-content">
    	The Center for Infectious Disease Genomics is a consortium of clinicians and basic scientists using genomic approaches to understand problems in medical microbiology and improve diagnosis and treatment through applied genomics. The CIDG encompasses a large number of projects in the hospital, public and global health arenas. Projects focus on bacterial, viral, fungal, and parasitic agents, as well as the microbiome, the microbial content of the human body. The role of the host genotype is also studied. State of the art genomic procedures, with an emphasis on next generation sequencing, as well as advanced bioinformatic and statistical analytic methods are routinely employed. A major goal of the CIDG is to improve medical practice through the application of high-throughput DNA sequencing and associated computational methods in the clinic.
    </div>
    <h2 style="border-bottom: 1px solid rgb(141, 198, 63);">Microbes of the Month</h2><div style="width: 100%; float: left; margin-bottom: 10px;"><img width="100" height="100" src="http://microbialgenomics.org/wp-content/uploads/2013/05/MRSA-150x150.jpg" class="alignleft wp-post-image" alt="Colorized scanning electron micrograph (SEM) depicts a grouping of methicillin resistant Staphylococcus aureus (MRSA) bacteria." /><div style="float: right; width: 80%"><h1 class="entry-title" style="border-bottom: 1px solid #ddd;"><em>Staphylococcus aureus</em></h1><em>Staphylococcus aureus</em> is a Gram-positive, non-motile bacteria that forms grape-like clusters.   <em>S. aureus</em> form bunches, because they divide along two planes, while <em>Staphylococcus</em> that form chains, divide along only one plane.  <em>S. aureus</em> are facilitative anearobes.  <em>S. aureus</em> is a very important pathogen because of its acquisition of Methicillin and vancomycin resistance.<span style="float:right;"> <a href="http://microbialgenomics.org/organisms/staphylococcus-aureus/">Continue reading <span class="meta-nav">&rarr;</span></a></span></div></div><div style="width: 100%; float: left; margin-bottom: 10px;"><img width="100" height="100" src="http://microbialgenomics.org/wp-content/uploads/2013/04/vibrio_harveyi-150x150.png" class="alignleft wp-post-image" alt="Vibrio harveyi" /><div style="float: right; width: 80%"><h1 class="entry-title" style="border-bottom: 1px solid #ddd;"><em>Vibrio harveyi</em></h1><em>Vibrio harveyi</em> is a species of Gram-negative, bioluminescent, marine bacteria in the genus <em>Vibrio</em>. Groups of <em>V. harveyi</em> bacteria communicate via quorum sensing to coordinate the production of bioluminescense and virulence factors. <em>V. 
harveyi</em> is responsible for luminous vibriosis, a disease that affects commercially farmed penaeid prawns.<span style="float:right;"> <a href="http://microbialgenomics.org/organisms/vibrio-harveyi/">Continue reading <span class="meta-nav">&rarr;</span></a></span></div></div>    <div style="clear: both"></div>
  </div><!-- #content -->
</div><!-- #container -->


		<div id="primary" class="widget-area" role="complementary">
			<ul class="xoxo">
      <li id="search-2" class="widget-container widget_search"><form role="search" method="get" id="searchform" action="http://microbialgenomics.org/" >
	<div><label class="screen-reader-text" for="s">Search for:</label>
	<input type="text" value="" name="s" id="s" />
	<input type="submit" id="searchsubmit" value="Search" />
	</div>
	</form></li><li id="latestnewswidget-2" class="widget-container LatestNewsWidget"><h4 class="widget-title"><h3 style="border-bottom: 1px solid rgb(141, 198, 63); margin-bottom: 5px;">In The News</h3></h4><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://www.sciencenews.org/view/generic/id/351638/description/Genetic_test_fingers_viral_bacterial_infections'  target='_blank'>Genetic test fingers viral, bacterial infections</a><br>Science News<span style="float: right;">07/16/13</span></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://www.abc.net.au/science/articles/2013/07/16/3803445.htm'  target='_blank'>Gene activity could reveal cause of fevers</a><br>ABC News<span style="float: right;">07/16/13</span></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://news.stlpublicradio.org/post/wash-u-center-aims-increase-collaboration-global-health'  target='_blank'>Wash U Center Aims To Increase Collaboration On Global Health</a><br>St. 
Louis Public Radio<span style="float: right;">04/12/13</span></div><div style="clear: both"></div><span style="float: right; margin-bottom: 18px;"><a href="/news">Browse/Search News</a></span></li><li id="latestarticleswidget-2" class="widget-container LatestArticlesWidget"><h4 class="widget-title"><h3 style="border-bottom: 1px solid rgb(141, 198, 63); margin-bottom: 5px;">Recent Press Releases</h3></h4><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://news.wustl.edu/news/Pages/25610.aspx'  target='_blank'>In children with fever, gene profiling distinguishes bacterial from viral<br />infections</a><br><div style="width: 100%; text-align: right;">07/15/13</div></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://news.wustl.edu/news/Pages/25589.aspx'  target='_blank'>Powderly named director of WUSTL’s Institute for Public Health</a><br><div style="width: 100%; text-align: right;">07/02/13</div></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://news.wustl.edu/news/Pages/25462.aspx'  target='_blank'>Better detection for elephantiasis worm infection</a><br><div style="width: 100%; text-align: right;">05/20/13</div></div><div style="clear: both"></div><span style="float: right; margin-bottom: 18px;"><a href="/press-releases">Browse/Search Press Releases</a></span></li><li id="latestpublicationswidget-2" class="widget-container LatestPublicationsWidget"><h4 class="widget-title"><h3 style="border-bottom: 1px solid rgb(141, 198, 63); margin-bottom: 5px;">Recent Publications</h3></h4><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://microbialgenomics.org/publications/f11r-is-a-novel-monocyte-prognostic-biomarker-for-malignant-glioma/'>F11R Is a Novel Monocyte Prognostic Biomarker for Malignant Glioma.</a><br>PLoS One<span style="float: 
right;">10/11/2013</span></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://microbialgenomics.org/publications/the-cancer-genome-atlas-pan-cancer-analysis-project/'>The Cancer Genome Atlas Pan-Cancer analysis project.</a><br>Nat Genet<span style="float: right;">09/26/2013</span></div><div style="clear: both"></div><div style="width: 100%; border-bottom: 1px solid #ddd; margin-bottom: 5px;"><a href='http://microbialgenomics.org/publications/a-novel-intronic-snp-in-the-myosin-heavy-polypeptide-4-gene-is-responsible-for-the-mini-muscle-phenotype-characterized-by-major-reduction-in-hindlimb-muscle-mass-in-mice/'>A Novel Intronic SNP in the Myosin heavy polypeptide 4 Gene Is Responsible for the Mini-Muscle Phenotype Characterized by Major Reduction in Hindlimb Muscle Mass in Mice.</a><br>Genetics<span style="float: right;">09/20/2013</span></div><div style="clear: both"></div><span style="float: right; margin-bottom: 18px;"><a href="/publications">Browse/Search Publications</a></span></li>			</ul>
		</div><!-- #primary .widget-area -->	</div><!-- #main -->

	<div id="footer" role="contentinfo">
		<div id="colophon">



			<div id="footer-widget-area" role="complementary">

				<div id="first" class="widget-area">
					<ul class="xoxo">
						<li id="text-2" class="widget-container widget_text">			<div class="textwidget"><a href="https://twitter.com/CIDGatWashU" target="_blank"><img src="/wp-content/uploads/2013/09/twitter.jpg" /></a>
<a href="https://plus.google.com/116424703731846464420" target="_blank"><img src="/wp-content/uploads/2013/09/google_plus.png" /></a>
<a href="http://www.youtube.com/user/CIDGatWashU" target="_blank"><img src="/wp-content/uploads/2013/09/youtube.png" /></a></div>
		</li>					</ul>
				</div><!-- #first .widget-area -->

				<div id="second" class="widget-area">
					<ul class="xoxo">
						<li id="text-4" class="widget-container widget_text">			<div class="textwidget"><div style="width: 500px;">Copyright © 1993-2013 Washington University in St. Louis. All rights reserved.</div></div>
		</li>					</ul>
				</div><!-- #second .widget-area -->

				<div id="third" class="widget-area">
					<ul class="xoxo">
						<li id="text-3" class="widget-container widget_text">			<div class="textwidget"></div>
		</li>					</ul>
				</div><!-- #third .widget-area -->

				<div id="fourth" class="widget-area">
					<ul class="xoxo">
						<li id="text-5" class="widget-container widget_text">			<div class="textwidget"><div style="width:260px;"><a href="http://medschool.wustl.edu/" target="_blank">
<img src="/wp-content/uploads/2013/05/wustl-med-school-logo.png" alt="" style="width:260px; height: 37px; position: relative; left: -40px;"/>
</a></div>
</div>
		</li>					</ul>
				</div><!-- #fourth .widget-area -->

			</div><!-- #footer-widget-area -->





		</div><!-- #colophon -->
	</div><!-- #footer -->

</div><!-- #wrapper -->

<script type='text/javascript'>
/* <![CDATA[ */
var JQLBSettings = {"fitToScreen":"0","resizeSpeed":"400","displayDownloadLink":"0","navbarOnTop":"0","loopImages":"","resizeCenter":"","marginSize":"0","linkTarget":"_self","help":"","prevLinkTitle":"previous image","nextLinkTitle":"next image","prevLinkText":"\u00ab Previous","nextLinkText":"Next \u00bb","closeTitle":"close image gallery","image":"Image ","of":" of ","download":"Download"};
/* ]]> */
</script>
<script type='text/javascript' src='http://microbialgenomics.org/wp-content/plugins/wp-lightbox-2/wp-lightbox-2.min.js?ver=1.3.4.1'></script>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-44527620-1', 'microbialgenomics.org');
  ga('send', 'pageview');

</script>
</body>
</html>
Re: Website categorized as pure spam Ashley 11/4/13 1:10 PM
If you insist on having two sites, then you need to fill it with UNIQUE content. What percentage of the new website is totally unique? Because I'm not seeing anything..... 
Re: Website categorized as pure spam Ashley 11/4/13 1:11 PM
I'll do a few different header checks for various UAs/referrers and see if I find anything... 
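
For anyone following along, a header check of this kind can be sketched roughly as below. This is only a minimal illustration, not the exact check being run here: the user-agent strings, the referrer value, and the helper names (`probe`, `odd_ones_out`) are assumptions made up for the example. The idea is to request the same URL as a normal browser, as Googlebot, and as a visitor arriving from a search result, and see whether any of them gets a different status code or redirect target.

```python
import urllib.error
import urllib.request

# Hypothetical header sets; the UA strings and the referrer are illustrative assumptions.
VARIANTS = {
    "browser": {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/25.0"},
    "googlebot": {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    },
    "from_search": {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/25.0",
        "Referer": "https://www.google.com/",
    },
}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the raw 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url, headers):
    """Return (status, Location header) for one request, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx responses arrive here along with 4xx/5xx.
        return err.code, err.headers.get("Location")


def odd_ones_out(results, baseline="browser"):
    """Names of variants whose (status, location) differs from the baseline variant."""
    return sorted(name for name, r in results.items() if r != results[baseline])
```

To try it against a live site, something like `results = {name: probe("http://microbialgenomics.org/", h) for name, h in VARIANTS.items()}` followed by `odd_ones_out(results)` would list the variants that are treated differently. An empty list only means this particular check found nothing.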
Re: Website categorized as pure spam Ashley 11/4/13 1:12 PM
You are on WordPress 3.5.1.

The current WordPress release is 3.7.1.

You NEED to update otherwise you'll keep getting hacked. 
Re: Website categorized as pure spam Susanna Siebert 11/4/13 1:21 PM
Hi Ashley,

You're bringing up a valid point. Things that are totally unique are:
- Our Mission
- Resources & Environment and their subpages
- All the projects
- Most of the organisms
- The team biographies
- All the linking between all of these different parts - this is an added value that may not be apparent to the outside observer, but on this website we have everything in one place, all linked up so that collaborators, researchers etc. can access everything about our work.

Yes, publications and organisms are an issue. Publications especially, since there are >1000 that were published by us and were imported, but there is just no good way to present them in any other way. All this information is static. So unless we want to scrap all the publications, which would dramatically reduce the informational content of our site (i.e., the linking I talked about), I just don't know how to fix it. What do you guys suggest? I'm also interested in knowing how other sites can reproduce publication abstracts without it being counted as duplicate content. I mean, that kinda stuff is everywhere on research websites.

For organisms we can rewrite the content but even then I'm afraid that Google would count it as scraped. Do you guys think rewriting this content would help?

Thanks,
Susanna


Re: Website categorized as pure spam ets 11/4/13 1:23 PM
Well apart from what Ashley said, that reveals nothing.

Have you tried downloading your htaccess file from the server and looking in there for any suspicious redirect code? It is important to look at the version on the server not the one you upload (if you upload one).
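
A rough way to do that look-through, once the file has been downloaded from the server, is sketched below. It is only an illustration: the pattern list is an assumption about what conditional-redirect hacks commonly look like and is certainly incomplete, and `flag_lines` is a made-up helper name. A clean result does not prove the file is safe, it just means none of these patterns matched.

```python
import re

# Patterns often seen in conditional-redirect hacks; an illustrative, incomplete list.
SUSPICIOUS = [
    re.compile(r"RewriteCond\s+%\{HTTP_REFERER\}", re.IGNORECASE),
    re.compile(r"RewriteCond\s+%\{HTTP_USER_AGENT\}", re.IGNORECASE),
    re.compile(r"RewriteRule\s+\S+\s+https?://", re.IGNORECASE),
    re.compile(r"^\s*Redirect(Match)?\s+.*https?://", re.IGNORECASE),
]


def flag_lines(htaccess_text):
    """Return (line_number, line) pairs for non-comment lines matching a suspicious pattern."""
    flagged = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comments such as "# BEGIN WordPress"
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            flagged.append((number, line.strip()))
    return flagged
```

Calling `flag_lines(open(".htaccess").read())` on the downloaded copy returns an empty list for the usual WordPress rewrite block, and flags rules that branch on the referrer or user agent or that rewrite to an external host.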
Re: Website categorized as pure spam Susanna Siebert 11/4/13 1:23 PM
Ashley, we plan to update once we can make sure that our site will be stable and that all of our plugins will still work. 

>>You NEED to update otherwise you'll keep getting hacked. 
 
Are you saying that we definitely have been hacked? 
Re: Website categorized as pure spam ets 11/4/13 1:25 PM
We're trying to find out :)
Re: Website categorized as pure spam Ben Griffiths 11/4/13 1:28 PM
I ran a couple of malware scans, which came back fine, but they did pick up on the outdated WordPress.

For what it's worth, the site used to be an Amazon affiliate auto-gen site:


That would tie in with the penalty being for this

Little or No Original Content:
https://support.google.com/webmasters/answer/66361?hl=en
Re: Website categorized as pure spam Susanna Siebert 11/4/13 1:31 PM
I looked at it back when this first became an issue. I didn't find anything suspicious. Unfortunately, I have to leave work for today so I asked one of my colleagues, Michael, to have another look. He'll update this thread with his findings.



On Monday, November 4, 2013 3:23:26 PM UTC-6, ets wrote:
Well apart from what Ashley said, that reveals nothing.

Have you tried downloading your htaccess file from the server and looking in there for any suspicious redirect code? It is important to look at the version on the server not the one you upload (if you upload one).
Re: Website categorized as pure spam Susanna Siebert 11/4/13 1:33 PM
Ben,  that makes sense. How is this still affecting the site today since I made the reconsideration request?
Re: Website categorized as pure spam Ben Griffiths 11/4/13 1:35 PM
It's not, but given, as Ashley points out, the site is a near-duplicate of an authoritative source it could well be that the penalty is still relevant.
Re: Website categorized as pure spam ets 11/4/13 1:36 PM
Yes, but Susanna put in a RR last week saying she'd taken over the site from the cushion and pillow spammer who owned it before. 
Re: Website categorized as pure spam Lysis 11/4/13 1:44 PM
Bing doesn't like the site either.
Re: Website categorized as pure spam Ben Griffiths 11/4/13 1:45 PM
Yeah, to the extent that I was using Yandex to try and diagnose it. That only has 7 pages indexed too though.
Re: Website categorized as pure spam Michael Kiwala 11/4/13 2:01 PM
I just looked at the .htaccess file.  Nothing suspicious, just the typical WordPress boilerplate.
Re: Website categorized as pure spam ets 11/4/13 2:04 PM
So why are no search engines indexing it when it clearly has plenty of pages?

It's not as if Bing is particularly fussy about "pure spam" - ho ho - yet manages to index only the home page?

If it were Google saying "The pages are thin/duplicate", it would still be abundantly indexed on Bing.
Re: Website categorized as pure spam Susanna S 11/4/13 4:40 PM
Good question. I did submit a sitemap to Bing. I now also submitted the main urls of the landing pages to be indexed. Do you guys think that the past content resulted in the robots not crawling the page?
Re: Website categorized as pure spam ets 11/4/13 11:52 PM
In the absence of any other ideas, as Ben suggested, you could try noindexing: perhaps noindex everything off /publications/ and put in another RR. Does it matter to you whether that subdirectory is indexed or not? The content is all on pubmed anyway. You could even remove the entire subdirectory from Google's index (or selectively noindex it for Googlebot):

Remove a page or site from Google's search results

When NOT to use the URL removal tool

However, IMHO, in cases such as this, the manual webspam team should actually say explicitly what is wrong. They should have some leeway. I appreciate that the idea is to be fair by treating everyone the same - but everyone is not the same. You are a university department researching infectious diseases, not a bunch of Pokemon spammers in Jakarta. I think a steer is appropriate in this case, please, Google?
Re: Website categorized as pure spam Susanna Siebert 11/5/13 7:41 AM
Hi ets,

Thanks for the suggestion. I really appreciate it.

Am I understanding this correctly that if I disallow the pages with duplicate content in my robots.txt and block the URLs in Webmaster Tools, Google will not take those pages into consideration when making their "pure spam" determination? We don't really care for all the publications to be indexed. The only page we care about is the landing page at microbialgenomics.org/publications/. Similarly for organisms. Do you think it would be a good idea to do the same there? What about the thin pages? 

Thanks,
Susanna
Re: Website categorized as pure spam ets 11/5/13 8:02 AM
What's the consensus here? Should Susanna "noindex" her thinner stuff? 

I would say "yes" on the grounds that it will never help the site to have an exact duplicate of something as authoritative as pubmed.
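
To make the idea concrete, here is a minimal sketch of the rule being discussed: keep the /publications/ landing page indexable and mark everything below it as noindex. The function name `robots_header` and the example sub-page slug are hypothetical, and how the value actually gets attached to a response (an `X-Robots-Tag` header or a robots meta tag) depends on the server or WordPress setup, which is not shown here. Note that a crawler generally has to be able to fetch a page to see a noindex directive, so this would not be combined with a robots.txt Disallow for the same paths.

```python
def robots_header(path, noindex_prefixes=("/publications/",)):
    """Return an X-Robots-Tag value for a URL path, or None to leave the page indexable."""
    normalized = path if path.endswith("/") else path + "/"
    for prefix in noindex_prefixes:
        # The landing page itself (path == prefix) stays indexable;
        # only pages below it are marked noindex.
        if normalized.startswith(prefix) and normalized != prefix:
            return "noindex, follow"
    return None


# Hypothetical paths, for illustration only.
print(robots_header("/publications/"))             # landing page: None
print(robots_header("/publications/some-paper/"))  # sub-page: noindex, follow
```

The same function would cover the organism pages by passing `noindex_prefixes=("/publications/", "/organisms/")`, if it is decided those should be handled the same way.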
Re: Website categorized as pure spam Ashley 11/5/13 8:15 AM
I don't know - I'm honestly still confused as to why there should be two separate sites! If it's the same org/school - why are you not creating a single website? That is so much less confusing for bots and users. 


Seems to me like there's some serious hangup on the actual domain - but even if that is cleaned up I'm still struggling with the decision to go forward with these two sites and a large amount of duplication and thin content. 
Re: Website categorized as pure spam Susanna Siebert 11/5/13 8:55 AM
It wasn't our choice. We have to have our own website. It's a requirement for our work. Unfortunately, I can't go into detail as to why it's a requirement. You will just have to trust me that it is.

In the context of the existing genome.wustl.edu website, our work is not presented adequately. Let me give you an example of where the true value of our site comes in. Let's say someone is looking for information on research concerning Daptomycin-resistant Enterococci and they end up on our Daptomycin-resistant Enterococci project site (http://microbialgenomics.org/projects/daptomycin-resistant-enterococci/). They can see a whole bunch of information about the project, who our collaborators are, what organisms are involved in this research, and what publications were written for this project. If they now want to know more about one of the disease agents, let's say E. faecalis, they don't need to google it. They can just click on the link (http://microbialgenomics.org/organisms/enterococcus-faecalis/) and learn all about it. Here they not only see a comprehensive description of this microbe, they also see all of our other projects that research this microbe, all of our collaborators, and all of the publications that we have written about it. In addition, if they want to know what exact strains we have sequenced, this information can also be found here. And instead of having to go to the NCBI website to search for those strains, we provide links to all the important information about the strains. You see, the added value really is that we have pulled in all the information about our work from all different places to be accessible in one place for our collaborators and other researchers; something that has not been done elsewhere. In addition, our content in regards to microbial research goes well beyond what has been provided at genome.wustl.edu. I'm sorry that you don't agree with us that we provide useful content to our users but I disagree with your assessment -- our site is NOT a duplicate of genome.wustl.edu.

It looks like the biggest hangup is the publications. Would people/Google feel better about our site if we didn't have individual publication pages and just linked to pubmed directly? However, then I wonder why it's ok for genome.wustl.edu to basically replicate pubmed? What's the difference?

Secondly, it seems like the organism pages may be a problem. Although I described above all the additional information we present on these sites, would it be helpful if we rewrote all the descriptions instead of citing other sources?


Re: Website categorized as pure spam ets 11/5/13 9:09 AM
Bing is doing something different today: it is actively noting that it is suppressing results from your site:

The "some results have been removed" link explains the grounds for removing results, which include illegal content and web spam.
Re: Website categorized as pure spam Susanna Siebert 11/5/13 10:16 AM
Could it also be that this shows up because I put in a request yesterday to have the outdated pages from the previous owner removed?
Re: Website categorized as pure spam Free2Write 11/5/13 10:35 AM
Re: Website categorized as pure spam Suzanneh 11/5/13 11:46 AM
>>Could it also be that this shows up because I put in a request yesterday to have the outdated pages from the previous owner removed?

What do you mean?  How did you do that? Did that have to do with Bing or Google?

Strange that Bing is shutting you out, too -- you sure you don't have any spam on the site?  Or maybe Bing is checking Google's results and doing a "If not in Google, don't show in Bing."  ;-)  (I say that last one mostly in jest.  But you never know...)

Suzanne


Re: Website categorized as pure spam Suzanneh 11/5/13 11:51 AM
There isn't any code on the site that redirects users if they hit the back button?  Something like Exit Junction?

Suzanne
Re: Website categorized as pure spam Susanna Siebert 11/5/13 11:53 AM
As of yesterday Bing still had some spammy urls from the previous owner indexed (e.g. www . microbialgenomics . org /tag/welsh-corgi, www . microbialgenomics . org /tag/satin-bedding). I used the Bing Webmaster Tool's Block URL functionality to block these so that they are excluded from Bing's index.
Re: Website categorized as pure spam ets 11/5/13 12:03 PM
Whatever is troubling Google is also troubling Bing. 

Ben checked for ExitJunction Suzanne. :)

Re: Website categorized as pure spam ets 11/5/13 12:26 PM
Ashley's right to emphasize the need to update WordPress immediately. 

But in the meantime, have you run any malware scanners against WordPress? I'm not a WP person, but there are things like this:
Re: Website categorized as pure spam Eric Kuan 11/5/13 12:33 PM
Hi Susanna, 

It looks like there may have been an error when processing your reconsideration request. We're re-processing it now, and you should see a change in the Manual Actions Viewer in the next couple of days.

Eric
Re: Website categorized as pure spam Ashley 11/5/13 12:33 PM
Did you ever use the URL removal tool in WMT?
Re: Website categorized as pure spam Robbo 11/5/13 1:04 PM

Here are a few examples of the sort of link text (anchor text) that was being used in links from other site/s to the domain:

"rio home bedding silky satin duvet cover sets 5 pieces -queen …"    

and the target URL was:

http://  microbialgenomics.org/    37-rio-home-bedding-silky-satin-duvet-cover-sets-5-pieces-queen-light-pink.html

which is of course now 404 Not Found.


And looking at the page on the other site the spammy link was on, I see they are authoritative [ :-) ]  for things like:

psychic dreams
party wear
toner cartridges
house decoration
printed designs on coolers
koozies






Re: Website categorized as pure spam Susanna Siebert 11/5/13 1:07 PM
Is there a good way to find URLs like these so that I can block/remove them?
Re: Website categorized as pure spam Susanna Siebert 11/5/13 1:09 PM
Thanks Eric. I appreciate it. If there should still be a problem could I get a more precise list of reasons as to why my site is categorized as pure spam? That would really help in fixing these problems.

Thanks again,
Susanna
Re: Website categorized as pure spam Susanna Siebert 11/5/13 1:13 PM
I removed the spammy URLs from the previous owner. I have not yet removed the publication pages etc.


On Tuesday, November 5, 2013 2:33:42 PM UTC-6, Ashley wrote:
Did you ever use the URL removal tool in WMT?
Re: Website categorized as pure spam Suzanneh 11/5/13 1:19 PM
I don't know about Bing, but with Google if a page is returning a 404 Not Found (which tag/welsh-corgi is), there shouldn't be a problem.
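
A quick way to confirm that the old spam URLs really do answer 404 is sketched below. It is a minimal illustration only: `fetch_status` and `not_gone` are made-up helper names, and treating 410 the same as 404 is an assumption added for the example.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """HTTP status code for url (urllib's default opener follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def not_gone(urls, status_of=fetch_status):
    """Return the URLs that do NOT answer 404 or 410 and so may still need attention."""
    return [url for url in urls if status_of(url) not in (404, 410)]
```

Passing the old URLs mentioned in this thread, for example `not_gone(["http://www.microbialgenomics.org/tag/welsh-corgi", "http://www.microbialgenomics.org/tag/satin-bedding"])`, should return an empty list if they are all gone.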

It's not the old site that's the problem; otherwise, Google would have reincluded the site.    It would be either a hack on the current site (which doesn't seem to be the case) or, as Ashley is suggesting, thin, duplicate content.  I haven't delved into the content, but I'd trust Ashley's opinion of it.

>>Ben checked for ExitJunction Suzanne. :)

Thanks, ets!  Getting to be a long thread. :-)

Suzanne
Re: Website categorized as pure spam Suzanneh 11/5/13 1:21 PM
Undo that.  Those pages don't exist anyway. Leave them as a 404; they'll eventually drop out.

You can deindex your site if that tool isn't used properly.

When Not to Use the Remove URL tool:  https://support.google.com/webmasters/answer/1269119?hl=en

Suzanne
Re: Website categorized as pure spam Susanna Siebert 11/5/13 1:25 PM
Suzanneh, it looks like there may have been a problem with my RR (See Eric's message below).

Also, I canceled those removal requests.
Re: Website categorized as pure spam Susanna Siebert 11/6/13 6:04 AM
Hi all,

As of this morning the "pure spam" designation has disappeared from the Manual Actions tab in the Webmaster Tools. I haven't received any messages or emails, however. So far none of my pages have been indexed. 

Does this mean that Google will start to index my site now? Or was this just removed while the site is being reviewed?

Thanks,
Susanna
Re: Website categorized as pure spam Ashley 11/6/13 9:32 AM
Hey Susanna - 

It generally means you're good. I'd give it a few days to crawl the site a few times and re-evaluate. 

Re: Website categorized as pure spam tucsonadventuredogranch 11/8/13 3:43 PM
Eric,

I have been desperately trying to find an answer to a question regarding two disavow link messages I have received in my Webmaster tool account. Any response or help you can provide will be greatly appreciated. (See below.)

Thanks,
Thom

I hope you can assist me in resolving an issue to which I cannot find an answer. Via my Webmaster tool account, I submitted a disavow link .txt file. I received the following two messages - which seem to contradict each other:
Results for the submission on November 5, 2013 3:42:39 PM UTC-7
You successfully uploaded a disavow links file (www.tucsonadventruedogranch.com_google domain and link disavowal file 11022013.txt) containing 135 domains and 1 URLs.
On 11/06/2013 this message appears:
The file containing disavowed links to

http://www.tucsonadventuredogranch.com/ has been updated. If this is unexpected, it may have been updated by another site owner. For more information, visit the Disavow links https://www.google.com/webmasters/tools/disavow-links?siteUrl=http://www.tucsonadventuredogranch.com/ page in Webmaster Tools. Details: You successfully uploaded a disavow links file () containing 0 domains and 0 URLs.
I cannot find an answer anywhere to my question: was the file of 135 domains and 1 URL accepted, as suggested in the first message; or does the "0 domains and 0 URLs" part of the second message mean the file was not accepted? Very confusing.
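
One way to sanity-check which of the two messages matches the uploaded file is to count the entries locally. The sketch below assumes the documented disavow file layout (one entry per line, `domain:` prefix for whole domains, `#` for comments); `count_disavow_entries` is a made-up helper name and the sample entries are placeholders, not taken from the actual file.

```python
def count_disavow_entries(text):
    """Return (domain_count, url_count) for the contents of a disavow file."""
    domains = urls = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are not entries
        if line.lower().startswith("domain:"):
            domains += 1
        else:
            urls += 1
    return domains, urls


# Placeholder content for illustration only.
sample = """# links we could not get removed
domain:spam-one.example
domain:spam-two.example
http://other.example/bad-page.html
"""
print(count_disavow_entries(sample))  # (2, 1)
```

If the local count is 135 domains and 1 URL, the first message reflects the file as it was uploaded; the sketch says nothing about why a later message would report 0 and 0.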
Thanks,
Thom

                                                                                                                                    
Re: Website categorized as pure spam travler. 11/8/13 3:51 PM
Hi Thom


It's best to keep your issues separate from other threads with different issues. Hijacking is not a good thing :-)

Someone should be able to take a look at your thread and offer suggestions.

Thanks!