Categories: Crawling, indexing & ranking


opinion zbve 1/16/13 2:43 AM
I've read the FAQs and searched the help center. 

We are a business directory in the Netherlands and currently rank quite poorly.

We have made many changes since the site went online.

We combine social media information with business hours, VAT numbers, company type, address details, contact details, etc.

example page

Can't it survive?

Yes, it has duplicate content, but since we are combining data (adding value), it should be OK, right?


Re: opinion zbve 1/16/13 4:20 AM
Re: opinion Suzanneh 1/16/13 4:48 AM
Here's the thing:  you can do everything "right" and still not rank highly.  You need to use other means for people to want to use your site; you can't rely on Google.

Re: opinion zbve 1/16/13 4:55 AM
Thanks Suzanne, yes, I know that.

This is why we are extending the data at the moment (the data is currently 30% complete, the display is implemented, and the layout has had another overhaul).

But still, I see a lot of bad company directories outranking us by a wide margin.

They just print a scraped address.

We will be using other channels as well (thanks for the advice).

Maybe I am impatient... we have only been online for a couple of months and popularity is growing.

If anyone else has advice as well, we are open to all criticism, even if it's negative.

Re: opinion JohnMu 1/16/13 5:31 AM
Hi zbve

With directories like that, I always worry that it's very easy to create a ton of content that doesn't have a lot of value yet. For example, I clicked through to a number of businesses, and apart from the general information such as the address and occasionally some keywords, I am missing anything unique, compelling, and of high quality. I realize it's not always possible to have that from the start, but if the majority of your site's content is like this at that point, you can imagine that it can be very hard for our algorithms to judge your website overall. One solution could be to only include content that you're certain provides something unique; another possibility could be to keep those pages on your site (perhaps users will add more content over time?), but to block them from being shown in the search results (perhaps with a noindex robots meta tag). 

Another issue I ran into was that many of your pages are essentially search results pages (eg ). In our Webmaster Guidelines ( ) we recommend: "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines." 

Hope this helps!
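
A minimal sketch of the noindex option John mentions, assuming the site renders each listing from a record and can decide per page whether to emit a robots meta tag. The `has_unique_value` rule and the field names (`business_hours`, `reviews`) are hypothetical and only illustrate the idea; the real threshold for "unique" is up to the site owner.

```python
from html import escape


def robots_meta_tag(index: bool, follow: bool = True) -> str:
    """Build a robots meta tag for the <head> of a page."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(escape(", ".join(directives)))


def has_unique_value(listing: dict) -> bool:
    """Hypothetical rule: a listing counts as unique once it carries data
    beyond the bare address, such as business hours or user reviews."""
    return bool(listing.get("business_hours") or listing.get("reviews"))


# A listing that only has a name and an address stays out of the index,
# but its links can still be followed.
listing = {"name": "Example BV", "address": "Voorbeeldstraat 1", "business_hours": None}
print(robots_meta_tag(index=has_unique_value(listing)))
# prints: <meta name="robots" content="noindex, follow">
```

Once users add more data to a listing, the same rule flips the page back to indexable without any further change.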

Re: opinion zbve 1/16/13 5:56 AM
Thanks John. I appreciate a lot that someone from Google took the time to actually look at it.

We will noindex the search pages; consider that done.

The unique value here is that we are adding social media links (and data later on) and real business hours (so users will be able to look them up), and that we try to provide a complete summary of a company, including its social media campaign and ... and ... and ... (there is more; it's not complete yet and I can't say too much ;) )

The text, indeed, is something we programmatically took from the linked sites and it is not unique; it's not our intention that Google sees that text as the value of the site.

We would perhaps want to exclude the text block from indexing if possible, since in the medium to long term it's just an additional information block...
I don't know if this can be done using some surrounding tags.

We are working hard to get this right, so I hope Google (and of course our users) will appreciate the effort we are making in the medium to long term.

Thanks again.
Re: opinion Lysis 1/16/13 6:51 AM
JohnMu has such an awesomely politically correct way of saying "scraper."
Re: opinion zbve 1/16/13 9:51 AM
yes, we scrape and combine data into a new set of data.
Re: opinion zbve 1/16/13 9:56 AM
ps. all these sites outranking us do the same. they just take the data as-is and republish it unmodified.
Re: opinion Jamie Roszel 1/17/13 9:34 AM

I have had a similar experience to zbve. I operate a website that has an industry directory (opt-in once a company provides their information to us, not scraped). Since 1995 we were always ranked very high within the search results, and now we rarely rank on the first page of results, if at all. The directory provides a valuable service to our users who are searching for companies within industry segments. Although the directory is a supplement to our marketplace, a great deal of our 90% free user activity occurs within the directory.

I understand your algorithms' difficulty in rating this often shallow content as strong; however, it is strong to our users. I have been struggling to recover from these changes and you are pushing our business to the brink of no longer being viable. How can your programs arbitrarily decide that our website, which has provided an invaluable service to an industry for 18 years, is suddenly not worthy of being listed in your directory? How does this provide your users with the best results for their search?
Re: opinion zbve 1/17/13 10:40 AM
Hi Jamie,

We are currently testing John's suggestions.

I think they are valid, which is why we decided to implement these changes.

We have noindexed non-value pages and blocked nearly 80% of our pages through robots.txt.

Of course we (partly) rely on Google to bring in users who want to subscribe and modify/extend their data (thereby helping other users, and of course themselves).

That won't happen if the site cannot be found.

So I am very curious what these changes will bring us.

We are not dependent on the success of this directory, but it would be nice if it brought us some $$$'s, because of the work involved.

Let's see what happens from now on....

Re: opinion zbve 1/18/13 1:54 AM
Thanks Lawrence, we are currently working on that...
Re: opinion JohnMu 1/18/13 4:51 AM
Thanks for the update, zbve - I'd love to hear how things settle down over time, feel free to update this thread :)

Jamie, it would help to have your site's URL, so it might make sense to start a separate thread about that, so that the people active here in the forums can take a look.

Re: opinion zbve 1/18/13 5:46 AM
Thanks John. Will do. I have starred the discussion, made some notes in my agenda, and will update this thread in 1 month, 3 months and half a year.
Re: opinion Lawrence Shaw 1/18/13 7:01 AM
No problem and good luck.

Btw i used to get an idea of your link profile. 

Definitely worth a look. Seems like theme related startpagina's in the Netherlands are a good place to start ;).

Re: opinion zbve 1/19/13 3:58 AM
i started blocking urls with robots.txt

User-agent: *
Disallow: /bedrijven/regios/
Disallow: /bedrijven/inschrijving.html
Disallow: /bedrijven/rubrieken/
Disallow: /bedrijven/zoeken/
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/

containing entries such as

i get the following warning

URL geblokkeerd door robots.txt. ( translated: URL blocked by robots.txt )
Sitemap bevat URL's die worden geblokkeerd door robots.txt ( translated: sitemap contains URLs that are blocked by robots.txt ).

dates are 18 Jan ( yesterday ).. didn't change anything.
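
The warning above means the sitemap still lists URLs that the new robots.txt rules disallow. A rough way to find them before submitting is sketched below with Python's standard library. The two `Disallow` lines are taken from the post; the sitemap contents, the `www.example.com` host and the function name `blocked_sitemap_urls` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Two of the rules from the robots.txt in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bedrijven/regios/
Disallow: /bedrijven/zoeken/
"""

# Hypothetical sitemap; the real one was not included in the thread.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/bedrijven/zoeken/bakker</loc></url>
  <url><loc>https://www.example.com/bedrijven/example-bv.html</loc></url>
</urlset>
"""


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return the sitemap URLs that the robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", namespace)]

    return [url for url in urls if not parser.can_fetch(user_agent, url)]


for url in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML):
    print("blocked but still in sitemap:", url)
```

Any URL this reports should either be removed from the sitemap or unblocked in robots.txt, depending on whether it is meant to be indexed.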

Re: opinion zbve 1/19/13 4:53 AM
John, I had to revert some of the changes because Googlebot is throwing errors at me that I don't understand.

Re: opinion zbve 1/19/13 4:59 AM
Testing with the robots.txt tool shows that these URLs are allowed.

So I don't understand what's going on.

Might it be a bug?
Re: opinion zbve 1/20/13 3:23 AM
The 10,000 warnings are gone. I removed the line Disallow: /bedrijven/zoeken/ and made the search pages noindex,follow instead.

no more need to answer this one.

i will leave it this way.

Re: opinion Jamie Roszel 1/22/13 10:13 AM
Zbve, it will be interesting to follow your successes and failures.

Good luck and I look forward to your updates.
Re: opinion zbve 1/22/13 3:39 PM
We had done something stupid, which is why we got all the warnings, errors, etc. Since yesterday we are error free and we see Googlebot now indexing the stuff we want it to index.

It will take time to work through the index... Google had indexed > 2 million pages, the count is now coming down... it should end at around 500k.

So now it's JohnMu compliant ;)

I will update the post if I see a noticeable positive difference.

Re: opinion landed 1/23/13 12:00 AM
Well, in a set of results Google doesn't show any 'mashup', just scraped content from the description and title. So Lysis, you should apply your opinions globally to G as well. Just because someone once said scraping was automatically bad doesn't mean it is (price comparison tools???). Google shouldn't, IMO, be allowed to favor certain styles of sites, and its algorithm must not implement something like this to stop other search engines based on 'style' of content! That is monopolisation, and that isn't my opinion, that is a fact.
I have cited cases where duplicate content is useful to the reader. Imagine being on a long train platform and having to walk right down to the other end to get to the poster with train times! Sometimes, for convenience, it is OK to show a hotel blurb that is already online in many other places or directories. So I think a blanket ban or war on duplicate content is not good. I am not saying that Google is doing this just yet, but a lot of people take phrases that Google comes out with and then go overboard with them.
Re: opinion landed 1/23/13 12:48 AM
Hi John
It may be easy to create a ton of content that seems to have no apparent use, but isn't that up to the reader to decide? Why is Google bringing a domain down because of these pages? I think a fairer solution is for Google to ignore these pages. Sites like this one and mine got caught up in this 'spam trap'. Yes, maybe there are gross abusers, but you need to think of the innocent sites too. I am an individual whose sites have fallen through these cracks. I still fight on through other avenues and hope that the sites can survive sans Google. My site is very similar to the one in this case and provides a mashup of data (not scraped), but on the surface you CAN say thin content for many detail pages. Still, I don't think G has the right to penalise the domain (site) for it.
I have asked here how to tell Google to ignore these, and it seems zbve might have found a winning way (not using robots.txt but using noindex,follow; I just don't know how to do that for thousands of links?). No one replied to my question with anything that was doable for thousands of pages like this. I have since been able to add a canonical tag, but that is a separate issue. And sorry, I digress.

I wanted to make the point that google should be a search engine and not be a web police. And G is so powerful it has responsibility to be fair and just.

"our algorithms to judge your website" .. but I do thank you for coming out and posting here. I disagree with the ethos of the G Algo and not you John.

Is there a way I can use part of a URL to stop pages from getting crawled easily and safely? Not URL by URL...

/thin-content-iknow-/* kind of thing
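
On the pattern question: robots.txt `Disallow` rules match by path prefix, so a single line can cover every URL that starts with the same string and no trailing `*` is needed. A small sketch to check such a rule locally with Python's standard `urllib.robotparser`; the `www.example.com` host, the sample paths and the helper name `is_crawlable` are placeholders. Keep in mind this only stops crawling, which is not the same thing as a noindex.

```python
from urllib.robotparser import RobotFileParser

# One prefix rule covers every URL whose path starts with this string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thin-content-iknow-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Check a path on a placeholder host against the rules above."""
    return parser.can_fetch(user_agent, "https://www.example.com" + path)


for path in ("/thin-content-iknow-/page-1.html", "/about.html"):
    print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```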
Re: opinion landed 1/23/13 12:49 AM
Can you be more specific? I think I need to do this, but it's not clear how you do a 'noindex,follow'.
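
For what it's worth, 'noindex,follow' is normally either a `<meta name="robots" content="noindex, follow">` tag in the page's `<head>` or an `X-Robots-Tag: noindex, follow` HTTP response header. The header variant can be applied by URL prefix, which suits thousands of pages. Below is a sketch as a small WSGI middleware, assuming a Python-served site; the prefix, `demo_app` and the `response_headers` helper are invented for the example. The pages must stay crawlable (not disallowed in robots.txt), otherwise the directive is never seen.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical prefix; adjust to the paths that should stay out of the index.
NOINDEX_PREFIXES = ("/bedrijven/zoeken/",)


def noindex_follow(app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app and add 'X-Robots-Tag: noindex, follow' to responses
    for any request path that starts with one of the given prefixes."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex, follow")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>results</body></html>"]


def response_headers(app, path):
    """Call the app once for the given path and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    list(app(environ, start_response))
    return captured["headers"]


app = noindex_follow(demo_app)
print(response_headers(app, "/bedrijven/zoeken/bakker").get("X-Robots-Tag"))
# prints: noindex, follow
```

The same prefix test could just as well live in a web server config or a CMS plugin; the point is only that the rule is written once, not per URL.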