Categories: Crawling, indexing & ranking

Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt

Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 3/30/13 11:30 AM
I've read the FAQs and searched the help center: Yes

Hi, posted this in Webmaster Tools, but deleted it to re-post here.

We added some pages to our robots.txt file early in March. All of the pages that we wanted to block were in our root (/media.php, for example). We went through to make sure there were no pages that would cause any issues, but I am guessing there may have been some odd page we blocked that was unknowingly used by our CMS to connect with other pages.

The lion's share of our links come from either our product pages or our search result pages. So, when we initially added the entries to robots.txt, we made sure to test with these pages, and they came back as ALLOWED. Even after we discovered that so many pages were being blocked, we tested again with a couple of example pages, and they still said "ALLOWED", which was very odd and made us ask, "What exactly is being blocked here?!"
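(As a side note, for anyone who wants to reproduce this kind of check locally: here is a minimal sketch using Python's standard-library urllib.robotparser. The rule and URLs below are made-up placeholders, not our actual file.)

from urllib.robotparser import RobotFileParser

# Placeholder rules for illustration only -- not the site's real file.
rules = """\
User-agent: *
Disallow: /media.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() returns True if the given user agent may crawl the URL.
for url in ("https://example.com/media.php",
            "https://example.com/product/123",
            "https://example.com/search?q=widgets"):
    print(url, "ALLOWED" if parser.can_fetch("Googlebot", url) else "BLOCKED")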

On March 27th, we just removed all of these pages from robots.txt so that all that was in there was:

User-agent: *
Disallow: 
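
(For reference, since the two are easy to confuse: an empty Disallow value blocks nothing, while a single slash blocks the entire site.)

User-agent: *
Disallow:        # empty value: blocks nothing, all crawling allowed

User-agent: *
Disallow: /      # single slash: blocks the whole site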



However, the next day, in spite of this change, the number of blocked pages was still growing! 

March 28th: 910,273
March 29th: 932,678
March 30th: 940,000

On the Index Status page we show that we have 963,248 total pages with 822,126 pages blocked by robots.txt.

I located a similar post where the poster's number of blocked pages exceeded the number of indexed pages even after robots.txt was fixed. In his case, while his traffic seemed to improve, the number of blocked pages still kept increasing as well: http://productforums.google.com/forum/#!search/blocked$20by$20robots.txt/webmasters/dUGG9sDCdfA/zVat_nsp4PMJ There didn't seem to be a resolution, or at least he didn't post one. Could this be a bug or something?



Thanks very much for your assistance!

Craig
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt rapsearch 3/31/13 5:10 AM
I'm experiencing this same issue... is there any news on a fix?
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/1/13 8:28 AM
As an update, we are at 941,364 pages blocked by robots.txt today.
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt rapsearch 4/2/13 11:59 AM
Does anyone know?

Is it some kind of manual penalty?

For me it keeps rising as well, although my host confirms my robots.txt is as it should be.
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt Ashley 4/2/13 12:14 PM
No - it's not a penalty.

Have you tested the latest robots.txt file to confirm it's showing success?
Can you include some of the URLs that it says are blocked?
Have you tested these pages using Fetch as Googlebot?
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/2/13 1:02 PM
Hi Ashley, 

Thanks for posting.  Today we are at 941,266.

So the number of blocked pages is down by just 100 today, but at least that is something, and it isn't still going up.

Yes, we have tested robots.txt.  The odd thing is, we tested it before all of these pages started showing as blocked.  We still don't have any idea why these numbers are so high.

In fact, we had one page we blocked and shouldn't have: r.php.  It is a page we use for shortening URLs.  Google immediately warned us about it, we removed it from robots.txt that day, and all was good.  So, it seems a bit odd that this many pages are blocked without a similar warning.
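(The entry in question would have looked something like this; I'm reconstructing the exact line from memory:)

User-agent: *
Disallow: /r.php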

Now, as far as what URLs have been blocked, I am actually not sure where to find that.  In Webmaster Tools, it appears to just show how many URLs are blocked.

If I look under Crawl Errors, there is nothing of interest there.  It isn't showing pages "blocked by robots.txt" or anything.

It would be very helpful to know which URLs are being blocked.  Do you know how it is possible to view the actual URLs that are being blocked?

Thanks!

Craig
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt Ashley 4/2/13 1:42 PM
Hm... I thought you could, but now that I'm tinkering, I'm not sure it easily tells you which ones are blocked.

Can you identify any pages on your site that are not indexed that you think should be?
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/2/13 2:37 PM
No, that's a tough one.  The timing is not the best on this, as we had some title issues with our pages that we had JUST fixed when we made the robots.txt change.  So it appears that while our rankings were going up over the last month, they were also going down because of robots.txt...  In theory, then, we have a lot of pages that should be indexed but aren't.  Whether or not they were indexed before the robots.txt issue is the question.

Thanks!
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt rapsearch 4/3/13 5:01 AM
Ashley,

In my case, nothing strange is in the robots.txt; see attached file.

I can't find a way to see what's being blocked either, which would solve this whole issue... you just get a number and have to deal with that, so it stays a guessing game :(
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt Ashley 4/3/13 7:05 AM
Is this the same person under a different ID or a different person posting?

If different, please start your own thread. 
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/3/13 8:07 AM
While I am happy to share my thread with rapsearch, no, that's not me. :)

Ok, so today we are at 941,005, only down by 200 or so.  

The frustrating thing about that is that when we made the initial robots.txt change early in March, within 6 days Google had already blocked about 400,000 pages.  Here it is, 7 days since we removed the content from robots.txt, and we are only down 200 items?! :(

Not bueno. :(

With something like this, where it is clearly either an honest mistake on our end or a bug on Google's end, is there any way to get in touch with someone at Google who could help?  Is it worth sending in a site reconsideration request just to get it in front of a Google staff member?

We aren't a small spammy site or anything, but a well-established, high-quality site that has been around for 7 years.  (Not to mention we've spent enormous sums of money on AdWords...)

Thanks again for your input and feedback!

Craig
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt Ashley 4/3/13 8:27 AM
A reconsideration request isn't the right move, as you have no manual penalty. I'll escalate this thread, as it's a bit odd to me. However, posting some actual URLs that are blocked or not indexed but should be would be immensely helpful.

My gut says that WMT is just delayed and that in reality, there's no issue with actual indexation numbers. You have over 1.6M pages indexed currently. 


P.S. - AdWords spend has nothing to do with organic search, so that's not a factor in this discussion. 

Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt rapsearch 4/3/13 8:35 AM
Hi Ashley,

My apologies if I intervened in this thread, but it's exactly the same problem I have, so I figured I'd join the conversation.

I really hope it's what you say, as my site has around 300k pages indexed. That number has always been like that, and lately it has even been growing a little... but so are the blocked URLs, and like I said before, I have no clue where that comes from. I would love to post the blocked URLs, but they are a mystery to me...

It might be a good addition to Webmaster Tools... to see what's been blocked so you can fix it... so, Google, where are you at?? ;)
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/3/13 8:58 AM
Hi Ashley,

Thanks for the response.  Indexation isn't really my concern as much as ranking.  I know we have a lot of pages indexed.  As mentioned before, we had the title issue prior to this, so a lot of our pages that should have been ranking much higher were not even showing up, or were showing up way behind our competition.  When we fixed this title issue, which was almost the same day we made the robots.txt change, we started seeing improvements almost immediately.

Here is an example (screenshot of the search results omitted):

Before we made our title fixes, only 1 of OUR items was showing up for this search.  As you can see from the example, the items that show up are from spam affiliate sites that we DO NOT want.  They are basically taking our content and posting it on their spam sites, and are showing up INSTEAD of us in search. :(

So, we fixed our title issue around the 26th.  Within about a week, if you did the same search above, instead of just seeing one of our items, you saw about 15 of our items on the first and second pages.

However, I suppose that as the month went on and the robots.txt issue kicked in, all of this changed.  A few days ago, we only had 2 items showing up there, and they were at the top of the page.  Now, that has changed again, so the spam sites once more show up at the top instead of us.  There are a few more of our files now, but nothing like it was at the beginning of March.

So, it is more of a ranking issue than an index issue.  

I totally get what you are saying on the AdWords.  I was just trying to make the point that we are a legit business and, in our space, a well-known brand.

Thanks for the escalation.  We can really use the help, so I really appreciate that.  Nice to know we are being heard! :):):)

Craig
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/3/13 9:26 AM
>>Indexation isn't really my concern as much as ranking

There is no ranking without indexation!
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/3/13 9:32 AM
Webmaster Tools lags in showing issues on the site by several days. When you first saw the message with so many blocked URLs, it was based on what was in effect a few days before you changed your robots.txt file to remove the blocks. Subsequent messages are still based on those originally blocked URLs. You need to be patient until Webmaster Tools flushes them out.

Your site is indexed copiously. Doing a site: query, you see "About 1,620,000 results".
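(For anyone unfamiliar, that count comes from searching Google with a query of the form below, using your own domain in place of the placeholder:)

site:example.com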

Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/3/13 9:38 AM
What I meant was, we are already indexed for most of our files.  Sure, having every single one indexed would be great, and if any of our files were de-indexed due to robots.txt, then yes, I would like to know exactly which files those are.  As mentioned above, Ashley says we have 1.6 million pages indexed.  (Actually, that sounds a bit high, though.)

So, that is what I mean when I say indexation isn't the concern as much as ranking.  The question is what happened to the ranking of those files after the robots.txt change.
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/3/13 9:48 AM
What was in your robots.txt file before?

Now I see nothing is blocked.
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt JohnMu 4/3/13 10:09 AM
I wouldn't worry about the absolute number of URLs disallowed by the robots.txt file. On the one hand, it can take a little bit of time for the count to catch up (remember that this is a cumulative count); on the other hand, things like URL parameters can result in a very large number of URLs being noted as disallowed, while it may practically be the same content (we wouldn't notice redirects, rel=canonical, or duplicate content in general if we can't crawl the URLs).
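To illustrate the parameter point with hypothetical URLs: each of the following would count separately against the disallowed total, even though they all lead to essentially the same page.

/product.php?id=42
/product.php?id=42&sort=price
/product.php?id=42&sort=price&page=2
/product.php?id=42&ref=homepage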

My recommendation would be to disallow crawling of URLs that are problematic for you to have crawled (for example, if they cause a high server load or unnecessarily high bandwidth), or where it's problematic for you to have the content indexed. Disallowing crawling does not mean that the URL won't be indexed, but we wouldn't be able to associate the content with the URL. If the previous robots.txt file was ok for you, then I'd use it again. 
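As a generic sketch of that distinction (not advice specific to this site): a robots.txt Disallow stops crawling, but the bare URL can still be indexed; a noindex meta tag does the opposite, and only works if the page remains crawlable so the tag can be seen.

# robots.txt -- blocks crawling; the URL itself may still be indexed
User-agent: *
Disallow: /private/

<!-- on-page meta tag -- allows crawling, blocks indexing -->
<meta name="robots" content="noindex">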

If you want to keep track of which URLs are indexed, I'd recommend doing that through Sitemap files; using the site: query is not a good way to get that count. Sitemap files have the additional bonus that you can group URLs by logical site section, so you can see which parts of the site are better indexed.
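As a sketch of that grouping (the file names here are hypothetical), a sitemap index can reference one sitemap per logical site section, and Webmaster Tools then reports indexed counts for each file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-search-pages.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-articles.xml</loc></sitemap>
</sitemapindex>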

Hope it helps!
John
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/3/13 10:45 AM
Hi John.  Thanks for the post!  

Are you suggesting that, while we are only seeing this number change by 100 or 200 a day, we could see it change in greater chunks soon?  As mentioned above, the rate at which these files were blocked was enormously higher than the rate at which they appear to be getting unblocked.

Thanks!

Craig
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt JohnMu 4/3/13 11:48 AM
Hi Craig

It's likely going to take quite some time for those URLs to either drop out of the index or be recrawled again, so I would not expect to see that number significantly drop in the near future (and that's not a bad thing, it's just technically how it works out). 

Cheers
John
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/10/13 8:49 AM
As an update...

We are now 15 days in since this happened.  We did find some issues with our sitemap, so we updated and fixed it to include several hundred thousand pages it had been missing.  WMT shows that about 200k new pages have been indexed as a result.  However, the number of blocked pages STILL has barely budged.

Every day I check expecting to see a drop, but the lowest we have gotten is 937,165.  It doesn't really make sense that at least 200,000 new pages have been crawled and indexed according to WMT, but the blocked number has not budged.

Thanks!

Craig


Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/10/13 7:31 PM
Be aware that whatever Webmaster Tools shows is a few days old at all times, and messages remain for a few weeks after they no longer apply. This does not indicate an ongoing problem.

I published a new site a week ago. Previously it had a robots.txt file that blocked all robots for months. Once I removed that block, over the course of this last week Webmaster Tools started showing that hundreds of URLs had been blocked, while at the same time all 10 pages of the website gradually became indexed. I know the messages about how many URLs are blocked will disappear; it's just a matter of time. What matters is that Google, the search engine, is a few steps ahead of Webmaster Tools.

Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/10/13 8:03 PM
Ah, ok, Webado.  I do appreciate that.  At the same time, I am watching some keywords for which we should be seeing some pages show up again in the search results.  It still hasn't happened as it should, but I hope/expect it will.

Thanks very much for taking the time to chime in.  That perspective helps for sure.

Craig


Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/16/13 10:14 AM
Just an update on this.  As of April 16th, Webmaster Tools is showing that we have 927,434 URLs blocked.  So, in about 17 days we have only gone down by about 13,000 files.  I have to admit, it's frustrating to see that number go down so slowly when it rose so fast.  Maybe it will pick up speed?  At this rate, it would take forever for the number of blocked pages to get to 0.

It definitely appears to be reflected in our product page rankings as well.  Here is an example search (screenshot omitted).  A month ago, many of our files would show up on the first page, but now we get spam affiliate sites and our files are nowhere to be found.


We re-submitted our sitemaps on the 7th, and WMT shows that almost all of the files in the sitemap have been crawled and indexed since then.

Thanks!
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/16/13 10:21 PM
If most of the URLs in the sitemap have been crawled and indexed, then you already know that those reports are out of sync with reality.
Webmaster Tools lags quite a bit, first in reporting and then in clearing messages like that.
Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt TalkToMeAmigos 4/17/13 2:05 PM
True, but I have never really been that worried about 404s or about the pages being de-indexed.  I am more worried about ranking issues.  As the example search above shows, our items are way back in the rankings, whereas before they were up front.  Really, when including our own brand name, our files should definitely come up first, not the spam affiliate sites.  In fact, you don't see one of our own product pages until page 3 when running that search.  Before the robots.txt issue, pretty much the first 2 pages, and maybe more, were filled with only our own products.

I suppose/hope this will correct itself in the next few weeks and our products will begin re-ranking where they were before, at the very least out-ranking these spammy copies of our site.

Thanks!

Craig


Re: Help from Google please? 900,000+ pages blocked and growing, from an empty robots.txt webado 4/17/13 5:29 PM
There's no ranking without indexing.

Did you find out what exactly was wrong with your robots.txt file previously? Or with your server responses?

As for where your pages rank now, that would not be due to robots.txt issues. Perhaps in the past you saw them while logged into your Google account with search history and preferences enabled. That would show what you want to see, not what others see.