Categories: Crawling, indexing & ranking

Google is Blocking their js in Robots.txt But Tell Us Not To.

Google is Blocking their js in Robots.txt But Tell Us Not To. PamS1234 6/23/14 1:10 PM
I've read the FAQs and searched the help center. Yes
My URL is: 


It seems we dropped a little further in rankings around the release of Panda 4.0, which suggested that we needed to unblock our JS and CSS for layout, etc.

In Webmaster Tools, the new Fetch as Google shows a Partial render. I fixed all our errors because our robots.txt was blocking the JS/CSS, but now when I try to check again, Google's AdWords remarketing JS is now causing the Partial.

What concerns me is that I've read about people who have actually tested blocking these resources in robots.txt and found a definite connection with rankings dropping after Panda 4.0.

So I was just curious: why would Google put a tool in place that flat out says not to block any CSS and JS, yet Google is doing it themselves? And what's worse, it's via a paid Google service that we spend a fair amount of money on, only to be knocked down in organic rankings by Google's own guidelines. If that were truly the case, Google would be saying, 'Spend money, but don't worry, we'll keep your organic rankings down so you'll have to keep spending more to be seen.'

It's pretty sad that I think this way, but can someone enlighten me on this and tell me whether it is hurting us? :)

Re: Google is Blocking their js in Robots.txt But Tell Us Not To. KORPG Kevin 6/23/14 1:54 PM
So I was just curious: why would Google put a tool in place that flat out says not to block any CSS and JS, yet Google is doing it themselves?

I'm sure from Google's viewpoint they'd prefer to crawl everything regardless of what and where it is.
But that doesn't mean you have to let the bots go everywhere.

In short, Google advises you not to block those files. You're welcome to do so regardless.
All the Partial response is telling you is that there is content on your pages Google can't crawl. That's all.

It's a warning or a notification for you to check deeper and make sure you really want to block the bots from crawling those files.
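
For example, a single rule like this in a robots.txt (hypothetical path, purely to illustrate) is enough to produce a Partial result, because the blocked script can't be fetched when the page is rendered:

    User-agent: *
    Disallow: /scripts/

Whether that actually matters depends on what those blocked files do for the page.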

What concerns me is that I've read about people who have actually tested blocking these resources in robots.txt and found a definite connection with rankings dropping after Panda 4.0.

Citation please. 
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. ets 6/23/14 1:55 PM
For decent help, you'll need to post your URL, please. You can disguise it with a URL shortener such as goo.gl or bit.ly if you don't want the URL (or your site name) to appear here in full and be visible through Google searches.
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. PamS1234 6/23/14 2:03 PM
Hi Kevin,

This was just one of the articles I read on the topic. https://yoast.com/google-panda-robots-css-js/

ets, I wasn't asking for site help, just asking the question about robots.txt blocking JS and the new tool that is showing Partial. I unblocked everything on my end; only the Google AdWords remarketing script remains blocked by robots.txt.

Google did actually change their guidelines to add advice about not blocking JS/CSS, and also added the tool in Webmaster Tools. I would think they added this tool for a reason, most likely as a way for everyone to fix their issues.
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. ets 6/23/14 2:17 PM
Indeed, but your question seems to be phrased along the lines of "We've suffered from Panda 4.0 and I think these specific factors might be responsible", whereas my instinct tells me there are probably far more significant things wrong with your site that are causing the algorithmic demotion - and if you share the URL, we can suggest what they are :)
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. PamS1234 6/23/14 2:33 PM
ets, yes, I do feel we have suffered as well; however, I know our site is suffering from a lot more than just Panda 4.0. We hired an SEO company a while back, they built a lot of spammy backlinks, and we ended up with a manual penalty. Needless to say, we are in the process of trying to clean that up, but are still working on it. So I know that the spammy links are a main issue for us.

You are more than welcome to take a look at the site: http://goo.gl/wEovtD - I would love nothing more than for you to do so. I always welcome any insight or information that may help us get this site back on track.

We had our JS and CSS blocked in our robots.txt like I mentioned, but unblocked them just the other day. The cart software we are using ships with a robots.txt that blocks them by default.

Thanks for taking the time to look. 
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. KORPG Kevin 6/23/14 3:14 PM

There's a 2009 video where Matt said not to block these types of files.
That's five years ago.
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. JohnMu 6/23/14 11:16 PM
Allowing crawling of JavaScript and CSS makes it a lot easier for us to recognize your site's content and to give your site the credit that it deserves for that content. For example, if you're pulling in content via AJAX/JSON feeds, that would be invisible to us if you disallowed crawling of your JavaScript. Similarly, if you're using CSS to handle a responsive design that works fantastically on smartphones, we wouldn't be able to recognize that if the CSS were disallowed from crawling. This is why we make the recommendation to allow crawling of URLs that significantly affect the layout or content of a page. I'm not sure which JavaScript snippet you're referring to, but it sounds like it's not the kind that would be visible at all. If you're seeing issues, they would be unrelated to that piece of JavaScript being blocked from crawling.
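
As a rough sketch (the directory names here are hypothetical - use whatever paths your cart software actually creates), the fix is usually just to stop disallowing the script and style folders, or to add more specific Allow rules alongside any Disallow lines you still need:

    User-agent: *
    Allow: /js/
    Allow: /css/

Verify the result against your live robots.txt before relying on it.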

Cheers
John
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. ets 6/24/14 12:49 AM
It sounds like you're already working hard on backlinks, so I've skipped looking at that. Even so, you might find a quick look at Stevie's words of wisdom helpful:

Links might well be your major issue.  However, to be on the safe side, I've had a quick look at the site content and indexing. It doesn't look too bad, but there are some things to tighten up when you get a moment...

This is everything you have indexed: http://goo.gl/FF5sAz (about 2260 pages).

Paging through the results, I can see:

1. Thin category pages such as http://goo.gl/gfs4Da

I'd want to add some originally written text descriptions to all of those - otherwise they're just "thin" pages that lower the average quality of the site.

Pages like this might be considered "thin" - and not worth indexing. They're useful pages, but they probably won't bring you search result traffic: http://goo.gl/xYsqf9

Looking at a typical product page: http://goo.gl/x6aEse

I see plenty of text description there - and it seems to be original to your site, so that's great.

However, I do see places where you're using the same text on multiple pages, such as: http://goo.gl/OTDY7I

I'd try to avoid that. It could lead to Google seeing some of your pages as duplicate content. Although there's no penalty for doing that, some of the duplicate pages could get hidden in searches - or the page that gets shown in a search might not be the one you want. Use unique content on every page if you possibly can.

2. IMHO, the pop-ups/modals/live help are intrusive and annoying. They could well be making people bounce off your site, which won't help if Google picks up that people aren't engaging with the site or the content.

3. You've got quite a few PDF files indexed that probably shouldn't be. See: http://goo.gl/s8TxW7

Page through and you'll find lots of assembly instructions. I'm sure these should be on your site, but is there a good reason to have them indexed? They just look like thin, empty pages to Google. I would recommend simply linking the assembly instructions from the product page and noindexing the PDFs, which you'll have to do using an X-Robots-Tag noindex:
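
A rough sketch for Apache (assuming the PDFs are served by Apache and mod_headers is enabled - adapt for your actual server):

    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex"
    </FilesMatch>

Bear in mind the PDFs must remain crawlable for Google to see that header at all.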

There may be some PDFs you want indexing, but I'd think long and hard about indexing the lot.

4. If you page to the end of your indexed search results (http://goo.gl/FF5sAz), you'll see Google pulls up at about 300 pages - which is far short of the 2260 that are supposedly indexed.

That's caused by things like this: http://goo.gl/WK6zkQ and this: http://goo.gl/vwWFhP

Those are URLs you're blocking in robots.txt that are indexed nevertheless. Robots.txt blocking doesn't stop things getting indexed. I'd clean that up by *unblocking* them and using noindex (if the files don't need to be indexed).
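
For ordinary HTML pages, that noindex is just a meta tag in the head of each page you want dropped - a minimal example, and note it only works once the robots.txt block is lifted, because Googlebot has to be able to fetch the page to see the tag:

    <meta name="robots" content="noindex">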

Looking at your robots.txt: http://goo.gl/MjhzBO

There are tons of blocked query-string pages in there that you should really be tackling with URL parameters or canonicals - if you're trying to stop multiple versions of the same page being indexed with parameters attached.

For example, if you want this page indexing: http://goo.gl/aXya34
but you want to stop this duplicate of the page being indexed as well: http://goo.gl/GRLW47

You could set 'limit' as a URL parameter if it's not changing the page content (be sure to double-check). You already have a canonical URL for the page, so if that is being recognized and respected, there is no need to block the query string in robots.txt as well.
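
To illustrate the canonical side (example.com is a stand-in for your own domain), the ?limit= version of the page would carry a link element in its head pointing back at the clean URL:

    <link rel="canonical" href="http://www.example.com/some-category-page/">

That way, signals for the parameterised URL get consolidated onto the page you actually want indexed.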

Google's help documentation covers both URL parameters and rel=canonical in more detail.

In summary, I'd work through your indexing and ensure that all the pages are as strong and original as possible, each with unique text used only on that page and on no other pages or sites. I'd ensure that any pages you don't want indexing are unblocked in robots.txt and properly noindexed. I'd also noindex any generic, not terribly useful pages that don't bring search traffic (such as forms, privacy policies, and other general junk).

That's all I've spotted on a quick look through.
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. PamS1234 6/24/14 5:53 AM
ets, Thank you so much. You have given me a wealth of information here. I can tell you took a lot of time and effort in your evaluation. 

THANK YOU!

Re: Google is Blocking their js in Robots.txt But Tell Us Not To. JohnMu 6/24/14 7:00 AM
Hi PamS1234

Looking at your site, those disallowed scripts are definitely not causing a problem -- it's primarily an issue of problematic links here. That's what I'd focus on first. Since there's a manual action involved, that's something you can work on to resolve. Keep in mind that even after resolving the manual action, it can take a bit of time for all of our algorithms to take those changes into account (we have to first recrawl those links, see that they're removed, disavowed, or nofollow'ed, take that into account in compiling the data for the algorithms, and then make those changes public -- this can sometimes take half a year or even longer, depending on how many problematic links are out there, how long they've been there, etc.). My recommendation would be to really clean up any link issues as completely as you can, so that you don't have to worry about them again in the future and don't have to go through several rounds of reconsideration requests.

Regarding your more general question of whether disallowed scripts, CSS files, etc. play a role in our Panda quality algorithm: our quality algorithms primarily try to understand the overall quality of a page (or website), and disallowing crawling of individual aspects is generally seen as more of a technical issue, so that wouldn't be a primary factor in our quality algorithms. There's definitely no simple "CSS or JavaScript is disallowed from crawling, therefore the quality algorithms view the site negatively" relationship.

A lot of sites disallow crawling of JavaScript & CSS for historical reasons that are generally no longer relevant. If your JavaScript or CSS files significantly affect the content or layout of the page, we recommend allowing us to crawl them, so that we can use that additional information to show your site for queries that match content which isn't directly in your HTML responses. While unrobotting that content would make things easier for our algorithms to pick up, it would be incorrect to say that not allowing crawling would automatically trigger our quality algorithms to view your site negatively. 

For more information about our quality algorithms (which, for the moment, aren't negatively affecting your site), I'd recommend reviewing http://googlewebmastercentral.blogspot.ch/2011/05/more-guidance-on-building-high-quality.html

Hope that helps!
John
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. PamS1234 6/24/14 8:59 AM
Hi John,

Thanks for taking the time to look at that for me. We did actually have our JS/CSS blocked via our robots.txt, and I unblocked them last week. I don't mind being as transparent as possible with our site when it comes to crawling; I was more concerned about duplicate content issues, etc., hence all the disallows.

I would also like to ask whether where the site is hosted plays a role at all. We noticed that since we moved this site to an Amazon VPS, it appears to have taken an initial drop in rankings. We were expecting a slight drop, but we did not see it recover like we were hoping. We had put all the 301s in place for the move, etc. I was thinking maybe a server configuration setting or something else might be hindering the crawl, or that other factors might be causing issues.

So does what ets said in his evaluation hold true, or do you feel it is mostly the backlinks?

Thanks again for all your help. I do greatly appreciate it.


 
Re: Google is Blocking their js in Robots.txt But Tell Us Not To. ets 6/24/14 9:36 AM
So does what ets said in his evaluation hold true, or do you feel it is mostly the backlinks?

I think you can be 1,000,000% confident that if John tells you it's the links, it's the links. So that's definitely your priority. I didn't examine the backlink issue because you'd already flagged it as a known problem. John has given a fairly clear steer that Panda is not your problem right now - so you can take what I said with a large pinch of salt. Even so, cleaning and tightening up your content/indexing is never a bad idea... and I think addressing some of the issues I raised (like the blocked files in robots.txt, the PDFs, and the URL parameters) is a good idea in the long run. But don't let it distract you from link cleaning.

Where the site is hosted doesn't play a role beyond any performance impact. You refer there to possible crawl issues - and that's where looking at the indexing can help, because you clearly have a lot of URLs indexed twice (some blocked with query strings in robots.txt), which means you probably have a lot of unnecessary crawling as well. If the URL parameters were specified correctly, your crawling would be far more efficient.