| I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 9/28/11 3:47 AM | I have read the FAQs and checked for similar issues: YES My site's URL (web address) is: http://www.foothillschurch.org.au Description (including timeline of any changes made): Background: The site is based on Oracle Application Express, which uses |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | alistair.lattimore | 10/1/11 5:55 AM | You've got a couple of options to sort this out: 1) Return an HTTP 404 error to search engines when they crawl a URL with a non-zero session id 2) Return an HTTP 410 error to search engines when they crawl a URL with a non-zero session id 3) Add a meta |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/1/11 7:07 AM | Hi alistair, Thank you very much for your detailed reply! I'm sure all of those options may help in similar circumstances. Option 5 is, indeed, the best - I've already added canonical links to each page, with the 0 session ID. My problem is that ap |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/1/11 7:24 AM | Using http://web-sniffer.net it's impossible to access your site and get anywhere at all. Same with a crawler like Xenu. The first access does a meta refresh to a url with the string appended to the domain: /apex/f?p=CHURCH:1:0::::: Trying to get to th |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/1/11 7:47 AM | Hi webado, Thanks for the input, and also thanks for pointing out web-sniffer which looks interesting. I don't know much about http://web-sniffer.net but it doesn't seem to be correctly handling the initial 302 redirect. I think it might be something to do w |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/1/11 8:28 AM | Web-sniffer does not have to handle anything in any special way. It's your server that has to serve the right stuff to the requests made of it. You have the same issue with Xenu. |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/1/11 8:38 AM | Maybe your server already has special code in place to handle Googlebot specifically, if it's managing to go through at all, while no such thing is done for other robots. BTW, the robots.txt file is incorrect. You must not have any blank lines am |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | alistair.lattimore | 10/1/11 4:18 PM | Out of complete curiosity, why are you using Oracle Apex for your website instead of more mainstream CMS? |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/1/11 4:21 PM | Must be a fairly large organization that already uses Oracle as a database and Apex is the tool Oracle offers to build relatively quick applications. But as I said I've only seen that used for intranet applications. |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/1/11 10:54 PM | Hi, the only reason I'm using apex is because IRL I'm an Oracle database developer, so I'm using what I know best :) BTW thanks for the tip about our robots.txt, I'll fix that in case it causes issues for any crawlers. I've heard it is possible to |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/3/11 7:42 AM | webado: |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/3/11 9:51 AM | Robots do not accept cookies so this is what's wrong. |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/3/11 3:47 PM | I noticed that the majority of the URLs Google was having trouble indexing (due to expired session IDs) were using the app ID (e.g. f?p=102:) instead of alias (e.g. f?p=CHURCH:). Since my site works completely using alias, I've put 102 in my robots.t |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/3/11 6:25 PM | You should have 301 redirected the URLs that use p=999 to the equivalent ones that use p=AAAA or whatever. That's instead of blocking them in the robots.txt file. Googlebot won't be accepting cookies, that's only for browsers. Maybe you have someth |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/3/11 6:55 PM | Hi webado, Yes, except that so far I've been unable to work out a way to do that in my Apache configs. /apex/ is proxypass-ed to the oracle server on localhost, so any redirect or rewrite rules in my apache config never get a chance to work on them. |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | webado | 10/3/11 8:42 PM | Oh dear, forgot about the Apex bit... Still, I'm sure a 301 redirection can be done in .htaccess, we're not talking URL rewriting. This should be intercepted before it's sent to Apex, shouldn't it? Since your application understands both p=999 and |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | alistair.lattimore | 10/4/11 5:54 AM | Before we get started, let it be known that I've never needed to use mod_proxy before. That being said, a quick read through the mod_rewrite & mod_proxy documentation suggests to me that if you added standard mod_rewrite rules before the rewrite rule |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/4/11 6:10 AM | I've got it all in my /etc/httpd/conf.d/apex.conf file. Contents (I've abbreviated a bit) are below. Please don't nitpick it apart (unless it has a bearing on the problem we're talking about here), thanks :) NameVirtualHost *:80 <VirtualHost *:80> |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | alistair.lattimore | 10/4/11 6:24 AM | I'd assume if you added rewrite rules issuing a 301 redirect before you start the proxy configuration - ie before or after your 'unrelated rewrite rules' - they should work. Essentially what should happen is a request for a URL (ref by number) comes |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/4/11 7:19 AM | Thanks alistair, probably the most helpful part of your answer was the suggestion to do a 301 redirect. My problem was, I had tried that, but couldn't get it to work - and assumed it was impossible :) So did further digging and discovered my proble |
| Re: I mistakenly misconfigured my site, now Googlebot is flooding my site with requests that will always get 302. | jeffkemp | 10/4/11 7:41 AM | Improved, now it rewrites the session ID to 0 (zero) for application 102 (alias CHURCH): RewriteCond %{QUERY_STRING} ^p=102:([a-z0-9]*):([0-9]*):(.*)$ [NC,OR] RewriteCond %{QUERY_STRING} ^p=CHURCH:([a-z0-9]*):([1-9][0-9]*):(.*)$ [NC] |
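
For anyone wanting to check which query strings the two conditions in the last post would catch, here is a rough Python sketch of the matching logic. It is not the author's Apache config: the RewriteRule itself is cut off in the post, so the target shape (alias CHURCH, same page, session ID 0) is an assumption based on the description "rewrites the session ID to 0", and the function name `canonical_query` is made up for illustration.

```python
import re

# The two patterns quoted in the post, carried over to Python's re module.
# re.IGNORECASE stands in for the [NC] flag.
COND_APP_ID = re.compile(r"^p=102:([a-z0-9]*):([0-9]*):(.*)$", re.IGNORECASE)
COND_ALIAS = re.compile(r"^p=CHURCH:([a-z0-9]*):([1-9][0-9]*):(.*)$", re.IGNORECASE)


def canonical_query(query_string):
    """Return the rewritten query string, or None if no redirect applies.

    Hypothetical helper: the real rewrite target is truncated in the thread.
    """
    # [OR] semantics: a match on either condition triggers the rule.
    match = COND_APP_ID.match(query_string) or COND_ALIAS.match(query_string)
    if match is None:
        return None
    page, _session, rest = match.groups()
    # Assumed target: alias CHURCH, same page, session ID forced to 0.
    return f"p=CHURCH:{page}:0:{rest}"


if __name__ == "__main__":
    for qs in ("p=102:1:12345::::", "p=CHURCH:1:0:::::", "p=church:HOME:987:x"):
        print(qs, "->", canonical_query(qs))
```

Note that the alias pattern requires a session ID starting with 1-9, so URLs that already use CHURCH with session 0 are left alone, which avoids a redirect loop. The numeric app ID pattern accepts any session ID, so every p=102 URL would be sent to the alias form.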