
Hashbangs, HTTPS, and Pending Sitemap Questions

Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/16/12 1:00 PM
I have two questions:

1) I am working on getting the site https://www.stlbeacon.org submitted to Google News.  The site uses hashbangs and forces users to HTTPS, because the whole site functions as a single-page application with integrated donations.  I have adjusted the HTTP -> HTTPS redirect so that robots are not redirected.  This should allow Google News to crawl my site in HTTP mode; if they index HTTP pages, that is OK with me, as the redirection will still take place for regular users.  I have also properly implemented _escaped_fragment_, and it is working on regular Google search.  Is there anything I should be aware of that would prevent Google News from indexing the site?

2) I have submitted a news sitemap and it has been accepted by Google Webmaster Tools.  I submitted it well over 24 hours ago and it is still in the "pending" state, and as you might expect, the articles are not appearing in Google News.  Any thoughts?  I searched a bit for "pending sitemap bug" and found a few threads of people discussing it, but no solid resolution.

Thanks,
Todd Lynch
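
For anyone following along, the _escaped_fragment_ mapping mentioned in question 1 can be sketched roughly as below. This is only an illustration of the AJAX crawling scheme as I understand it, not the site's actual code: the function name to_escaped_fragment_url() and the example.com URLs are made up for the example.

```php
<?php
// Hypothetical helper (not from the site): map a hashbang URL to the URL a
// crawler that follows the AJAX crawling scheme would request instead.
function to_escaped_fragment_url($url)
{
    $pos = strpos($url, '#!');
    if ($pos === false) {
        return $url; // No hashbang, nothing to map.
    }

    $base     = substr($url, 0, $pos);   // Everything before "#!"
    $fragment = substr($url, $pos + 2);  // Everything after "#!"

    // Percent-encode the characters the scheme treats as special:
    // control characters and space, "#", "%", "&", "+", and bytes >= 0x7F.
    $escaped = preg_replace_callback(
        '/[\x00-\x20#%&+\x7F-\xFF]/',
        function ($m) {
            return sprintf('%%%02X', ord($m[0]));
        },
        $fragment
    );

    // Append as a query parameter, respecting an existing query string.
    $separator = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $separator . '_escaped_fragment_=' . $escaped;
}

// Example with an assumed article path:
echo to_escaped_fragment_url('http://www.example.com/#!/news/article-1'), "\n";
```

The server then has to answer the _escaped_fragment_ form of the URL with a static HTML snapshot of the page that the hashbang URL would render in the browser.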
Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/17/12 4:24 PM
Todd,
The first thing I would ask is whether you are specifically using "User-agent: Googlebot-news" to pass the news bot to HTTP instead of HTTPS. I know it sounds simple, but that may do the trick. Could you post the link to your sitemap for me?

Let me go through some of my notes. I have had a similar problem on Bing News with hash tags. I am no programmer, but I do try to keep good notes on all the problems I help solve here and on my site. I will get back to you soon with some ideas.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/17/12 5:00 PM
I just added the "Googlebot-news" entry to the array and re-submitted.  I also made the search case-insensitive.


function is_bot() {
    // Substrings that identify known crawlers. Note that "Googlebot" already
    // matches "Googlebot-news" as a substring; the extra entry is explicit only.
    $botlist = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
        "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
        "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
        "crawler", "www.galaxy.com", "Googlebot", "Googlebot-news", "Scooter", "Slurp",
        "msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
        "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
        "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler", "TweetmemeBot",
        "Butterfly", "Twitturls", "Me.dium", "Twiceler", "facebook");

    if (isset($_SERVER['HTTP_USER_AGENT'])) {
        foreach ($botlist as $bot) {
            // stripos() makes the match case-insensitive.
            if (stripos($_SERVER['HTTP_USER_AGENT'], $bot) !== false) {
                return true; // Is a bot
            }
        }
    }
    return false; // Not a bot (or no User-Agent header was sent)
}
$is_bot = is_bot();

I will update this post when I know if it worked.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/18/12 6:03 AM
And here is the sitemap:

http://www.stlbeacon.org/tools/news_sitemap?v=4

Note that when a non-bot hits an article page on the site it redirects the user to https AND /#!/, but there is logic that does not redirect bots.

Todd
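
To make the redirect behaviour described above concrete, here is a rough sketch of the decision. It is not the production code: the function name redirect_target(), its parameters, and the assumption that an article at /some/path maps to /#!/some/path are all illustrative guesses.

```php
<?php
// Hypothetical sketch: return the URL a request should be redirected to,
// or null when the request should be served in place.
function redirect_target($is_bot, $is_https, $host, $path)
{
    // Robots are never redirected, so they can crawl plain HTTP article URLs.
    if ($is_bot) {
        return null;
    }

    // The application shell itself only needs the upgrade to HTTPS.
    if ($path === '/' || $path === '') {
        return $is_https ? null : 'https://' . $host . '/';
    }

    // Regular users hitting an article URL are sent to HTTPS and the
    // hashbang form of the same path (assumed mapping).
    return 'https://' . $host . '/#!' . $path;
}

// Possible usage, with $is_bot coming from is_bot() in the earlier post:
//   $target = redirect_target($is_bot, !empty($_SERVER['HTTPS']),
//                             $_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
//   if ($target !== null) { header('Location: ' . $target, true, 302); exit; }
```

Keeping the decision in a function with no side effects makes it easy to check the bot and non-bot cases separately before wiring it to header().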


Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/18/12 6:37 AM
I had submitted the sitemap three times with different query strings.  Now 2 of them have a status of "-" and one of them is still pending.

There are only 30 items in the news sitemap, so I would think that once Google started it could be done in minutes.  Not sure what is up.

Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/18/12 2:26 PM
Todd,
Sometimes there is a delay in Webmaster Tools. The best way to keep checking is with a site:sitename search on news.google.com.  This is directly related to Google News not being able to crawl HTTPS pages. Give the news bot a day or two and see what happens.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/21/12 7:36 AM
FROM GOOGLE NEWS =======================================

Hi Todd,


Google News does not crawl sites using HTTP Secure at this time. If this
functionality is removed, then we can crawl and index your site's
articles.

Regards,
The Google News Team 

MY RESPONSE ============================================

Hi Google,

Thanks for the response.  The site works in both HTTP and HTTPS modes.  The way we currently have it set up, we redirect users to HTTPS by default, but we allow all robots to crawl the site in either mode.  The URLs that we submitted in the sitemap are HTTP.

The site is set up to load a JavaScript framework on the initial page load, and then every page is loaded via JSON.  We chose to force users to HTTPS on the initial load so that when they go to donate they do not have to reload the whole framework a second time to get into HTTPS mode.  Users can also log in from any page, and as we move towards a fully integrated user account / donation model, we will want the user's login to be secure.

I was hoping that removing the HTTPS redirect for robots and submitting the pages with HTTP in the sitemap would allow Google News to crawl and index the site.  Do you have any suggestions for how I could keep my users' data secure and get the site indexed in Google News?

Thanks,
Todd
Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/21/12 2:08 PM
Todd,

What this is saying is that the redirect is not working like it should. Maybe someone with more of a programming background could help you on that end. I asked a friend to look at the details you posted earlier about your redirect. He says it should work. But "should work" and "works" don't always coincide. I have asked one other person about this and am waiting on a response. I will post here when I hear back from them.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/21/12 2:42 PM
FROM GOOGLE NEWS =======================================

Hi Todd,

As our crawler currently encounters your News sitemap, it is under HTTPS.
If you can change this, specifically for your sitemap, it should improve
crawling and indexing, while preserving the security of your user data.
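
As a rough illustration of keeping the sitemap on HTTP, a News sitemap entry could be generated along these lines. This is only a sketch: news_sitemap_entry(), the example.com URL, and the publication details are invented for the example, and the entry is meant to sit inside a <urlset> element that declares the sitemap and news namespaces.

```php
<?php
// Hypothetical sketch: build one <url> entry for a Google News sitemap,
// forcing the http:// scheme in <loc> even if the input URL is https://.
function news_sitemap_entry($url, $publication, $language, $date, $title)
{
    // Rewrite only the scheme; host, path and query are left untouched.
    $loc = preg_replace('#^https://#i', 'http://', $url);

    // Escape text so it is safe inside XML elements.
    $e = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    return "  <url>\n"
        . "    <loc>" . $e($loc) . "</loc>\n"
        . "    <news:news>\n"
        . "      <news:publication>\n"
        . "        <news:name>" . $e($publication) . "</news:name>\n"
        . "        <news:language>" . $e($language) . "</news:language>\n"
        . "      </news:publication>\n"
        . "      <news:publication_date>" . $e($date) . "</news:publication_date>\n"
        . "      <news:title>" . $e($title) . "</news:title>\n"
        . "    </news:news>\n"
        . "  </url>\n";
}

// Example with made-up values:
echo news_sitemap_entry('https://www.example.com/news/story-1',
    'Example Paper', 'en', '2012-05-21', 'Council & Mayor Agree');
```

The sitemap file itself also has to be fetched over HTTP, which is a matter of which URL is submitted rather than anything in the XML.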
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/21/12 2:43 PM
MY RESPONSE TO GOOGLE=============================

Ah.  That is because I submitted the URL to the webmaster tools for the HTTPS site.  I have another webmaster tools setup for the HTTP version.  You can see that the sitemap works with either URL:


I have gone into Webmaster Tools for http://stlbeacon.org and submitted the sitemap, but it tells me that the HTTP site is not included in Google News.  Would it be possible to add the HTTP version to Google News?  I think I asked you to change it from HTTP to HTTPS in an earlier request, when I did not know that HTTPS was not supported.

One thing I can't figure out with Webmaster Tools is that my site is listed as http://stlbeacon.org and I am unable to change the official name to http://www.stlbeacon.org.  I am under the impression that each domain in Webmaster Tools is considered unique.

Thanks,
Todd
Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/21/12 9:29 PM
Todd,

You can add the www version of your site to the same Webmaster Tools account you have now for the http site. Add it, verify it, and submit the sitemaps there. See if it takes them. If it doesn't, I will flag this for the Team to take a look, and maybe they can update your info.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/22/12 7:34 AM
Thanks.  I now have webmaster tools set up for both HTTP and HTTPS and I submitted the sitemap to the HTTP account.  It currently says the following:


#   Type      Processed      Issues   Items   Submitted   Indexed
1   Sitemap   May 21, 2012   -        Web     12,158      6,920
2   Sitemap   May 21, 2012   -        Web     10          -
                                      News    10          -

I am going to check the logs now and see if Google-News made any requests.

Re: Hashbangs, HTTPS, and Pending Sitemap Questions Todd H. Lynch 5/22/12 9:14 AM
We appear to be in Google News. Thanks for your help.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/22/12 2:16 PM
Looks like it accepted it so just wait and see if it starts indexing. It can take a day or two sometimes.
Re: Hashbangs, HTTPS, and Pending Sitemap Questions bobbygunn 5/23/12 2:40 PM
Yes, I see you are getting indexed now. If you could send me an email with the redirect protocol you used, I would be grateful, so I can add it to my files on this matter. You can get my email through my profile pic, or you can just post it here.