After a lot of googling, and careful reading of the full specification for making AJAX applications crawlable, I am still puzzled about how the Google bot escapes hashbang fragments when converting them to the _escaped_fragment_ query parameter.
According to the documentation, %00..20, %23, %25..26, %2B and %7F..FF are escaped. But this contradicts the sample provided:
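To make the discussion concrete, here is a hypothetical helper (my own sketch, not code from the spec) that applies exactly the escape set the documentation lists to a raw #! fragment, assuming single-byte characters:

```javascript
// Sketch: percent-encode only the bytes the documentation lists
// (%00..20, %23, %25..26, %2B, %7F..FF). Assumes char codes <= 0xFF.
function escapeFragment(fragment) {
  var out = "";
  for (var i = 0; i < fragment.length; i++) {
    var c = fragment.charCodeAt(i);
    var escape =
      c <= 0x20 || c === 0x23 || c === 0x25 || c === 0x26 ||
      c === 0x2B || c >= 0x7F;
    out += escape
      ? "%" + ("0" + c.toString(16).toUpperCase()).slice(-2)
      : fragment.charAt(i);
  }
  return out;
}

// The spec's own sample behaves as expected under this rule:
console.log(escapeFragment("key1=value1&key2=value2"));
// key1=value1%26key2=value2
```

So far so good on a plain fragment; the problem only appears when the fragment already contains percent-encoded characters.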
Applied strictly, the specification would double-escape the fragment, since special characters are already escaped in a normal fragment. This would mean that a value of "A complex value & special chars +" would be shown as:
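A quick console illustration of that double escaping (using encodeURIComponent, which escapes a superset of the spec's list, but coincides with it for this input):

```javascript
var value = "A complex value & special chars +";

// As the value appears inside a normal #! fragment (already encoded once):
var inFragment = encodeURIComponent(value);
console.log(inFragment);
// A%20complex%20value%20%26%20special%20chars%20%2B

// A strict reading of the spec escapes it again, hitting the "%" signs:
var doubleEscaped = encodeURIComponent(inFragment);
console.log(doubleEscaped);
// A%2520complex%2520value%2520%2526%2520special%2520chars%2520%252B

// Recovering the original value then takes two decodes:
console.log(decodeURIComponent(decodeURIComponent(doubleEscaped)) === value);
// true
```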
which would require double decoding. From what I have read so far, no one clearly mentions this, so I doubt it was the original intent. As far as I can see, apart from badly encoded URLs, the only character that could cause an issue during the transition is the ampersand, which is both a separator within the fragment and a separator between query parameters.
Therefore, I currently suppose that the full specification is not accurate, and that the only character that actually gets escaped twice is the ampersand. This way, the above would be simplified to:
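One way to sketch that ampersand-only hypothesis (the helper names are my own): a literal "&" separates fragment parameters, so it must survive as "%26" in the query string, while an already-encoded "%26" inside a value must become "%2526" so that a single decode restores the distinction:

```javascript
// Hypothesis: only the ampersand is treated specially, in both directions.
function toEscapedFragment(fragment) {
  // Escape the pre-encoded ampersand first, then the literal separator.
  return fragment.replace(/%26/g, "%2526").replace(/&/g, "%26");
}

function toHashbangFragment(param) {
  // Reverse order: "%2526" does not contain "%26", so this is safe.
  return param.replace(/%26/g, "&").replace(/%2526/g, "%26");
}

var fragment = "key=A%20complex%20value%20%26%20special%20chars%20%2B&flag=1";
var escaped = toEscapedFragment(fragment);
console.log(escaped);
// key=A%20complex%20value%20%2526%20special%20chars%20%2B%26flag=1
console.log(toHashbangFragment(escaped) === fragment);
// true
```

Under this rule the transformation is lossless and only ever needs a single decode on the server side, which is what makes me suspect it is what the crawler actually does.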
For testing purposes, here is a bookmarklet (if it does not get filtered out...) that I built to help test the transition in both directions. It inspects the current browser URL and switches the browser location between the two URL forms. I currently only escape and unescape the ampersand between the two URLs (I also clean up the hash, treating it as a query string in itself, decoding and re-encoding it as well).
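For reference, the core toggle logic can be sketched like this (a simplified, hypothetical version: the hash clean-up step I mentioned is omitted, and only the ampersand rule is applied):

```javascript
// Toggle a URL between its #! form and its _escaped_fragment_ form.
function toggleUrl(href) {
  var parts = href.split("#!");
  if (parts.length === 2) {
    // #!fragment form -> _escaped_fragment_ form (ampersand-only escaping).
    var frag = parts[1].replace(/%26/g, "%2526").replace(/&/g, "%26");
    var sep = parts[0].indexOf("?") === -1 ? "?" : "&";
    return parts[0] + sep + "_escaped_fragment_=" + frag;
  }
  var i = href.indexOf("_escaped_fragment_=");
  if (i !== -1) {
    // _escaped_fragment_ form -> #!fragment form (reverse the escaping).
    var raw = href.slice(i + "_escaped_fragment_=".length);
    var frag2 = raw.replace(/%26/g, "&").replace(/%2526/g, "%26");
    return href.slice(0, i - 1) + "#!" + frag2;
  }
  return href; // neither form: leave the URL alone
}

console.log(toggleUrl("http://example.com/page#!key=value%26x&flag=1"));
// http://example.com/page?_escaped_fragment_=key=value%2526x%26flag=1
```

In a real bookmarklet this would be wrapped in `javascript:location.href=toggleUrl(location.href);`.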
Does anyone know the real algorithm used by the crawler, and could you elaborate on the above to help me make this bookmarklet fully accurate? (Obviously, I could set up a real-world test and wait for the crawler to verify the hypothesis, but that could take a while...)