Google APIs: No SOAP key? No problem.

So, as one of the maintainers of Supybot (using that term somewhat loosely, as I haven’t touched a significant amount of code in there for quite some time), I’ve heard a lot of complaining about the inability to get new SOAP search API keys. Though there are (were?) some ways around this, by and large folks who wanted to use our Google plugin but didn’t already have a SOAP search API key were out of luck.

To that end, I did write a Yahoo search plugin that people have used and liked. Thankfully, Yahoo actively embraces Python programmers using their APIs and provides a nice RESTful way of doing things without any domain-specific junk. It’s really quite a good plugin and a fairly good search engine, and I encourage Supybot users to use it if they don’t have a Google SOAP key.

That said, people still want their Google. Unfortunately, “updating” the Supybot Google plugin isn’t as simple as transliterating things from one API to the other. Instead of providing a nice API like Yahoo did, Google replaced the SOAP API with an AJAX one. That’s fantastic for embedding search boxes on web pages and for ease of rolling out updates (just update the script that everyone’s <script> element points to), but really crappy for IRC bot plugin authors! Fortunately, behind the scenes of the nasty obfuscated JavaScript there is a RESTful service as well, and it too (like Yahoo’s) returns a JSON-formatted result set (with a catch that I’ll get to later).

So, what to do if you want to use this nice (invisible to most) service? Here’s the skinny.

The “new” search system is called “UDS” for “User Distributed Search”. I don’t know what that’s supposed to mean, but man am I tired of seeing the UDS acronym all over that JavaScript file I linked to above. I’m actually in the process of untangling/de-obfuscating it at the moment, but that’s only tangentially relevant to this post. Really all you need to know to use this service is how to construct the proper URL and what the results mean. But first things first: you’re going to need one of the AJAX keys to do it, so go grab one.

Now, the URL structure is pretty simple. The “base URL” is just:

http://www.google.com/uds/GwebSearch?

In fact, that in and of itself is a valid search URL. Go ahead and try it if you want, and you’ll see an example of an error result. For real application uses, you’ll want to construct the proper query string that goes next. Here’s what you need to put in:

  • callback = for now just use GwebSearch.RawCompletion here, though once I detangle the JavaScript I may figure out other possible values here (required)
  • context = not sure about this yet either, but using 0 as the value has worked for me pretty well (required)
  • hl = the “locale” you want to search in, for English-speaking folks, it’s en. For any other locale I’m guessing it matches this listing. Entries that aren’t on that listing seem to use some sensible default (likely en).
  • key = this is the big API key you got when you signed up (required)
  • v = the “version” code, I just use 1.0 (required)
  • q = the actual query string, formatted for URLs (replace spaces with + signs, etc.) (required)
  • rsz = the “result size” you want, the valid values I see are large and small, with the default being small. Small returns 4 results, large returns 8 results. (optional)

There are various other arguments that you can add, but those are the most relevant and the ones I understand best. So, a quick example URL for a search for “moo” would be:

http://www.google.com/uds/GwebSearch?callback=GwebSearch.RawCompletion&context=0&hl=en&key=<key>&v=1.0&q=moo&rsz=large

Obviously, you’d put your key in instead of just <key>, but the rest is all exactly what you’d put in for a search for “moo” where you wanted the most results. One important thing to note though is that the order of these arguments does not matter. As long as they are all key=value pairs separated by & there should be no problem.
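
For the Python folks, here is a minimal sketch of building that URL with nothing but the standard library. The helper name build_search_url, its arguments, and the YOUR_KEY placeholder are my own inventions for illustration and not anything Google provides; the parameter names and values are just the ones listed above.

```python
from urllib.parse import urlencode

# Base URL from the post; everything after the "?" is the query string.
BASE_URL = "http://www.google.com/uds/GwebSearch?"


def build_search_url(query, key, large=True, locale="en"):
    """Build a UDS web search URL (hypothetical helper, not a Google API)."""
    params = {
        "callback": "GwebSearch.RawCompletion",
        "context": "0",
        "hl": locale,
        "key": key,
        "v": "1.0",
        "q": query,  # urlencode escapes this, turning spaces into "+"
        "rsz": "large" if large else "small",
    }
    return BASE_URL + urlencode(params)


# "YOUR_KEY" is a placeholder; substitute the AJAX key you signed up for.
print(build_search_url("moo", "YOUR_KEY"))
```

Letting urlencode do the escaping means you don’t have to replace spaces and other special characters in the query by hand.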

Great, so now what? Well, the result you get back from a GET on that URL (note: this is important, it must be a GET and not a POST) looks something like this:

GwebSearch.RawCompletion('0',{"results": <stuff>}, 200, null, 200)

Of course, it’ll be much longer because <stuff> will actually be the important stuff we’re looking for. But this is the catch I spoke of earlier. Yes, the part in the middle is properly formatted JSON. But most JSON parsers will choke on the response as a whole, because it is really a JavaScript function call wrapped around that JSON, and a JSON parser doesn’t know what the heck a GwebSearch.RawCompletion is! Fortunately, basically all we are interested in is the dictionary that holds “results”, and we can just parse that out and give that to our JSON parser instead.
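
Here is one rough way to do that parsing-out in Python. It is only a sketch: the function name extract_payload is made up, and it assumes the first “{” in the response is the start of the dictionary we want (which holds for the response shape shown above, where the only thing before it is the context value). The sample string is a cut-down version of the real response.

```python
import json


def extract_payload(response_text):
    """Pull the JSON object out of a callback-wrapped response."""
    # Assumption: the first "{" marks the start of the results dictionary.
    start = response_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores whatever trails it,
    # so the ", 200, null, 200)" tail doesn't need to be trimmed by hand.
    payload, _end = json.JSONDecoder().raw_decode(response_text[start:])
    return payload


# Cut-down sample in the same shape as the response shown above.
sample = (
    "GwebSearch.RawCompletion('0',"
    '{"results": [{"titleNoFormatting": "MOO | We love to print"}], "cursor": {}}'
    ", 200, null, 200)"
)
data = extract_payload(sample)
print(data["results"][0]["titleNoFormatting"])
```

Using raw_decode rather than slicing up to the last “}” avoids guessing where the dictionary ends; the parser stops on its own once the object is complete.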

So, let’s look at what the “results” dictionary looks like for this search:


{"results":
[
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://www.moo.com/",
"url":"http://www.moo.com/",
"visibleUrl":"www.moo.com",
"cacheUrl":"http://www.google.com/search?q\u003Dcache:DsfEzWoDcLoJ:www.moo.com",
"title":"\u003Cb\u003EMOO\u003C/b\u003E | We love to print",
"titleNoFormatting":"MOO | We love to print",
"content":"\u003Cb\u003EMOO\u003C/b\u003E Print Limited, registered in England; company number 5121723; registered office 6 Bakers Yard, Bakers Row, London, EC1R 3DD \u003Cb\u003E...\u003C/b\u003E"
},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://en.wikipedia.org/wiki/MOO",
"url":"http://en.wikipedia.org/wiki/MOO",
"visibleUrl":"en.wikipedia.org",
"cacheUrl":"http://www.google.com/search?q\u003Dcache:O-LNoqQLso8J:en.wikipedia.org",
"title":"\u003Cb\u003EMOO\u003C/b\u003E - Wikipedia, the free encyclopedia",
"titleNoFormatting":"MOO - Wikipedia, the free encyclopedia",
"content":"A \u003Cb\u003EMOO\u003C/b\u003E (MUD object oriented) is a type of MUD and is a text-based online virtual reality system to which multiple users are connected at the same time. \u003Cb\u003E...\u003C/b\u003E"
},
<6 more similar results here, snipped for brevity>
],
"cursor":{}
}

So basically it’s a dictionary with two elements: results and cursor. We’re not interested in the cursor, just the results, which maps to a list of dictionaries itself. In each of those dictionaries you’ve got several fields:

  • GsearchResultClass = for web searches, it looks like this will always be GwebSearch
  • unescapedUrl = it is what it sounds like
  • url = also what it sounds like
  • visibleUrl = not entirely sure what precisely this is supposed to be other than the main site URL
  • cacheUrl = a link to the URL for the search result as cached by Google, very cool
  • title = duh
  • titleNoFormatting = duh (see what I did there?)
  • content = The standard 2-line-or-so block of text describing the URL
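
To give a feel for how those fields might be used, here is a rough Python sketch that turns one result dictionary into a single line of plain text, the sort of thing an IRC bot would print. The function name format_result and the output layout are my own choices, not part of the service; the sample is the Wikipedia result from above as it looks after JSON parsing (the \u003C-style escapes become ordinary characters).

```python
import html
import re


def format_result(result):
    """Turn one result dictionary into a single plain-text line."""
    title = html.unescape(result["titleNoFormatting"])
    # "content" carries <b> tags and possibly HTML entities; strip both.
    snippet = html.unescape(re.sub(r"<[^>]+>", "", result["content"]))
    return "%s <%s> %s" % (title, result["unescapedUrl"], snippet)


# The second result from the listing above, after JSON decoding.
sample = {
    "GsearchResultClass": "GwebSearch",
    "unescapedUrl": "http://en.wikipedia.org/wiki/MOO",
    "url": "http://en.wikipedia.org/wiki/MOO",
    "visibleUrl": "en.wikipedia.org",
    "titleNoFormatting": "MOO - Wikipedia, the free encyclopedia",
    "content": "A <b>MOO</b> (MUD object oriented) is a type of MUD and is "
               "a text-based online virtual reality system to which multiple "
               "users are connected at the same time. <b>...</b>",
}
print(format_result(sample))
```

Reading titleNoFormatting instead of title saves stripping the bold tags from the title, though the content field still needs it.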

Now, with this in mind, with any handy JSON parsing library, you can code up your very own Google search app that uses the new AJAX keys instead of the SOAP API keys. In an upcoming post I’ll do a Python example.

3 Comments

  1. Reborg:

    Hi,

    “callback =” enables JSON with padding (JSONP) support and specifically allows Google Search to return results wrapped in a JavaScript function call. In this form the result can be used as the argument to a JavaScript eval() call. If GwebSearch.RawCompletion has been initialized as a function (for example, one that calls alert('This is the first argument: ' + arguments[0])) and you use the URL you’ve mentioned above as the source for a <script> tag appended to the document head, then evaluating the script will show the alert box.

    I suspect that the RawCompletion attribute of the GwebSearch object is initialized by the Google AJAX API with some function that is then used to handle the response. As long as you don’t use this URL request in some automated server-side script, I think you won’t break the Google Terms of Use.

  2. junkevil:

    hi, i run a supybot for the irc of cinemageddon (b and cult movie torrent tracker) i’m having problems with the ajax api and thats how i came across your page. any chance you could compile a python plugin for an ajax key? i’d be very interested.

  3. Daniel:

    Hi,

    It’s so much easier to work with JSON than with JavaScript. Thanks a lot. Is there a way to implement “Next Page” functionality?
