Blog reviews
To keep the quality of our content up, I think we should have somebody else proofread each post for spelling mistaeks, content accuracy, etc. before we publish it. What do you think?
Archive for September 2007
I have the opportunity to use Django at work to develop web sites. Most places use PHP, ASP, ASP.NET, JSP, ColdFusion, Perl or Ruby on Rails; Django shops are a rare breed. A quick search for ‘django’ on some of the job sites for the Montreal region yields no results. I am therefore very grateful to have the chance to work with a non-sucky technology (not that I want to imply that the alternatives suck, but I do find them deficient).
We switched away from PHP for several reasons. The most important was that it was difficult for us to quickly go from a small site to a medium-large site with PHP. Here is what I mean by that: a client originally wants just a small site with a few static pages to advertise his products and/or services, a contact page, etc. Then a few months later, the client decides that he wants to be able to enter his products and their prices himself. That means creating a database and its tables, creating the CRUD pages to manipulate the data, etc. Django gives us an automatic administration interface, which means we only need to worry about how to present the data on the site; for creation, modification and deletion, the client can just use the admin interface (which is available in French).
Other reasons we switched were: a more solid and complete standard library; a more powerful language; built-in support for i18n (not sure if PHP has it); the generic views, especially object_list with automatic pagination; and bundled modules for comments, syndication and users.
Along with the fun of not having to do a lot of mindless work, switching to Django brought some new pains as well. Here are some things you should expect if you decide to start using Django instead of PHP.
Screen scraping is a still somewhat common technique in which script-writers take the output of a web page (which wasn’t designed to be consumed by anything but a person viewing it in a web browser), grab the portions relevant to their purposes, and throw away the rest. In fact, one of the first plugins that I wrote for the old moobot project was a Google plugin, and I thought I was so clever to write it against the IE search results version of Google, since that was much lighter and easier to parse. It had no ads, no real formatting to speak of, and it was pretty much 80% data to 20% “other crap”, making it easy to scrape out the bits I wanted. Thankfully they started providing an API until, well, see the post below.
But Google has plenty of great web designers at work, so they don’t have horribly malformed HTML or tons of inline style crap. Most other websites out there aren’t quite in the same boat. In fact, looking at the source of the vast majority of sites will make most competent web designers wince, if not cry. Even well-formed HTML is not that easy to parse, and folks often resort to imprecise string matching or nasty regular expressions to get the job done. If only there were a way to get that nasty HTML into a more nicely-parseable format…
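To make the contrast with string matching concrete, here is a minimal sketch of pulling links out of sloppy markup with a tolerant parser, using `html.parser` from the Python standard library. The `LinkExtractor` class, the `extract_links` helper and the sample markup are all made up for illustration; they are not taken from any of the plugins mentioned above.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from anchor tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the given HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample: mixed case, mixed quoting, unclosed <p> tags.
SAMPLE = """
<html><body>
<P><A HREF='http://example.com/one' class=result>First <b>result</b></A>
<p><a class="result" href="http://example.com/two">Second result</a>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, "->", text)
```

A regular expression that handles the mixed quoting, the attribute order and the nested `<b>` all at once gets ugly fast, whereas the parser copes with each of those without special cases.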
Well, I saw this blog post on programming.reddit.com (highly recommended, btw), and apparently the author has set up a service that will fetch webpages and transmogrify them into either (presumably well-formed) XML or JSON, two output formats that have become popular with the rise of AJAX. Unfortunately, you still get a lot of crap because, hey, Garbage In, Garbage Out. But at least it’s crap in a somewhat prettier outfit. Or if not prettier, at least easier to dig through. However, do heed the author’s plea at the end and don’t hammer the crap out of this.
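I don’t know how that service works internally, but the general idea of turning HTML into JSON can be sketched in a few lines with the Python standard library. Everything here (`TreeBuilder`, `html_to_json`, the node layout with `tag`, `attrs` and `children`) is an assumption made up for this example, not the service’s actual output format.

```python
import json
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build a nested dict/list tree from HTML, forgiving stray closing tags."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._stack[-1]["children"].append(text)


def html_to_json(html):
    """Parse HTML and return the resulting tree as a JSON string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root, indent=2)


# Made-up sample with a void tag and a stray closing </span>.
SAMPLE = '<div id="a"><p>Hello <b>world</b></p><br></div></span>'

if __name__ == "__main__":
    print(html_to_json(SAMPLE))
```

Garbage In, Garbage Out still applies: unclosed tags simply end up nested inside each other, so the JSON is only as tidy as the page it came from.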
So, as one of the maintainers of Supybot (using that term somewhat loosely, as I haven’t touched a significant amount of code in there for quite some time), I’ve heard a lot of complaining about the inability to get new SOAP search API keys. Though there are (were?) some ways around this, by and large most folks who have wanted to use our Google plugin but don’t already have a SOAP search API key have basically been out of luck.
As an alternative, I did write a Yahoo search plugin that people have used and like using. Thankfully Yahoo actively embraces Python programmers using its APIs and just provides a nice RESTful way of doing things without any domain-specific junk. It’s really quite a good plugin and a fairly good search engine, and I encourage Supybot users to use it if they don’t have a Google SOAP key.
That said, people still want their Google. Unfortunately, “updating” the Supybot Google plugin isn’t as simple as transliterating things from one API to the other. Instead of providing a nice API like Yahoo did, Google replaced the SOAP API with an AJAX one. That’s fantastic for embedding search boxes on web pages and for ease of rolling out updates (just update the script that everyone’s <script> element points to), but really crappy for IRC bot plugin authors! Fortunately, behind the scenes of the nasty obfuscated JavaScript there is a RESTful service as well, and it too (like Yahoo’s) returns a JSON-formatted result set (with a catch that I’ll get to later).
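For anyone who hasn’t consumed this kind of service before, the general shape of talking to a RESTful search endpoint that returns JSON looks roughly like the sketch below. To be loud about it: the URL, the `q` and `start` parameter names and the response fields are all hypothetical placeholders, not Google’s (or Yahoo’s) real ones, and the sketch parses a canned payload instead of hitting the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, NOT a real search API URL.
BASE_URL = "http://search.example.com/api/web"


def build_search_url(query, start=0):
    """Build the GET URL for a query; parameter names are assumptions."""
    return BASE_URL + "?" + urlencode({"q": query, "start": start})


def parse_results(payload):
    """Extract (title, url) pairs from a JSON response body.

    The "results", "title" and "url" keys are made up for illustration.
    """
    data = json.loads(payload)
    return [(r["title"], r["url"]) for r in data.get("results", [])]


# Canned response standing in for what such a service might send back.
SAMPLE_PAYLOAD = (
    '{"results": [{"title": "Supybot", "url": "http://example.com/supybot"}]}'
)

if __name__ == "__main__":
    print(build_search_url("supybot plugin"))
    for title, url in parse_results(SAMPLE_PAYLOAD):
        print(title, "->", url)
```

In a real plugin the payload would come from an HTTP GET on the built URL, with whatever error handling and rate limiting the service asks for.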
So, what to do if you want to use this nice (invisible to most) service? Here’s the skinny.
Continue reading ‘Google APIs: No SOAP key? No problem.’ »
So I landed a position as a consultant at Pariveda Solutions and I start in a little over a week! Hopefully that means there will be more programming talk to be had. In fact, I’m almost sure of it.