Archive for the ‘web’ Category.

Some tripping points when you start using Django

I have the opportunity to use Django at work to develop web sites. Most places use PHP, ASP, ASP.NET, JSP, ColdFusion, Perl or Ruby on Rails; Django shops are a rare breed. A quick search for ‘django’ on some of the job sites for the Montreal region yields no results. I am therefore very grateful to have the chance to work with a non-sucky technology (not that I want to imply that the alternatives suck, but I find them deficient.)

We switched away from PHP for several reasons; the most important was that it was difficult for us to quickly go from a small site to a medium-large site with PHP. What I mean by that is that a client originally wants just a small site with a few static pages to advertise his products and/or services, have a contact page, etc. Then a few months later, the customer decides that he wants to be able to enter his products and their prices himself. That means creating a database and tables, creating the CRUD pages to manipulate the data, etc. Django gives us an automatic administration interface, which means we only need to worry about how to present the data on the site; for creation, modification and deletion, the client can just use the admin interface (which is available in French).

Other reasons we switched were: a more solid and complete standard library, a more powerful language, built-in support for i18n (not sure if PHP has it), the generic views (especially object_list with automatic pagination), and bundled modules for comments, syndication and users.

Along with the fun of not having to do a lot of mindless work, switching to Django brought some new pains as well. Here are some things you should expect if you decide to start using Django instead of PHP.

  • Harder deployment: you gotta give it to PHP, deployment is super easy. Since most Apache installations have PHP already installed and configured, all you gotta do is dump your .php files on the server and you’re pretty much done. Django’s deployment is more complicated than that, unfortunately. Using mod_python, you must set up either a VirtualHost or a Location in your httpd.conf file, which most likely requires root access. mod_python also caches your Python scripts, which means that whenever you make a change to a Python file (or a translation file), you need to restart Apache. If you don’t have root access, you can still put the site online with a few .htaccess files (this is undocumented), but it’s a pain, and if you need to update some .py files, you will need to get an administrator to restart Apache for you. This is probably my biggest gripe with Django. I have no experience with the other deployment alternatives such as FastCGI, so I can’t speak about them.
  • Django is still in active development: Django has not yet reached 1.0, so the authors and contributors don’t mind breaking backward compatibility here and there. To make sure we don’t have problems, we use the latest stable version, 0.96. When 0.97 comes out however, some code will need to be updated (e.g.: clean_data was renamed cleaned_data in newforms.) Some interesting features are also only available in the development version, so make sure you read the documentation for your version.
  • You explicitly define URLs: Contrary to PHP, where you just type the name of your script in the address bar, you need to define all the URLs on your site in Python files (e.g., urls.py). I actually consider this a good feature of Django; I’ve seen quite a few sites in production on which navigating to phpinfo.php revealed a gold mine of information. For static pages, however, it can be cumbersome to edit the URLs module and add a view function just to load the page.
  • The Django template language: The Django template language is not a full-fledged programming language. In general, this is an advantage: it helps make sure you don’t start mixing code and design a little too liberally. Sometimes, however, an operation cannot be done in your Python code and has to happen in the template. For instance, if you have a paginated list of objects and want to display all the page numbers with links, you will need to write your own filter to create a list from 1 to pages (the template variable which contains the total number of pages).
  • Some features are undocumented: Django has awesome documentation, but some aspects of the framework, for one reason or another, are undocumented. These include signals and the comment framework.
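To make the page-number filter from the template language point a bit more concrete, here is a minimal sketch of what it could look like. The function name page_range, the module name pagination_extras and the template snippet are my own inventions for illustration; only the pages variable comes from the text above. The function itself is plain Python, and the Django registration is shown in comments so the block runs without Django installed.

```python
def page_range(pages):
    """Return [1, 2, ..., pages] so a template can loop over page numbers.

    Anything that cannot be read as an integer yields an empty list, so a
    missing or malformed value renders no links instead of raising an error.
    """
    try:
        total = int(pages)
    except (TypeError, ValueError):
        return []
    return list(range(1, total + 1))


# Assumed wiring inside a Django app (hypothetical file
# yourapp/templatetags/pagination_extras.py):
#
#     from django import template
#     register = template.Library()
#     register.filter('page_range', page_range)
#
# Assumed template usage:
#
#     {% load pagination_extras %}
#     {% for n in pages|page_range %}
#         <a href="?page={{ n }}">{{ n }}</a>
#     {% endfor %}

if __name__ == "__main__":
    print(page_range(5))
```

The filter only builds the list; which page is current and how the links are styled stays in the template, which keeps the Python side free of presentation details.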

Screen-scraping made .. kind of easier, I guess

Screen scraping is a still somewhat common technique in which a script takes the output of a web page (which wasn’t designed to be consumed by anything but a person viewing it in a web browser), grabs the portions relevant to its purposes, and spits out the parts that it wants. In fact, one of the first plugins that I wrote for the old moobot project was a Google plugin, which I thought I was so clever to write using the IE search results version of Google since it was much lighter and easier to parse. It had no ads, no real formatting to speak of, and it was pretty much 80% data to 20% “other crap”, making it easy to scrape out the bits I wanted. Thankfully they started providing an API until, well, see the below post ;)

But, Google has plenty of great web designers at work there, so they don’t have horribly malformed HTML or tons of inline style crap. Most other websites out there aren’t quite in the same boat. In fact, looking at the source of the vast majority of sites will make most competent web designers wince if not cry. Even well-formed HTML is not that easy to parse, and folks often resort to imprecise string matching or nasty regular expressions to get the job done. If only there was a way to get that nasty HTML into a more nicely-parseable format…

Well, I saw this blog post on programming.reddit.com (highly recommended, btw), and apparently the author has set up a service that will fetch webpages and transmogrify them into either (presumably well-formed) XML or JSON output, two flavors of output that have become popular with the rise of AJAX. Unfortunately, you still get a lot of crap because, hey, Garbage In, Garbage Out. But at least it’s crap that’s in a somewhat prettier outfit. Or if not prettier, at least easier to dig through. However, take heed of the author’s plea at the end and don’t hammer the crap out of this.
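To give a rough idea of what “HTML in, JSON out” means in practice, here is a small sketch using only the Python standard library. It is not how the service above is implemented (I have no idea what it does internally); the class TreeBuilder and the function html_to_json are names I made up, and the handling of unclosed tags is a simplification of what a real HTML cleaner would do.

```python
import json
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from HTML, tolerating unclosed tags."""

    # Tags that never have a closing tag, so they are never pushed on the stack.
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray closers are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


def html_to_json(html):
    """Parse an HTML string and return the resulting tree as a JSON string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root)


if __name__ == "__main__":
    # The <b> tag is never closed; the tree still comes out nested sensibly.
    print(html_to_json("<p>Hi <b>there</p>"))
```

Once the page is in this shape, digging out a value is a matter of walking dicts and lists rather than writing regular expressions, although garbage in the input still ends up as garbage in the tree.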

Google APIs: No SOAP key? No problem.

So, as one of the maintainers (using that term somewhat loosely as I haven’t touched a significant amount of code in there for quite some time) of Supybot, I’ve heard a lot of complaining about the inability to get new SOAP search API keys. Though there are (were?) some ways around this, by and large most folks who have wanted to use our Google plugin but don’t already have a SOAP search API key were basically out of luck.

To that end, I did write a Yahoo search plugin that people have used and like using. Thankfully, Yahoo actively embraces Python programmers using their APIs and just provides a nice RESTful way of doing things without any domain-specific junk. It’s really quite a good plugin and a fairly good search engine, and I encourage Supybot users to use it if they don’t have a Google SOAP key.

However, that said, people still want their Google. Unfortunately simply “updating” the Supybot Google plugin isn’t as simple as transliterating things from one API to the other, you see. Instead of providing a nice API like Yahoo did, Google replaced the SOAP API with an AJAX one. That’s fantastic for embedding search boxes on web pages and for ease of rolling out updates (just update the script that everyone’s <script> element points to), but really crappy for the IRC bot plugin authors! Fortunately, behind the scenes of the nasty obfuscated JavaScript there is a RESTful service as well, and it too (like Yahoo’s) returns a JSON-formatted resultset (with a catch that I’ll get to later).

So, what to do if you want to use this nice (invisible to most) service? Here’s the skinny.