Some tripping points when you start using Django

I have the opportunity to use Django at work to develop web sites. Most places use PHP, ASP, ASP.NET, JSP, ColdFusion, Perl or Ruby on Rails; Django shops are a rare breed. A quick search for ‘django’ on some of the job sites for the Montreal region yields no results. I am therefore very grateful to have the chance to work with a non-sucky technology (not that I want to imply that the alternatives suck, but I find them deficient.)

We switched away from PHP for several reasons. The most important was that it was difficult for us to quickly go from a small site to a medium-large site with PHP. What I mean is that a client originally wants just a small site with a few static pages to advertise his products and/or services, a contact page, etc. Then a few months later, the customer decides that he wants to be able to enter his products and their prices himself. That means creating a database and tables, creating the CRUD pages to manipulate the data, etc. Django gives us an automatic administration interface, which means we only need to worry about how to present the data on the site; for creation, modification and deletion, the client can just use the admin interface (which is available in French).

Other reasons we switched were: a more solid and complete standard library, a more powerful language, built-in support for i18n (not sure if PHP has it), the generic views (especially object_list with automatic pagination), and bundled modules for comments, syndication and users.

Along with the fun of not having to do a lot of mindless work, switching to Django brought some new pains as well. Here are some things you should expect if you decide to start using Django instead of PHP.

  • Harder deployment: you gotta give it to PHP, deployment is super easy. Since most Apache installations have PHP already installed and configured, all you gotta do is dump your .php files on the server and you’re pretty much done. Django’s deployment is more complicated than that, unfortunately. Using mod_python, you must set up either a VirtualHost or a Location in your httpd.conf file, which most likely requires root access. mod_python also caches your Python scripts, which means that whenever you make a change to a Python file (or a translation file), you need to restart Apache. If you don’t have root access, you can still put the site online with a few .htaccess files (this is undocumented), but it’s a pain, and if you need to update some .py files, you will need to get an administrator to restart Apache for you. This is probably my biggest gripe with Django. I have no experience with the other deployment alternatives such as FastCGI, so I can’t speak about them.
  • Django is still in active development: Django has not yet reached 1.0, so the authors and contributors don’t mind breaking backward compatibility here and there. To make sure we don’t have problems, we use the latest stable version, 0.96. When 0.97 comes out however, some code will need to be updated (e.g.: clean_data was renamed cleaned_data in newforms.) Some interesting features are also only available in the development version, so make sure you read the documentation for your version.
  • You explicitly define URLs: Contrary to PHP, where you just type the name of your script in the address bar, you need to define all the URLs on your site in Python files (e.g.: urls.py). I actually consider this a good feature of Django; I’ve seen quite a few sites in production on which navigating to phpinfo.php revealed a gold mine of information. For static pages, however, it can be cumbersome to edit the URLs module and add a view function which will load the page.
  • The Django template language: The Django template language is not a full-fledged programming language. In general, this is an advantage: it helps make sure you don’t start mixing code and design a little too liberally. Sometimes, however, there is no way you can do an operation in your Python code; it needs to be in the template. For instance, if you have a list of objects and they are paginated, to display all the page numbers with links, you will need to write your own filter to create a list from 1 to pages (the template variable which contains the total number of pages).
  • Some features are undocumented: Django has awesome documentation, but some aspects of the framework, for one reason or another, are undocumented. These include signals and the comment framework.
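
To make the page-number case from the template bullet concrete, here is a minimal sketch of such a filter. The name page_range is my own invention (it is not something Django ships), and the function is written as plain Python so it can be tried on its own; the registration lines in the comment use Django's template.Library, but check them against the documentation for your version.

```python
# Hypothetical helper: the name page_range is an assumption, not part of Django.
def page_range(pages):
    """Return [1, 2, ..., pages] so a template can loop over page numbers."""
    try:
        total = int(pages)
    except (TypeError, ValueError):
        # A missing or non-numeric value yields no page links at all.
        return []
    return list(range(1, total + 1))

# In a Django project this would live in a templatetags module and be
# registered roughly like this:
#   from django import template
#   register = template.Library()
#   register.filter('page_range', page_range)
```

Once registered and loaded in the template, something along the lines of {% for n in pages|page_range %} would loop over the page numbers and let you print a link for each one.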

Screen-scraping made .. kind of easier, I guess

Screen scraping is a still somewhat common technique in which a script takes the output of a web page (which wasn’t designed to be consumed by anything but a person viewing it in a web browser), grabs the portions relevant to its purposes, and spits out the parts that it wants. In fact, one of the first plugins that I wrote for the old moobot project was a Google plugin, which I thought I was so clever to write using the IE search results version of Google since it was much lighter and easier to parse. It had no ads and no real formatting to speak of, and it was pretty much 80% data to 20% “other crap”, making it easy to scrape out the bits I wanted. Thankfully they started providing an API until, well, see the below post ;)

But, Google has plenty of great web designers at work there, so they don’t have horribly malformed HTML or tons of inline style crap. Most other websites out there aren’t quite in the same boat. In fact, looking at the source of the vast majority of sites will make most competent web designers wince if not cry. Even well-formed HTML is not that easy to parse, and folks often resort to imprecise string matching or nasty regular expressions to get the job done. If only there was a way to get that nasty HTML into a more nicely-parseable format…

Well, I saw this blog post on programming.reddit.com (highly recommended, btw), and apparently the author has set up a service that will fetch webpages and transmogrify them into either (presumably well-formed) XML or JSON output, two flavors of output that have become popular with the rise of AJAX. Unfortunately, you still get a lot of crap because, hey, Garbage In, Garbage Out. But at least it’s crap that’s in a somewhat prettier outfit. Or if not prettier, at least easier to dig through. However, take heed of the author’s plea at the end and don’t hammer the crap out of this.
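
For comparison, here is a small sketch of the "real parser instead of regexes" idea done locally. To be clear, this is not that service and has nothing to do with how it works; it just uses the HTMLParser class from Python's standard library to pull link targets out of sloppy markup. The names LinkCollector and extract_links are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, however messy the nesting is."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding it a fragment with unclosed tags still yields the links, which is exactly the kind of input that tends to break hand-rolled string matching.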

Google APIs: No SOAP key? No problem.

So, as one of the maintainers (using that term somewhat loosely as I haven’t touched a significant amount of code in there for quite some time) of Supybot, I’ve heard a lot of complaining about the inability to get new SOAP search API keys. Though there are (were?) some ways around this, by and large most folks who have wanted to use our Google plugin but don’t already have a SOAP search API key were basically out of luck.

To wit, I did write a Yahoo search plugin that people have used and like using. Thankfully they actually actively embrace Python programmers using their APIs and just provide a nice RESTful way of doing things without any domain specific junk. It’s really quite a good plugin and a fairly good search engine and I encourage Supybot users to use it if they don’t have a Google SOAP key.

However, that said, people still want their Google. Unfortunately simply “updating” the Supybot Google plugin isn’t as simple as transliterating things from one API to the other, you see. Instead of providing a nice API like Yahoo did, Google replaced the SOAP API with an AJAX one. That’s fantastic for embedding search boxes on web pages and for ease of rolling out updates (just update the script that everyone’s <script> element points to), but really crappy for the IRC bot plugin authors! Fortunately, behind the scenes of the nasty obfuscated JavaScript there is a RESTful service as well, and it too (like Yahoo’s) returns a JSON-formatted resultset (with a catch that I’ll get to later).

So, what to do if you want to use this nice (invisible to most) service? Here’s the skinny.

New jorb

So I landed a position as a consultant at Pariveda Solutions and I start in a little over a week! Hopefully that means there will be more programming talk to be had. In fact, I’m almost sure of it.

Fantasy football site - DB schema design issues

I’ve had the dream of building a better fantasy sports site for a while now, one where I could do all sorts of wacky data manipulation and feature addition. Unfortunately, getting such a site up and running seems to have two conflicting steps: 1. I want to have something up and running that I can dig into relatively quickly, but 2. I want it to be incredibly flexible and extensible. So, I’ve decided that #1 is the more important one, but that doesn’t mean I can’t continue to think about how to do #2. And a lot of #2 is going to require DB schema changes, because obviously if you want a site to be more dynamic/flexible and to allow the users to control more things, then you need more fields in the database to store these changes.

The specific example this time is in dealing with what statistics to track and how to score them. Most popular fantasy leagues only track a very small subset of the available stats on a given sport. In football, it’s typically a list like:

  • QB: pass yds, pass TDs, interceptions
  • RB/WR/TE: rec yds, rec TDs
  • QB/RB/WR/TE: rush yds, rush TDs, fumbles lost, 2pt conversions
  • K: FG made/missed in specified ranges (0-19 yds, 20-29 yds, … up to 50 yards or more)
  • DEF: fumble recoveries, defensive touchdowns, sacks, interceptions

But there are still plenty of other stats out there, especially in leagues that use different positions (like ones that use IDPs instead of team defenses). So, my ultimate end would be to have a flexible place to store each statistic and its info, though for now I’m happy with something that just handles the ones I need to run my leagues.

The question is, how exactly would such a structure look in the database? Keeping in mind that some of these stats are fundamentally different, and some have some scoring quirks, I’ve come up with three separate stat “types”: the Range Stat, the Rate Stat, and the Rate Stat With Bonuses.

The Range Stat would be something like “Points Allowed” for a team defense or “FG Made” for a kicker. I think, rather than having specific stats like “FG1-19” (number of field goals made that were between 1 and 19 yards), I should just be able to say “he made an 18 yard field goal” and let the scoring rules for each individual league adjust it accordingly, instead of shoe-horning it into the FG1-19 stat for one league and the FG1-29 for another. Same thing with defensive points allowed: I’d like to just be able to enter each team’s scores for a game and then have it slot accordingly.

Rate Stats are the bread and butter of fantasy. They are the “6 pts per rush TD” and “0.1 pts per rush yd” stats. There’s not much magic here, really.

Rate Stats With Bonuses are less common, but useful. Some leagues choose to offer a bonus for big games: instead of just “0.1 pts per rush yd” they’ll have “0.1 pts per rush yd, +5 pts at 100 yds”. Not a whole lot of magic here either, as the bonus is applied during the score evaluation and not stored in the database itself.
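
To pin down what the three types mean at scoring time, here is a rough sketch in Python. The function names and the shape of the rule arguments (lists of tuples) are assumptions made up for illustration, not the site's actual code, and the point values in any example are invented league rules.

```python
def score_rate(amount, pts_per_unit):
    """Rate Stat: a flat number of points per unit, e.g. 0.1 pts per rush yd."""
    return amount * pts_per_unit


def score_rate_with_bonus(amount, pts_per_unit, bonuses):
    """Rate Stat With Bonuses.

    bonuses is a list of (threshold, bonus_pts) pairs; every threshold that
    the amount reaches adds its bonus on top of the per-unit points.
    """
    total = amount * pts_per_unit
    for threshold, bonus_pts in bonuses:
        if amount >= threshold:
            total += bonus_pts
    return total


def score_range(value, ranges):
    """Range Stat: look up the points for the bucket a single value falls in.

    ranges is a list of (low, high, pts) with inclusive bounds; a high of
    None means "this value or more". A value outside every bucket scores 0.
    """
    for low, high, pts in ranges:
        if value >= low and (high is None or value <= high):
            return pts
    return 0
```

With this shape, the same raw fact ("he made an 18 yard field goal") is scored by passing 18 to score_range with whatever buckets each league defines, so one league can use a 1-19 bucket and another a 1-29 bucket without the stored stat changing.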

But the question is, how do I represent these in the DB without having static columns for each possible stat? Is that possible? I’d like to be able to potentially add new stats without having to alter table structures. I’d rather not have to just have a huge table with nothing but default values for things that I’m not using at the moment. I’m wondering if there’s some sort of generic pattern for doing this or maybe I just haven’t thought hard enough about it. I’ll post details about the current schema in a later post, perhaps.
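
One generic pattern that might fit is to store stats as rows instead of columns: one table that names each stat and one table of recorded values pointing at it. Below is a minimal sketch using Python's built-in sqlite3 module. The table and column names (stat, stat_event, player, week) are assumptions for illustration and are not the current schema.

```python
import sqlite3

# Each stat is a row in "stat", so adding a new one is an INSERT,
# not an ALTER TABLE. Recorded values are rows in "stat_event".
SCHEMA = """
CREATE TABLE stat (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE stat_event (
    player  TEXT    NOT NULL,
    week    INTEGER NOT NULL,
    stat_id INTEGER NOT NULL REFERENCES stat(id),
    value   REAL    NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_stat(conn, name):
    """Define a new stat and return its id."""
    cur = conn.execute("INSERT INTO stat (name) VALUES (?)", (name,))
    return cur.lastrowid


def record(conn, player, week, stat_name, value):
    """Record one value; an unknown stat name inserts nothing."""
    conn.execute(
        "INSERT INTO stat_event (player, week, stat_id, value) "
        "SELECT ?, ?, id, ? FROM stat WHERE name = ?",
        (player, week, value, stat_name),
    )


def events(conn, player, week):
    """Return (stat name, value) pairs in the order they were recorded."""
    rows = conn.execute(
        "SELECT s.name, e.value FROM stat_event e "
        "JOIN stat s ON s.id = e.stat_id "
        "WHERE e.player = ? AND e.week = ? ORDER BY e.rowid",
        (player, week),
    )
    return rows.fetchall()
```

The trade-off is that a player's week is no longer one wide row, so totals come from queries or from code that walks the events, but nothing sits around holding default values for stats a league doesn't use.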

A programming-centric blog

I think if I chart out some of my thoughts/projects on a blog I’ll be more inclined to actually stick to some of them or even to complete them. So instead of littering my personal blog with this stuff, I’m gonna use this blog as a place to stick code snippets and/or thoughts about some of the coding projects I’m working on.