Author Archive

Adding UI to make up for your UI

Sometimes applications don’t respond because, well, they’re busy. But only the most special of apps treat this not as a bug to fix, but as an occasion to put up a “nothing to see here, move along” notification:

Microsoft Visual Studio is Busy

I mean, the sad part is that I actually understand why they likely implemented this. At this point in Visual Studio’s lifecycle, it’s easier to poll for an unresponsive UI than to actually root out the cause of the unresponsiveness. And the fun part is that, in preparing this blog article, I didn’t even have to try very hard to get that little balloon to pop up. It took me about 30 seconds (and no, I’m not going through the trouble of reporting it to Microsoft, because it’s not entirely repeatable and I’m skeptical of what the results would be anyway). Go Visual Studio.

Another SSIS Wart

As I get more and more educated on SSIS, I’m becoming more appreciative of what it does (or in some cases, intends to do) overall but I’m also finding some of the weirdest things that are just a little “off”. Most are just the kind of things that make you go “huh, that’s strange”, but I found one yesterday that made me seriously stop and scratch my head because I was just completely lost as to why things would be done this way. In fact, if it weren’t for the Wrox SSIS book that I got, I probably wouldn’t have figured out the answer.

I’ll start this off with a quick question for you to ponder before I go into the details: what is a “property” on an object other than essentially just a variable defined only within the scope of that object? Rather than answer that question just yet, here’s the problem I was scratching my head over. In SSIS, they have some nifty pre-defined events which you can easily attach event handlers to. Thanks to Jamie Thomson’s blog post entitled “Custom Logging Using Event Handlers” I’ve been happily coding away logging routines using the OnPreExecute and OnPostExecute event handlers (among a few others). But I soon ran into a problem that wasn’t addressed in the blog post: every event that fires propagates up the call chain, regardless of whether or not the event is handled[*]. That is to say, if I have an SSIS package that has (“contains”) a task which loads a file into a database, and I define an event handler for OnPreExecute that simply writes a row to an event log that says “Started: ” followed by the name of whatever is starting, then once it gets to the file load task, the database will contain the following:

  1. Started: Main package
  2. Started: File load
  3. Started: Main package

Obviously that last row is a bit silly because the main package didn’t just start; it’s in the middle of executing all of its tasks. The row is there because the OnPreExecute event propagated up from the file load task. Now, one might argue that propagating that particular event doesn’t make a lot of sense because you only ever PreExecute once for each task, and it certainly has nothing to do with when the child tasks do anything. But, for the sake of consistency I can forgive that. If that were the wart, then that’d just be a “huh, that’s strange” kind of wart. After all, you certainly do want Error and Warning events to propagate up so you can just define event handlers for those in the parents.

So, at this point I figure there has to be some way to prevent the upward propagation of some events, even if I have to specify it per event handler. The first place I looked was in the properties for the event handler. The only thing that looked mildly promising was the LoggingMode property, but that was limited to “Use Parent Setting”, “Enabled”, or “Disabled”, with nothing about “handle and stop propagating”. So, the next place I checked was the “expressions” for the event handler. Expressions in SSIS are basically ways of setting properties dynamically (i.e., they can change at or during runtime), and they’re very useful and generally cover most of the properties anyway, plus an occasional extra tidbit. Well, no extra helpful tidbits in this case, though I was positive that was where it would be.

What does any good programmer do when they’ve exhausted their available local resources? Google it, of course. Unfortunately no matter what search string I used (and my Google-fu is pretty good), I couldn’t find anything that would tell me how to do what I want. Plus, the only other SME on SSIS in the company that I’m aware of was unavailable to ask, and I’m about to outgrow using him as a reference anyway. Then I remembered the aforementioned book, and sure enough one of the first things in the index under events was “bubbling”. And in less than 2 minutes of reading, I had my solution.

Now for the solution and then a quick revisit to the question I kicked this story off with. The solution: there is a “system variable” defined at the event handler scope level entitled Propagate, which if you set it to False will not propagate the event to its parent(s). How the heck is a system variable defined only within the scope of a single object (and obviously only useful within that scope) a better choice than a property? Properties are displayed in a nice little grid that’s easily accessible. System variables are hidden behind several clicks in the interface. I’d wager that 95% or more of SSIS packages never have to muck about with them, and none of the dozen or more packages I’ve worked on have ever needed to do so (until now!), in spite of all the functionality I’ve included. Thankfully, I’m not the only one who had such an issue (I didn’t find that blog post until I started writing this one, and he doesn’t have the solution in his post, but Jamie Thomson posted a response which he found).
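To make the bubbling behavior concrete, here is a minimal sketch in Python. It is a toy model and not the SSIS object model: the `Executable` class, its `on` and `fire` methods, and the `propagate` argument are all made up for illustration, with `propagate=False` standing in for setting the handler-scoped Propagate variable to False.

```python
class Executable:
    """Toy stand-in for an SSIS package or task (hypothetical, not the SSIS API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}  # event name -> (handler function, propagate flag)

    def on(self, event, handler, propagate=True):
        # propagate=False plays the role of setting Propagate to False
        # inside this event handler's scope.
        self.handlers[event] = (handler, propagate)

    def fire(self, event):
        # Walk up the container chain, running each level's handler in turn.
        node = self
        while node is not None:
            handler, propagate = node.handlers.get(event, (None, True))
            if handler is not None:
                handler(node)
            if not propagate:
                break  # the handler asked for bubbling to stop here
            node = node.parent


def run_package(child_propagates):
    """Run a two-level package and return the rows written by the log handler."""
    log = []

    def log_start(owner):
        log.append(f"Started: {owner.name}")

    package = Executable("Main package")
    load = Executable("File load", parent=package)
    package.on("OnPreExecute", log_start)
    load.on("OnPreExecute", log_start, propagate=child_propagates)

    package.fire("OnPreExecute")  # the package starts
    load.fire("OnPreExecute")     # the child task starts, and the event may bubble up
    return log
```

With `run_package(True)` the log ends up with the same three rows as the list above, including the redundant second “Started: Main package”. With `run_package(False)` only the first two rows are written.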

And to make matters worse, I just realized that this means I’ll have to set that variable for all children within a package; otherwise I’ll get a lot of “Started: Main package” messages, one for each sub-task (and there are a lot of them, including checking for input, truncating tables, etc.). According to the blog post and response linked at the end of the last paragraph, there’s no way to tell an event handler to not listen to propagated events either. Sigh.

[*] - it also doesn’t help that Microsoft’s documentation has a very misleading/incorrect diagram showing event propagation stopping at the first handled event

The Nerd Handbook

So that we don’t get into too deep of a posting morass, here’s a good link that was posted on reddit (though I found it via the SomethingAwful forums):

http://www.randsinrepose.com/archives/2007/11/11/the_nerd_handbook.html

RFC: Gearing an Open Source talk to a Microsoft heavy crowd

When I was hired on at my new job, I was apparently referred to internally as “the open source guy”. Now, that’s a moniker I don’t really mind because I do espouse the open source/free software philosophy and I definitely see it as being useful both from a personal standpoint and as a professional developer. Well the new gig heavily encourages knowledge-sharing and building of “intellectual capital” via presentations and such, so I thought that open source as a community, development strategy, and philosophy might be a worthwhile thing to share. So, this is my request for comments on ideas for what to present. But rather than just asking that question, here’s sort of what I had in mind.

First I’d have to do some definition/demystifying up front, answering the who/what/why’s and all that, because this is a heavy Microsoft crowd. It’s not so much that they are fanboys or anything like that (well, some might be, but they likely won’t attend); it’s just what they know, and that’s what the clients run on. I know at least a few people, including some of the partners, are genuinely interested in open source not only as a business strategy but as a community. So I think the first few big talking points in this section would be:

  • What is open source? (I’d like to stick mostly with definitions here and not delve too much into history and/or the fractious issue of what constitutes “free” or “open source” software)
  • What open source projects they might be aware of, even if they don’t know they are open source (Linux, Mozilla, Java (yes, Java), PHP, Subversion, etc.)
  • What open source projects affect them that they are likely unaware of (Apache, Python, MySQL, MediaWiki (Wikipedia))
  • Why open source? (Why do developers give away code? Why use it?)

And that would be the bulk of the presentation. From there I can come up with a more directed approach talking about how my company can leverage the advantages of open source to help increase productivity/add value/etc. Really I just wanted to hear what people thought about the first part and whether or not I was missing anything or perhaps should take some stuff out. Though if anyone has any feedback for what open source could mean for a small IT consulting firm, then by all means, fire away :)

Not a WTF, but certainly a wart

As you may know, I’m not that much of a fan of Microsoft technology, but understand that it’s somewhat unavoidable and allowing for the use of MS products greatly opens the prospects of employment/work. So, at my new gig I’m aware that a healthy chunk of time spent doing development will be using Microsoft tools and all the quirks that go along with them. Sometimes I’m pleasantly surprised, sometimes I’m reminded of why I dislike Microsoft products in general. Lately it’s been more of the latter.

On the current client I’m designing stuff for, we’re using MS SQL Server 2005’s SSIS toolchain, which is a great idea with mediocre execution. For those who are unfamiliar with SSIS, it’s basically the evolution of their DTS package that came with SQL Server 2000, which is designed to simplify the ETL process of getting data from a source (or various sources) into the destination database after all sorts of cleansing and transformation. The DTS package that shipped with SQL Server 2000 was apparently pretty warty, a trait that is common amongst first-gen Microsoft products, and so SSIS is apparently notably cleaner and nicer. But, as most of us are aware, that doesn’t mean it’s wart-free, and I’ve found a nasty wart or two.

Back into the Django fray for a bit

To sort of go along with the fantasy football related programming post from earlier, I also like to do a set of “power rankings” for the NFL using a fairly simple but data-heavy formula that I derived via a bit of trial and error and really just some thought exercises. For the past season or so I just maintained it using a simple spreadsheet (at first Excel, then Gnumeric once I converted my laptop to Linux after getting the new job), but even when I was doing it I noticed data inconsistency which almost definitely resulted from data entry errors on my part. For example, part of the formula involves a team’s “points for” (points they score against other teams) and “points against” (points other teams score against them). Well, when you sum up the “points for” for all the teams, and when you sum up the “points against” for all of the teams, you should get the same number. But sometimes I wouldn’t. Those errors weren’t quite as nefarious to track down as some others. In my strength of schedule calculations, for instance, the total W-L record of one component should be the mirror opposite of another, but it would be missing some wins somewhere.
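The kind of sanity check described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `league_consistency_problems` and the dictionary keys (`points_for`, `points_against`, `wins`, `losses`) are hypothetical, and the wins-versus-losses check is a simpler league-wide analogue of the strength of schedule mirror check, not the same calculation.

```python
def league_consistency_problems(teams):
    """Return a list of human-readable problems found in the league totals.

    `teams` maps a team name to a dict with the (assumed) keys
    points_for, points_against, wins, and losses.
    """
    problems = []

    # Every point scored by one team is a point allowed by another,
    # so the league-wide totals must match.
    total_for = sum(t["points_for"] for t in teams.values())
    total_against = sum(t["points_against"] for t in teams.values())
    if total_for != total_against:
        problems.append(
            f"points for ({total_for}) != points against ({total_against})"
        )

    # Likewise, every win is somebody else's loss (ties count for neither).
    total_wins = sum(t["wins"] for t in teams.values())
    total_losses = sum(t["losses"] for t in teams.values())
    if total_wins != total_losses:
        problems.append(f"wins ({total_wins}) != losses ({total_losses})")

    return problems
```

An empty list means the totals agree. A non-empty list at least says which total is off, though not which row holds the typo.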

Screen-scraping made .. kind of easier, I guess

Screen scraping is a still somewhat common technique in which a script takes the output of a web page (which wasn’t designed to be consumed by anything but a person viewing it in a web browser), grabs the portions relevant to its purposes, and spits out the parts that it wants. In fact, one of the first plugins for the old moobot project that I wrote was a Google plugin which I thought I was so clever to write using the IE search results version of Google since it was much lighter and easier to parse. It has no ads, no real formatting to speak of, and it was pretty much 80% data to 20% “other crap”, making it easy to scrape out the bits I wanted. Thankfully they started providing an API until, well, see the below post ;)

But, Google has plenty of great web designers at work there, so they don’t have horribly malformed HTML or tons of inline style crap. Most other websites out there aren’t quite in the same boat. In fact, looking at the source of the vast majority of sites will make most competent web designers wince if not cry. Even well-formed HTML is not that easy to parse, and folks often resort to imprecise string matching or nasty regular expressions to get the job done. If only there was a way to get that nasty HTML into a more nicely-parseable format…

Well, I saw this blog post on programming.reddit.com (highly recommended, btw), and apparently he has set up a service that will fetch webpages and transmogrify them into either (presumably well-formed) XML or JSON output, two flavors of output that have become popular with the rise of AJAX. Unfortunately, you still get a lot of crap because, hey, Garbage In, Garbage Out. But, at least it’s crap that’s in a somewhat prettier outfit. Or if not prettier, at least easier to dig through. However, take heed of the author’s plea at the end and don’t hammer the crap out of this.
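As a local point of comparison with string matching and regular expressions, here is a minimal sketch that uses the tolerant `html.parser` module from Python’s standard library to pull links out of sloppy markup. It has nothing to do with the service mentioned above. The `LinkCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs, tolerating unclosed or mixed-case tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Fed something like `<p><A HREF="/one">First <b>result</A><br><a href='/two'>Second</a>`, `extract_links` returns the two links with their text despite the unclosed `<p>` and `<b>` tags. It is still Garbage In, Garbage Out, just with fewer regular expressions.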

Google APIs: No SOAP key? No problem.

So, as one of the maintainers (using that term somewhat loosely as I haven’t touched a significant amount of code in there for quite some time) of Supybot, I’ve heard a lot of complaining about the inability to get new SOAP search API keys. Though there are (were?) some ways around this, by and large most folks who have wanted to use our Google plugin but don’t already have a SOAP search API key were basically out of luck.

To wit, I did write a Yahoo search plugin that people have used and like using. Thankfully they actually actively embrace Python programmers using their APIs and just provide a nice RESTful way of doing things without any domain specific junk. It’s really quite a good plugin and a fairly good search engine and I encourage Supybot users to use it if they don’t have a Google SOAP key.

However, that said, people still want their Google. Unfortunately simply “updating” the Supybot Google plugin isn’t as simple as transliterating things from one API to the other, you see. Instead of providing a nice API like Yahoo did, Google replaced the SOAP API with an AJAX one. That’s fantastic for embedding search boxes on web pages and for ease of rolling out updates (just update the script that everyone’s <script> element points to), but really crappy for the IRC bot plugin authors! Fortunately, behind the scenes of the nasty obfuscated JavaScript there is a RESTful service as well, and it too (like Yahoo’s) returns a JSON-formatted resultset (with a catch that I’ll get to later).

So, what to do if you want to use this nice (invisible to most) service? Here’s the skinny.

New jorb

So I landed a position as a consultant at Pariveda Solutions and I start in a little over a week! Hopefully that means there will be more programming talk to be had. In fact, I’m almost sure of it.

Fantasy football site - DB schema design issues

I’ve had the dream of building a better fantasy sports site for a while now, one where I could do all sorts of wacky data manipulation and feature addition. Unfortunately, getting such a site up and running seems to have two conflicting steps: 1. I want to have something up and running that I can dig into relatively quickly, but 2. I want it to be incredibly flexible and extensible. So, I’ve decided that #1 is the more important one, but that doesn’t mean I can’t continue to think about how to do #2. And a lot of #2 is going to require DB schema changes, because obviously if you want a site to be more dynamic/flexible and to allow the users to control more things, then you need more fields in the database to store these changes.

The specific example this time is in dealing with what statistics to track and how to score them. Most popular fantasy leagues only track a very small subset of the available stats on a given sport. In football, it’s typically a list like:

  • QB: pass yds, pass TDs, interceptions
  • RB/WR/TE: rec yds, rec TDs
  • QB/RB/WR/TE: rush yds, rush TDs, fumbles lost, 2pt conversions
  • K: FG made/missed in specified ranges (0-19 yds, 20-29 yds, … up to 50 yards or more)
  • DEF: fumble recoveries, defensive touchdowns, sacks, interceptions

But there are still plenty of other stats out there, especially in leagues that use different positions (like ones that use IDPs instead of team defenses). So, my ultimate end would be to have a flexible place to store each statistic and its info, though for now I’m happy with something that just handles the ones I need to run my leagues.

The question is, how exactly would such a structure look in the database? Keeping in mind that some of these stats are fundamentally different, and some have some scoring quirks, I’ve come up with three separate stat “types”: the Range Stat, the Rate Stat, and the Rate Stat With Bonuses.

The Range Stat would be something like “Points Allowed” for a team defense or “FG Made” for a kicker. I think, rather than having specific stats like “FG1-19” (number of field goals made that were between 1 and 19 yards), I should just be able to say “he made an 18 yard field goal” and let the scoring rules for each individual league adjust it accordingly, instead of shoe-horning it into the FG1-19 stat for one league and the FG1-29 for another. Same thing with defensive points allowed, I’d like to just be able to enter each team’s scores for a game and then have it slot accordingly.

Rate Stats are the bread and butter of fantasy. They are the “6 pts per rush TD” and “0.1 pts per rush yd” stats. There’s not much magic here, really.

Rate Stats With Bonuses are less common, but useful. Some leagues choose to offer a bonus for big games: instead of just “0.1 pts per rush yd” they’ll have “0.1 pts per rush yd, +5 pts at 100 yds”. Not a whole lot of magic here either, as the bonus is applied during the score evaluation and not stored in the database itself.
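The three stat types above can be sketched as three small scoring functions. This is an illustration under assumptions, not the site’s actual code: the function names, the `(low, high, points)` tuple layout, and the per-league point values are all hypothetical. `Decimal` is used so that “0.1 pts per yd” does not pick up floating-point noise.

```python
from decimal import Decimal


def score_range(value, ranges):
    """Range Stat: look up the points for the bucket that `value` falls in.

    `ranges` is a list of (low, high, points) tuples with inclusive bounds;
    high=None means "this distance or more". Each league supplies its own list.
    """
    for low, high, points in ranges:
        if value >= low and (high is None or value <= high):
            return points
    return 0


def score_rate(amount, points_per_unit):
    """Rate Stat: a flat number of points per unit (per yard, per TD, ...)."""
    return amount * points_per_unit


def score_rate_with_bonus(amount, points_per_unit, bonus_at, bonus_points):
    """Rate Stat With Bonus: the rate score plus a bonus once a threshold is hit."""
    score = amount * points_per_unit
    if amount >= bonus_at:
        score += bonus_points
    return score


# Hypothetical scoring rules for two leagues. The stored fact is just
# "an 18 yard field goal"; each league buckets it differently.
LEAGUE_A_FG = [(0, 19, 3), (20, 29, 3), (30, 39, 3), (40, 49, 4), (50, None, 5)]
LEAGUE_B_FG = [(0, 29, 3), (30, None, 5)]
```

For example, `score_range(18, LEAGUE_A_FG)` and `score_range(18, LEAGUE_B_FG)` both work from the same raw distance, and `score_rate_with_bonus(120, Decimal("0.1"), 100, 5)` applies the 100 yard bonus at evaluation time, so nothing about the bonus has to be stored with the stat.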

But the question is, how do I represent these in the DB without having static columns for each possible stat? Is that possible? I’d like to be able to potentially add new stats without having to alter table structures. I’d rather not have to just have a huge table with nothing but default values for things that I’m not using at the moment. I’m wondering if there’s some sort of generic pattern for doing this or maybe I just haven’t thought hard enough about it. I’ll post details about the current schema in a later post, perhaps.
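One commonly used shape for this is a narrow table with one row per recorded stat, plus a lookup table that defines the stats themselves, so that adding a stat is an INSERT rather than an ALTER TABLE. The sketch below uses Python’s built-in `sqlite3` module. The table names, column names, and helper functions are all assumptions made for illustration, and this is only one possible answer to the question above, with its own trade-offs (queries across many stats need more joins or pivoting).

```python
import sqlite3


def build_db():
    """Create an in-memory database with a stat lookup table and a narrow fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stat_type (
            id   INTEGER PRIMARY KEY,
            code TEXT NOT NULL UNIQUE,
            kind TEXT NOT NULL CHECK (kind IN ('range', 'rate', 'rate_bonus'))
        );
        CREATE TABLE player_stat (
            player       TEXT    NOT NULL,
            week         INTEGER NOT NULL,
            stat_type_id INTEGER NOT NULL REFERENCES stat_type(id),
            value        REAL    NOT NULL
        );
    """)
    return conn


def add_stat_type(conn, code, kind):
    # A new stat is just a new row here; no table structure changes.
    conn.execute("INSERT INTO stat_type (code, kind) VALUES (?, ?)", (code, kind))


def record(conn, player, week, code, value):
    row = conn.execute("SELECT id FROM stat_type WHERE code = ?", (code,)).fetchone()
    if row is None:
        raise KeyError(f"unknown stat: {code}")
    conn.execute(
        "INSERT INTO player_stat (player, week, stat_type_id, value) VALUES (?, ?, ?, ?)",
        (player, week, row[0], value),
    )


def stat_values(conn, player, week, code):
    """Return every recorded value of one stat for a player in a week, in entry order."""
    rows = conn.execute(
        """
        SELECT ps.value
        FROM player_stat AS ps
        JOIN stat_type AS st ON st.id = ps.stat_type_id
        WHERE ps.player = ? AND ps.week = ? AND st.code = ?
        ORDER BY ps.rowid
        """,
        (player, week, code),
    ).fetchall()
    return [value for (value,) in rows]
```

A range stat such as a made field goal is stored as one row per kick with its raw distance, and the league’s scoring rules decide the bucket later. A rate stat can be stored either per play or as a weekly total, and unused stats simply have no rows instead of a column full of defaults.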