Tuesday, November 27, 2012

Getting my grind on

I spent all of today number crunching. I spent all of yesterday on the plane wrestling with my database, trying to get it to conform to my ambitions, and only admitted defeat when we started our descent into Changi. I often face a dilemma: am I trying to do something in the hardest way possible, and should I really give it a rest and do something else, or am I suffering from a paucity of willpower and just need to force myself to stick at the task a little longer?

Today I turned my computer back on and found it was the former, for which I should rejoice. My brain, befuddled by sleep deprivation, a change in altitude and an iPod that kept playing the Rolling Stones' greatest hits, had overlooked a fairly obvious division by zero. I can't get no satisfaction, indeed.
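The post doesn't say where that division by zero was hiding or how it got fixed. As a hedged illustration only: one common SQL habit is to wrap a denominator in NULLIF, which turns a zero into NULL, and then list the rows that came back NULL so the problem is visible rather than buried. The table and column names below (sales, item, revenue, units) are hypothetical stand-ins, not the author's schema.

```python
import sqlite3

# Hypothetical table standing in for one sheet of the model;
# the names sales/item/revenue/units are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, revenue REAL, units REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("a", 100.0, 4.0), ("b", 50.0, 0.0), ("c", 30.0, 3.0)],
)

# NULLIF(units, 0) yields NULL when units is 0, so the division returns NULL
# instead of raising an error in databases that would. (SQLite already returns
# NULL for x / 0, but writing NULLIF makes the intent explicit and portable.)
rows = conn.execute(
    "SELECT item, revenue / NULLIF(units, 0) AS price FROM sales ORDER BY item"
).fetchall()

# Rows whose result is NULL (None in Python) had a zero denominator.
zero_denominators = [item for item, price in rows if price is None]
print(rows)               # [('a', 25.0), ('b', None), ('c', 10.0)]
print(zero_denominators)  # ['b']
```

Whether NULL is the right answer for those rows is a modelling decision; the point is that the query surfaces them instead of letting them silently skew a total.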

I'm trying to replicate an enormous spreadsheet in a database. This is a useful thing to do because even on the most unfeasibly high-powered computer, Excel chokes and wheezes when you're working with 5 million cells. I'm not shaving a few minutes off a process here: I'm turning a task that takes half a day and all your attention to run into something that runs in two minutes. At that point it becomes something qualitatively different: if you can change your model roughly once a minute instead of once a day, you give yourself a much better chance of finding the best possible solution.

Also, you'll leave the office being able to say you achieved something more interesting than cutting and pasting from one spreadsheet to another, and then waiting four hours. So fiddling with data can give you a happier life in the office, and if not, then at least you should be able to leave quicker.

Sadly, it's a bit like the unglamorous part of watchmaking: there are hundreds of cogs, many of them interdependent, and although you could just take a hammer and bash them all into place, the only trustworthy, long-lasting approach is to investigate each little variance until you've resolved them all.

If we'd started with a database it would have been easier. SQL lets you write a series of statements that read like commands: do this, change x if y is equal to z, tell me the square root of the sum of the last ten numbers in a list, and so on. It gets computers doing exactly the sort of tedious-but-complicated things humans struggle with.
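To make those "commands" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table nums and its columns x and y are hypothetical, chosen only to mirror the wording above; "last ten" is assumed to mean the ten most recently inserted rows.

```python
import math
import sqlite3

# Hypothetical table: "id" records insertion order, x is a number, y a label.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (id INTEGER PRIMARY KEY, x REAL, y TEXT)")
conn.executemany(
    "INSERT INTO nums (x, y) VALUES (?, ?)",
    [(float(i), "z" if i % 2 == 0 else "w") for i in range(1, 21)],
)

# "Change x if y is equal to z": one UPDATE with a WHERE clause.
conn.execute("UPDATE nums SET x = 0 WHERE y = 'z'")

# "The sum of the last ten numbers": take the ten highest ids, then sum them.
(total,) = conn.execute(
    "SELECT SUM(x) FROM (SELECT x FROM nums ORDER BY id DESC LIMIT 10)"
).fetchone()

# SQLite's SQRT function is only available in some builds,
# so the square root is taken in Python instead.
root = math.sqrt(total)
print(total)  # 75.0
```

Each line says what result is wanted and leaves the database to work out how to get it, which is much easier to audit than the equivalent chain of cell references.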

(Of course, I may be saying that because I'm familiar with SQL, and you can write horrible, spaghettified code in SQL just like in any other language, but it's easier to turn clean SQL into a spider's web of nested Excel formulae than to go in the opposite direction. Plus I've got twelve years of hammering away at Excel too, and familiarity doesn't always breed affection, otherwise people who have commuted on the Northern Line for thirty years would be ecstatic at the mere suggestion of a delayed train.)

So in the morning I had 50,000 discrepancies between the spreadsheet and the database. Getting it down to 10,000 wasn't so hard, but getting from 10,000 to 9,000 took half the afternoon. When you're trying to make sure your interpretation of the truth is the same as someone else's, it can be nigh-impossible to spot the common problem running throughout your code. At about four I had a breakthrough and went from 9,000 to 4,000, and then to 1,000, and in the last hour I've ground it down to just 5 remaining discrepancies.
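The post doesn't describe how those discrepancies were counted, so this is only a sketch under stated assumptions: both the spreadsheet and the database output are exported as dictionaries keyed by a hypothetical (row, column) pair, numbers are compared with a small tolerance (the tolerance values are guesses), and a key missing from either side counts as a mismatch. Tallying mismatches by column is one cheap way to hunt for the "common problem running throughout your code".

```python
import math
from collections import Counter


def find_discrepancies(expected, actual, rel_tol=1e-9, abs_tol=1e-6):
    """Return the sorted keys whose values differ between two {key: number} dicts.

    A key present on only one side also counts as a discrepancy. The tolerances
    are assumptions: Excel and a database can round floating point differently,
    so exact equality would overcount.
    """
    mismatches = []
    for key in sorted(expected.keys() | actual.keys()):
        if key not in expected or key not in actual:
            mismatches.append(key)
        elif not math.isclose(expected[key], actual[key],
                              rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(key)
    return mismatches


# Hypothetical exports keyed by (row, column).
spreadsheet = {
    ("r1", "price"): 10.0,
    ("r1", "margin"): 0.1 + 0.2,  # 0.30000000000000004, within tolerance of 0.3
    ("r2", "price"): 5.0,
    ("r2", "margin"): 0.25,
}
database = {
    ("r1", "price"): 10.0,
    ("r1", "margin"): 0.3,
    ("r2", "price"): 5.0,
    ("r2", "margin"): 0.2,
    ("r3", "margin"): 0.1,  # missing from the spreadsheet
}

mismatches = find_discrepancies(spreadsheet, database)
# Group by column: if one column dominates, the bug is probably in its formula.
by_column = Counter(column for _, column in mismatches)
print(mismatches)                # [('r2', 'margin'), ('r3', 'margin')]
print(by_column.most_common())   # [('margin', 2)]
```

Re-running a check like this after every fix turns "from 9,000 to 4,000" into a number you can watch fall.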
In the past I'd have said "sod it" at the last 1%, but I really want to eradicate the spreadsheet, and if I can get a perfect match, it should be a trivial exercise to unleash my creation on the world. So tomorrow I'll grind through the last few, and then I can honestly say it's done.

And then there's making it easy for somebody else to use, but that's another story...
