Saturday, September 29, 2012

Develop better database code in one-tenth the time

Okay, maybe one-tenth the time is an exaggeration.  But in some cases, it's possible to cut down development time tremendously.

When working on complex ETL processes, or huge stored procedures, the sheer size of the data and processes often make the development process much slower than it needs to be.  For instance, if when making a fix you need to run a stored procedure that takes 20 minutes, then every single minor mistake you make in your code will cost you at the very least more than 20 minutes to find and fix.

However, if you're able to cut down your code or data in such a way that it runs in one minute in the development environment, you've just improved the speed of development tremendously, because each iteration of fixes will take much less time.

So, what are some ways to do this?  The first would be to, if possible, reduce the size of the data you're working with in your development environment.  Can you truncate most of the data, leaving only what's critical to test your fix?  This can be a little tricky in a database with a lot of foreign keys set up.  But on the other hand, once you have a script to trim your data, you can reuse it.  Of course, before actually releasing code tested in an environment with limited data, you'd want to test it in an integration environment with a full data set.

The next best way is to ruthlessly trim the process until you have the chunk of code that matters, and only work with that.  For instance, recently I was working on a stored procedure that usually takes about five minutes to run.  Five minutes isn't that bad, but I was working on some tricky algorithm changes and I knew I'd need to be iterating a lot before I got it right.  So, instead of putting in my changes and running them (and leaving myself 5 minutes each run to get distracted), I just isolated the subset of the process that really mattered, commented out everything else I could, and wrote the data to a global temporary table that I could then examine for problems.  So instead of having every single problem show up only after 5 minutes, I could find it after just a few seconds.  I ended up getting it done very quickly, because there was no down-time.

I find it's more fun to work this way, too.  If you constantly  have these 5 or 10 minute chunks when you're just waiting for something to happen, it's easy to get bored and distracted, and forget the details of what you're working on.