Thursday, February 19, 2009

I hate duplicate code; part I

Just last week I was working on some code modifications, which basically just added another category of sales to a pre-existing column. When I was researching how to do this, I saw that this field was used in numerous reports in a case statement. For instance:

ClassId =
CASE
WHEN ec.CSRID = 19 THEN 34
WHEN he.DomainTypeID = 4 THEN 39
WHEN he.DomainTypeID = 128 THEN 37
ELSE 30
END

There's a lot wrong with this code, especially because it existed in a total of 5 stored procedures. Significantly, there were already bugs because it had been modified in some procedures and not others, even though theoretically it was all supposed to be the same.

This violates one of the basic "best practices" of coding, which is NO DUPLICATE CODE. There's a good write-up on why duplicate code is a bad idea in Wikipedia. Basically, it's harder to understand, and harder to maintain and fix bugs in. For this particular piece of duplicate code we've already seen that there were some bugs in there by the fact that some sections were missing some ClassId categories that had been added.

So, to avoid having this chunk of duplicate code in multiple stored procedures, we create the following function ClassIdGet:

Create function dbo.ClassIdGet ( @CSRID tinyint, @DomainTypeID int )
returns tinyint as
begin
declare @ClassId tinyint
select @ClassId =
case
WHEN @CSRID = 19 THEN 34
WHEN @CSRID = 54 THEN 123
WHEN @DomainTypeID = 4 THEN 39
WHEN @DomainTypeID = 128 THEN 37
ELSE 30
end
return @ClassId
end

Then, in my stored procedure, I call the function like this:

ClassId = dbo.ClassIdGet (ec.CSRID, he.DomainTypeID)

This is a huge improvement. Now, whenever we add a new ClassId, we can just modify the function ClassIdGet.

Is this the ideal way to take care of this issue? No - ideally I'd like to put the mapping of ClassId from CSRID and DomainTypeID into a domain table. Then adding a new ClassId would only involve adding a record to a table, instead of modifying a function. However, moving the code to a function is a world of improvement without a very dramatic change.

About the duplicate code above - why does this kind of poor coding practice show up again and again? I'm still thinking about that one. I think partly it's because for most companies working on internal applications, best practices in coding are not even on the radar. It's easy for people to get excited about esoteric coding methodologies, or about heavily promoted new data warehouse platforms. It's just not as easy to get excited about one of the key tenets of good programming - no duplicate code.

Also, it requires having a longer term view of things. It does take a little longer to do things the right way - in this instance, creating a both a stored procedure and a function to encapsulate the code, instead of just a stored procedure. It also takes a programmer who's experienced enough to know that copying and pasting code is usually not such a good idea. And it takes an understanding that the intitial development of the code is just a small part of the work generated over the code lifetime.

However, think of the time that would have been saved if it had been done right the first place - instead of needing a bunch of research and code modifications, whenever we needed a new ClassId, we would have been able to just modify one function.

But without a culture of writing high quality code, most people tend to do what works immediately, rather than take a longer term view.


Thursday, February 12, 2009

Using templates to improve your database code

If you're like me, when it's time to do some coding the first thing you do is look for some code to copy. It's pretty sensible - why reinvent the wheel when there's something out there that may already do what you need?

But instead of just copying whatever code that may already exist that seems to suits your needs, extend the concept of copying code, and make a set of templates. For instance, you could have a set of stored procedure templates for the following purposes:

- Loading data feeds, say from an external vendor
- Pushing out data feeds
- Processing exception records
- Loading data warehouse fact tables
- Reporting
- Validating before inserting data

These are oriented to data warehouse needs, but of course you could create a similar set of stored procedure templates for an OLTP database. When you find yourself needing a new type of functionality, you would create a new template.

What are the benefits to creating templates like this, and making sure people use them? Well, for one, you're leveraging the capabilities of the most experienced developers to help out the less experienced ones. The more experienced developers would set up the templates, using the standards that have been set for error trapping, logging processes, etc. Then the more junior developers could write up the code for a particular need, using the template, knowing that they're following the correct coding standards (or at least, making a good start at it).

Some other benefits are that your code becomes more readable and maintainable when stored procedures follow templates. And, of course, coding becomes much faster if you have a trusted set of templates that you can start out with.

Thursday, February 5, 2009

Introduction

I've been thinking about creating a database oriented blog for a while now. Gradually a stockpile of article ideas has been accumulating in my notes, and now is the time to starting expanding on some of them.

A little about my background - I've been working on databases for 15 years. I started out working for Microsoft in tech support on DOS 7, as a contractor. I had no formal computer background whatsoever, but it turned out that I really enjoyed it, and was pretty good at it.

I switched to tech support for Microsoft Access when it first came out, then started working more on ODBC and MS SQL Server issues. After a year or so of solving problems on the phone for people who were making about 5 times what I was, I decided that doing database consulting/contracting was the way to go. I got my MCDBA certification (I was one of the first!) and have been doing that ever since, focusing on MS SQL Server, but also with a heavy dose of OLAP, Perl, and various other tools.

It's been a great career for me - flexible and lucrative, and often quite interesting. This blog is a way for me to give back to the database community, share some of the strategies that have been useful for me and the companies I've worked at, and also to bitch and moan about some of the problems that always seems to come up, that are also very hard to change.

Unless it's something I'm really excited about, I'm going to stay away from SQL tips and tricks. That kind of information is plentiful, and just a Google search away. I'm planning on articles about topics such as:

- Why is it so important to set up and maintain great development and test
environments?
- Databases and bug tracking software - best practices
- Tracking database usage - why bother?

I welcome comments and ideas!