Thursday, February 19, 2009

I hate duplicate code; part I

Just last week I was working on some code modifications, which basically just added another category of sales to a pre-existing column. When I was researching how to do this, I saw that this field was used in numerous reports in a case statement. For instance:

ClassId =
CASE
WHEN ec.CSRID = 19 THEN 34
WHEN he.DomainTypeID = 4 THEN 39
WHEN he.DomainTypeID = 128 THEN 37
ELSE 30
END

There's a lot wrong with this code, especially because it existed in a total of 5 stored procedures. Significantly, there were already bugs because it had been modified in some procedures and not others, even though theoretically it was all supposed to be the same.

This violates one of the basic "best practices" of coding, which is NO DUPLICATE CODE. There's a good write-up on why duplicate code is a bad idea in Wikipedia. Basically, it's harder to understand, and harder to maintain and fix bugs in. For this particular piece of duplicate code we've already seen that there were some bugs in there by the fact that some sections were missing some ClassId categories that had been added.

So, to avoid having this chunk of duplicate code in multiple stored procedures, we create the following function ClassIdGet:

Create function dbo.ClassIdGet ( @CSRID tinyint, @DomainTypeID int )
returns tinyint as
begin
declare @ClassId tinyint
select @ClassId =
case
WHEN @CSRID = 19 THEN 34
WHEN @CSRID = 54 THEN 123
WHEN @DomainTypeID = 4 THEN 39
WHEN @DomainTypeID = 128 THEN 37
ELSE 30
end
return @ClassId
end

Then, in my stored procedure, I call the function like this:

ClassId = dbo.ClassIdGet (ec.CSRID, he.DomainTypeID)

This is a huge improvement. Now, whenever we add a new ClassId, we can just modify the function ClassIdGet.

Is this the ideal way to take care of this issue? No - ideally I'd like to put the mapping of ClassId from CSRID and DomainTypeID into a domain table. Then adding a new ClassId would only involve adding a record to a table, instead of modifying a function. However, moving the code to a function is a world of improvement without a very dramatic change.

About the duplicate code above - why does this kind of poor coding practice show up again and again? I'm still thinking about that one. I think partly it's because for most companies working on internal applications, best practices in coding are not even on the radar. It's easy for people to get excited about esoteric coding methodologies, or about heavily promoted new data warehouse platforms. It's just not as easy to get excited about one of the key tenets of good programming - no duplicate code.

Also, it requires having a longer term view of things. It does take a little longer to do things the right way - in this instance, creating a both a stored procedure and a function to encapsulate the code, instead of just a stored procedure. It also takes a programmer who's experienced enough to know that copying and pasting code is usually not such a good idea. And it takes an understanding that the intitial development of the code is just a small part of the work generated over the code lifetime.

However, think of the time that would have been saved if it had been done right the first place - instead of needing a bunch of research and code modifications, whenever we needed a new ClassId, we would have been able to just modify one function.

But without a culture of writing high quality code, most people tend to do what works immediately, rather than take a longer term view.


No comments:

Post a Comment