Thursday, December 31, 2009

Easy Error Trapping When Using xp_cmdshell

Error handling can be tough when using xp_cmdshell. Before I learned the trick I go over below, I could usually tell in my code that an error had occurred when I ran a command via xp_cmdshell. However, getting details about the error, or any output at all, was tricky and could involve parsing output files.

Before you start, please be aware that xp_cmdshell will be executed under the same security context as the SQL Server service, and can be a security problem in some environments.

This code uses the insert/execute syntax. If you've never used this before, it's a good idea to learn it. Basically, instead of inserting data into a table the normal way, you can insert the results of an execute statement - in this case the xp_cmdshell statement, like below:

Insert into XPCmdShellOutput 
Execute master..xp_cmdshell 'bcp tempdb..Employee out c:\temp\Employee.txt -c'


So, below is an easy, straightforward way to use xp_cmdshell to bcp out a table, and also see the output from the command. The same principles can be used for any other use of xp_cmdshell, and not only bcp.

set nocount on
use tempdb

if object_id('tempdb..Employee') is not null drop table Employee
if object_id('tempdb..XPCmdShellOutput') is not null drop table XPCmdShellOutput

-- Create the table that we need to extract from
create table Employee (EmployeeName varchar(20))
insert into Employee values ('John')

-- This table will be used to gather the output of xp_cmdshell
create table XPCmdShellOutput (OutputLine varchar(1000))

-- Show the output of xp_cmdshell when the directory does not exist
Insert into XPCmdShellOutput
Execute master..xp_cmdshell 'bcp tempdb..Employee out c:\DirectoryDoesNotExist\Employee.txt -c'
select 'Error when directory does not exist' = OutputLine from XPCmdShellOutput
delete from XPCmdShellOutput

-- Show output of xp_cmdshell when the table to be exported does not exist
Insert into XPCmdShellOutput
Execute master..xp_cmdshell 'bcp tempdb..Employee1 out c:\temp\Employee.txt -c'
select 'Error when table does not exist' = OutputLine from XPCmdShellOutput
delete from XPCmdShellOutput

-- Finally, successfully export the table!
Insert into XPCmdShellOutput
Execute master..xp_cmdshell 'bcp tempdb..Employee out c:\temp\Employee.txt -c'
select 'Successfully export the table!' = OutputLine from XPCmdShellOutput


When you run the above code, this will be the output (note that you may need to modify the c:\temp directory for your environment, and you need permission to create tables in tempdb):


Error when directory does not exist:

OutputLine
Password:
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC SQL Server Driver]Unable to open BCP host data-file
NULL


Error when table does not exist:

OutputLine
Password:
SQLState = S0002, NativeError = 208
Error = [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'tempdb..Employee1'.
NULL


Successfully export the table!

OutputLine
Password:
NULL
Starting copy...
NULL
1 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.): total 1 Avg 1 (1000.00 rows per sec.)
NULL


The output of xp_cmdshell was inserted into the XPCmdShellOutput table each time we ran it, because we used the insert/execute syntax. Then we select from XPCmdShellOutput to show what the output actually was. The first section shows the error when the directory does not exist. The second shows the error message when the name of the table to be exported is misspelled. And the third shows a successful export.

This is sample code, simplified to make it easier to understand. In working code, you would check the XPCmdShellOutput table for the string "Error". If that string exists, an error occurred, and the details will have been stored in the XPCmdShellOutput table. Note that when the export is successful, you can also extract other information from XPCmdShellOutput - for instance, how many rows were copied out, and how long the export took.
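For example, the error check in working code might look something like this (a sketch; the message patterns match the bcp output shown below):

```sql
-- After the insert/execute, inspect the captured output for error text
if exists
(
    select * from XPCmdShellOutput
    where OutputLine like '%Error%' or OutputLine like '%SQLState%'
)
begin
    declare @ErrorDetail varchar(1000)
    select @ErrorDetail = min(OutputLine)
    from XPCmdShellOutput
    where OutputLine like '%Error%'
    -- Surface the first captured error line to the caller
    raiserror ('bcp export failed: %s', 16, 1, @ErrorDetail)
end
```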

When using xp_cmdshell with bcp, keep in mind that bcp connects in its own session, so it will NOT recognize any local temporary tables you've created. If you need to use temporary tables, they must be global temporary tables, prefixed with ## instead of #.
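A quick sketch of the global-temp-table approach (the -T trusted-connection switch and the file path are assumptions for illustration; use whatever authentication your environment requires):

```sql
create table ##EmployeeExport (EmployeeName varchar(20))
insert into ##EmployeeExport values ('John')

-- bcp runs in its own session: it can see ##EmployeeExport,
-- but a local #EmployeeExport would not be found
Insert into XPCmdShellOutput
Execute master..xp_cmdshell 'bcp "select * from ##EmployeeExport" queryout c:\temp\EmployeeExport.txt -c -T'
select OutputLine from XPCmdShellOutput
```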
 

Tuesday, June 23, 2009

Renaming a Column in a Temp Table in SQL Server 2005 - Yes, You Can!

For whatever weird reason, you may need to create a temp table in one step, and then rename one of its columns. Perhaps you're creating the temp table in one stored procedure, and modifying it in another. I'm not going to tell you that's a silly thing to do, or that you should just create it with the right names in the first place - I saw those kinds of responses online, and they're very unhelpful! Sometimes you just need to do this type of thing.

It's not a straightforward thing to do, though. When you just run the below script that calls sp_rename:

if object_id('tempdb..#Test123') is not null drop table #Test123
create table #Test123 (field1 int)
exec sp_rename '#Test123.Field1', 'Field2', 'COLUMN'

...you get this error: "Either the parameter @objname is ambiguous or the claimed @objtype (COLUMN) is wrong."

The key is to call sp_rename from the tempdb, like so:

if object_id('tempdb..#Test123') is not null drop table #Test123
create table #Test123 (field1 int)
exec tempdb..sp_rename '#Test123.Field1', 'Field2', 'COLUMN'
select * from #Test123

And - success! Note that you'll still have problems referencing the renamed column by name. I solved this by doing a select into another temp table. Also, the procedure you're doing this in will probably trigger a recompile - that wasn't a problem for me, either. Overall, it was a better solution than the alternatives.
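The select-into workaround I mentioned might look like this (a sketch; #Test123Copy is just an illustrative name):

```sql
-- Referencing the renamed column directly in the same procedure can fail,
-- so copy the rows into a fresh temp table and reference that instead
select * into #Test123Copy from #Test123
select Field2 from #Test123Copy
```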

Thursday, May 21, 2009

Easily delete database objects

Any developer working on larger, more complex systems will eventually need to do some massive cleanup of tables, stored procedures, and other objects that are no longer used. Although you could just write a simple statement like this:

Drop table TestTable

...you should really have something more robust, that returns informative messages and checks for errors. Something like this:

Declare @Error int
if exists (select * from sys.objects where name = 'TestTable' and type = 'u' and schema_id = schema_id('dbo') ) begin
    -- The table exists, prepare to delete it
    Drop table dbo.TestTable
    Select @Error = @@Error
    if @Error <> 0 begin
        RAISERROR ('Error dropping table dbo.TestTable' ,16 ,1)
    end
    print 'Successfully dropped Table TestTable.'
end
else begin
    print 'Table TestTable does not exist or has already been deleted.'
end



But do you want to constantly rewrite that piece of code every time you need to drop a different object? Absolutely not! That's why I wrote the stored procedure below, called sp_DropDatabaseObject, which will delete many different types of database objects (tables, procedures, views, functions, and indexes). It incorporates all the functionality above (error trapping, good error messages) in a reusable procedure. It's created in the master database, so that it can be called from any database. Note that at the end, I call sp_MS_marksystemobject to mark it as a system object - this allows it to run in the context of the calling database even though it lives in the master database.

Here are some examples of how to run sp_DropDatabaseObject:

-- Drop table
exec sp_DropDatabaseObject 'dbo', 'TestTable', 'u'
-- Drop procedure
exec sp_DropDatabaseObject 'dbo', 'TestProcedure', 'p'
-- Drop index
exec sp_DropDatabaseObject 'dbo', 'TestTable.index1', 'i'
-- Drop View
exec sp_DropDatabaseObject 'dbo', 'TestView', 'v'
-- Drop function
exec sp_DropDatabaseObject 'dbo', 'TestFunction', 'fn'



And below is the code for the stored procedure. I haven't yet modified it to use SQL 2005 error trapping (try/catch), but that would be a definite improvement.

use master
go

Create procedure dbo.sp_DropDatabaseObject
@pSchemaName varchar(100) -- the schema the object belongs to, when applicable
,@pObjectName sysname -- name of the object to drop (for indexes, use the form TableName.IndexName)
,@pObjectType char(2) -- type of object to be dropped.
-- Can be 'U', 'V', 'P', 'FN', 'I' (for table, view, procedure, function, and index)

as

----------------------------------------------------------------------------
-- Declarations
----------------------------------------------------------------------------
declare -- Standard declares
@FALSE tinyint -- Boolean false.
,@TRUE tinyint -- Boolean true.
,@ExitCode int -- Return value of this procedure.
,@rc int -- Return code from a called SP.
,@Error int -- Store error codes returned by statements and procedures (@@error).
,@RaiseMessage varchar(1000) -- Creates helpful message to be raised when running.

declare -- sp specific declares
@SingleQuote nchar(1)
,@SQL nvarchar(4000)
,@IndexTableName varchar(50)
,@IndexIndexName varchar(50)

----------------------------------------------------------------------------
-- Initializations
----------------------------------------------------------------------------
select -- Standard constants
@FALSE = 0
,@TRUE = 1
,@ExitCode = 0
,@rc = 0
,@Error = 0

Select
@SingleQuote = char(39)

----------------------------------------------------------------------------
-- Validate that all objects have an appropriate ObjectType
----------------------------------------------------------------------------
if @pObjectType not in ('U', 'V', 'P', 'FN', 'I') begin
select @RaiseMessage = 'Invalid ObjectType value: ' + @pObjectType
goto ErrorHandler
end

----------------------------------------------------------------------------
-- Put together the SQL to drop the database object
----------------------------------------------------------------------------
if @pObjectType = 'U' begin
if exists (select * from sys.objects where name = @pObjectName and type = @pObjectType and schema_id = schema_id(@pSchemaName) ) begin
-- The table exists, prepare to delete it
Select @SQL = 'Drop table ' + @pSchemaName + '.' + @pObjectName
end
else begin
select @RaiseMessage = 'Table ' + @pObjectName + ' does not exist or has already been deleted'
print @RaiseMessage
goto ExitProc
end
end

if @pObjectType = 'V' begin
if exists (select * from sys.objects where name = @pObjectName and type = @pObjectType and schema_id = schema_id(@pSchemaName) ) begin
-- The view exists, prepare to delete it
Select @SQL = 'Drop view ' + @pSchemaName + '.' + @pObjectName
end
else begin
select @RaiseMessage = 'View ' + @pObjectName + ' does not exist or has already been deleted'
print @RaiseMessage
goto ExitProc
end
end

if @pObjectType = 'P' begin
if exists (select * from sys.objects where name = @pObjectName and type = @pObjectType and schema_id = schema_id(@pSchemaName) ) begin
-- The procedure exists, prepare to delete it
Select @SQL = 'Drop procedure ' + @pSchemaName + '.' + @pObjectName
end
else begin
select @RaiseMessage = 'Procedure ' + @pObjectName + ' does not exist or has already been deleted'
print @RaiseMessage
goto ExitProc
end
end

if @pObjectType = 'FN' begin
if exists (select * from sys.objects where name = @pObjectName and type = @pObjectType and schema_id = schema_id(@pSchemaName) ) begin
-- The function exists, prepare to delete it
Select @SQL = 'Drop function ' + @pSchemaName + '.' + @pObjectName
end
else begin
select @RaiseMessage = 'Function ' + @pObjectName + ' does not exist or has already been deleted'
print @RaiseMessage
goto ExitProc
end
end

if @pObjectType = 'I' begin
-- Parse out the table/index names to be able to test for index existence easily
Select @IndexTableName = substring(@pObjectName, 1, CHARINDEX('.', @pObjectName) - 1)
Select @IndexIndexName = substring(@pObjectName, CHARINDEX('.', @pObjectName) + 1, 50 )
If IndexProperty(OBJECT_ID(@IndexTableName),@IndexIndexName,'IndexID') IS not NULL begin
-- Check first whether it's a primary key
if exists
(
select * from sys.indexes where is_primary_key = @TRUE and object_name(object_id) = @IndexTableName and name = @IndexIndexName
)
begin
Select @SQL = 'Alter table ' + @pSchemaName + '.' + @IndexTableName + ' drop constraint ' + @IndexIndexName
end
else begin
Select @SQL = 'Drop Index ' + @pSchemaName + '.' + @pObjectName
end
end
else begin
select @RaiseMessage = 'Index ' + @pObjectName + ' does not exist or has already been deleted'
print @RaiseMessage
goto ExitProc
end
end

----------------------------------------------------------------------------
-- Drop the database object
----------------------------------------------------------------------------
if @SQL is not null begin
Exec @RC = sp_executesql @sql
select @Error = @@Error
if @Error <> 0 or @RC <> 0 begin
select @RaiseMessage = 'Error dropping object : ' + @pObjectName + ' using sql statement: ' + @SQL
goto ErrorHandler
end
Select @RaiseMessage = 'Completed dropping object: ' + @pObjectName + ' using sql statement: ' + @SQL
print @RaiseMessage
end

goto ExitProc

----------------------------------------------------------------------------
-- Error Handler
----------------------------------------------------------------------------
ErrorHandler:

select @ExitCode = -100

-- Raise the error message; severity 16 will cause client tools such as isql to fail the batch.
RAISERROR (
@RaiseMessage
,16 -- Severity.
,1 -- State.
)

goto ExitProc

----------------------------------------------------------------------------
-- Exit Procedure
----------------------------------------------------------------------------
ExitProc:

return (@ExitCode)

go


-- Marks it as a system object. Otherwise, it may return object information from the master database instead of the calling database
EXEC sys.sp_MS_marksystemobject sp_DropDatabaseObject
GO

Tuesday, May 19, 2009

9 Things to Do When You Inherit a Database

So—Bob’s left the company to move back east, and you’re the new lead developer on the database. Or the third-party company to which maintenance was outsourced is no longer working on it, so it’s yours now. One way or another, you need to take over a database system that you had no part in developing. It's not in good shape, and there aren’t many resources for you to tap.

What do you do?

I’ve been faced with this situation a few times now, and have developed a list of some of the things that have helped me the most, both in getting productive, and in bringing the database system up to par.

Backups
Make sure that backups are happening. I’m assuming here that you’re the database developer, not the database administrator. However, as a minimum check, make sure that backups are occurring regularly. Ideally, you should also successfully restore a backup somewhere else.

Research
Look at the database. Go through and get an idea of the table structure, which tables are largest by size, which stored procedures are most commonly used, whether there are jobs, and what documentation exists. Read through some of the stored procedures. You may find it useful to create a quick and dirty database diagram if there isn’t one, using the built-in diagramming tool in SQL Server. This can also be a good visual aid when you talk to other people.

Talk to the former developers
This may not be an option, but try hard to have at least a few friendly interviews with the former developers. This is not the time to make comments like, “I can’t believe you guys did [insert bad development practice here]”. You don’t know the history – maybe it was that way when they got the system. You’ll want to get as much information as they can give you on current issues, items on this list, etc. Keep things friendly – and maybe try to get their cell number in case of questions. A good relationship with the former developers can go a long way.

A bug database
Is there a bug database – somewhere that bugs (and sometimes enhancement ideas) are tracked for this system? This is certainly one of the things that you want to set up, if it’s not there currently. I’ve always been lucky enough to work at companies where bug tracking was taken seriously, and there were systems already in place that I could just plug into. If there’s no bug database, time to do some research. I wouldn’t suggest reinventing the wheel here, since there’s a lot of good systems out there—just use what’s available.

Source code control
Is the code in some kind of source code control system, such as VSS or Perforce? If it is—is everything up to date? I’m going to hazard a guess that it’s either not in source code control, or it hasn’t been kept up to date. That’s been a big task for me when starting work on inherited systems. There’s a number of tools with which to tackle this. In the past I’ve used a custom written perl tool that used SQL DMO, but I won’t go into detail—that’s the topic of another article. If nothing else, you could use the built in tools that SQL Server provides to script out your database objects, and check them in. Once you have everything checked in, try running a database build from the checked in code, and compare it to production. Also—make sure you have a good system to keep all the code updated!

Talk to the users and/or business owners
Sit down and have some conversations with the users. This is a good opportunity to get to know their problems and concerns, the improvements they would most like to see, and where things are heading in the future. You want to make sure that this database is sticking around, that it’s not going to be replaced with a third party product or anything like that. If you’re going to put a lot of work into improving the system, you need to know that your efforts are going to pay off for the business. Also–you’ll probably be spending lots of time on issues that are important to a well-run database system (a bug database, source code control, etc), but that won’t give them any new features. Make sure they understand this.

Establish credibility with the users by fixing a few things or making some enhancements
Even though you’ll probably be needing to spend a lot of time on tasks like setting up source code control, bug tracking, etc, you don’t want to do this exclusively. From talks with users, hopefully you’ve identified enhancements or bug fixes that you could get out quickly. Do what you can here. This is a great way to establish credibility with them. Let them know, too, that once you have the systems in place, bug fixes and enhancements will be much easier to roll out.

Create a development environment
If you don’t have a development environment, but code still needs to be written, where are the developers going to write and test their code? I hate to tell you, but if they have access, they’ll write and test in the production environment. So you may have stored procedures called CampaignEmailExport_TEST hanging around (and never getting deleted). Or—oops—you may accidentally overwrite the production version with your new version, and then it runs and causes hundreds of thousands of emails to be sent where they weren’t supposed to. Not that I’ve ever heard of this happening. This kind of problem can go a long way towards convincing users that time and money needs to be spent on working on setting up a good foundation.
For the development environment–you may be able to just get a backup from production, and set it up on another server. If it’s too large, you might need to be creative. Whatever you do, don’t develop or test in the production environment.

Drop obsolete objects
In a system that hasn’t been maintained very well, it’s likely that there are a lot of database objects out there that aren’t being used. They may have suffixes like ‘temp’ or ‘bak’ on them. It can be hard to identify all of these, and you may be tempted to just leave them. However, they can cause a number of problems:

1. They make it difficult to figure out what the actual working codebase is. If you have a lot of duplicate, backup, “working” or “temp” objects, you don’t know what your codebase is like, and how complex it is.

2. Suppose you’d like to drop a table because it’s huge and looks like it hasn’t been updated in a long time, but it turns out it’s being used by stored procedure X. If stored procedure X is itself never used, but you’re keeping it around in the database anyway, then you’ve just lost an opportunity to improve your system because of an obsolete stored procedure. This kind of issue, multiplied by all the obsolete objects in the database, can cause development to become very slow, or even grind to a halt.

Finally...
There’s potentially months and months of work if you start from scratch on all of the above. It’ll require good judgment on what to prioritize, where to start, and how much time to spend on all the tasks that need doing. And perhaps you’re not in a position to set all the priorities. But it can be worthwhile and fun to streamline and tune-up a database that just needs a little work to become a well-oiled machine, requiring much less development time.

Thanks for reading! I welcome feedback in the form of comments, and may post an update to this article with the best suggestions and comments.

Tuesday, March 31, 2009

Enhancing the readability of your code: Table aliasing in sql

Aliasing tables is a topic that database developers can feel very strongly about. I'm actually one of the developers who does use aliases, but there's a right way to do it, and there's a wrong way.
First, the wrong way (using the pubs sample database):

select
c.Au_id
,c.au_lname
,c.au_fname
,a.title
,a.title_id
,b.royaltyper
from titles a
join titleauthor b
on a.title_id = b.title_id
join authors c
on c.au_id = b.au_id


Don't laugh. I've actually seen some sql done exactly like this. Reading and understanding it later on takes far longer than it should (now what did b stand for again?).

Most people would agree that the aliasing in the above example is terrible, and most sql isn't written that way. However, I'd say about 90% of sql is written like the example below:

select
au.Au_id
,au.au_lname
,au.au_fname
,ti.title
,ti.title_id
,ta.royaltyper
from titles ti
join titleauthor ta
on ti.title_id = ta.title_id
join authors au
on ta.au_id = au.au_id


This is an improvement over using aliases that have no relation whatsoever to the table name. But in this particular sql statement, there's no reason to alias your table names - it would just save a few keystrokes, at the cost of making the query much less readable. Here's what I would do:

select
authors.Au_id
,au_lname
,au_fname
,title
,titles.title_id
,royaltyper
from titles
join titleauthor
on titles.title_id = titleauthor.title_id
join authors
on titleauthor.au_id = authors.au_id


In this case - just a simple select statement - it really doesn't make any sense to use aliases.

It's in more complex sql, with cross database joins, subqueries, and derived tables that you really want to use aliases in order to make your code more readable. For instance, the following from clause contains tables from 4 different databases:

from BookingStg.dbo.TSData TSData
join BookingStg.dbo.Booking Booking
on Booking.BookingID = TSData.BookingID
join Domain.dbo.BookingWin BookingWin
on TSData.DateOfStayTimeID = BookingWin.DateOfStayTimeID
left join WarehouseFactStore.dbo.Customer_Dim Customer_Dim
on Customer_Dim.TUID = Booking.TUID
and Customer_Dim.ProductID = TSData.ProductID
left join TServerImp.dbo.TLRPostalCode TLRPostalCode
on TLRPostalCode.TLR = TSData.TLR
and TLRPostalCode.ProductID = TSData.ProductID


When using tables from multiple databases, you alias the fully qualified name (i.e. DatabaseName.SchemaName.TableName) with just the table name. Like this:

join BookingStg.dbo.Booking Booking


This cuts down on the length of table references, without obscuring them.

Also, in a situation where you're creating a derived table, you'll need to pick a name for it. Make sure it reflects what the table is used for. For instance, here I'm creating a derived table in order to group transactions, so the alias name reflects that:

from #TransactionAdFact
left join
(
Select
TLR
,MinAdID= min(AdID)
from RetailOperations.dbo.OmTransaction OmTransaction
join #BaseData
on #BaseData.OmniProductMapping = OmTransaction.OmProductID
where
TransDateKey between @StartDateKey and @EndDateKey
and AdID is not null
group by
TLR
,OmTransaction.OmProductID
) GroupedTransactions
on GroupedTransactions.TLR = #TransactionAdFact.TLR


Please, don't call the derived table something like Temp1. I see this all the time. It may save you the minute it would cost to think of a name that would be understandable to other people, but for the sake of the people who will be reading this code - take the minute, and make your code more readable.

Choosing the right table aliases in SQL Server can make your queries far more readable...or you can save yourself some thought and a couple of keystrokes. Your choice.

Thursday, March 26, 2009

Quick SQL - getting the size of all tables in a database

I use this little chunk of SQL code almost every day. It gives you a relatively accurate report of the size of each table in the database, including indexes. Very handy!

select
TableName = convert(varchar(100) ,sysobjects.name)
,TotalRows = max(sysindexes.rows)
,MbData = floor(sum(convert(real ,sysindexes.dpages)) * spt_values.low / 1048576)
,MbTotal = floor(sum(convert(real ,sysindexes.used)) * spt_values.low / 1048576 )
from sysobjects
join sysindexes
on sysobjects.id = sysindexes.id
join master.dbo.spt_values spt_values
on spt_values.number = 1
and spt_values.type = 'E'
where
sysobjects.type = 'U'
and indid in (0,1,255)
group by
sysobjects.name
,spt_values.low
order by 4 desc

Thursday, March 19, 2009

All database outputs need business owners

This is more applicable to the data warehouse world, because in an OLTP world, the purpose of database objects is much more obvious.

However, in the data warehouse/reporting/business intelligence world, reports, fact tables, and data feeds are frequently created, used for a while, and then abandoned - the users stop finding them useful, or leave the company. After that, the objects sit unused, and often nobody even knows they exist.

Aside from the problem of having a bunch of unused reports, fact tables or data feeds cluttering up your system, you have the larger problem of people assuming that they ARE still used, and need to be accounted for. As in, "If we modify table ABC, we also need to fix reports DEF and GHI because they depend on them". Well, if it turns out that the reports aren't being actively used anymore, you've taken on a lot of extra wasted work.

One neat idea is to put in a usage tracking tool for all reports and data feeds. So, from whatever UI the reports are run, have it also log a record to a table called something like ReportUsageLog. Then regularly monitor this table to see who's using what. If there are any reports that aren't being used, figure out why, and see if you can decommission them.
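A minimal version of such a log might look like this (a sketch; the table and column names are just illustrations):

```sql
create table dbo.ReportUsageLog
(
    ReportUsageLogID int identity(1,1) primary key
    ,ReportName varchar(100) not null
    ,UserName varchar(100) not null default suser_sname()
    ,RunDate datetime not null default getdate()
)

-- The report UI logs a record each time a report is run
insert into dbo.ReportUsageLog (ReportName) values ('MonthlySalesSummary')

-- Then periodically: which reports haven't run in the last 6 months?
select ReportName, LastRun = max(RunDate)
from dbo.ReportUsageLog
group by ReportName
having max(RunDate) < dateadd(month, -6, getdate())
```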

Thursday, March 5, 2009

Having a great development environment

I'm a big fan of checklists, and one of the ones I like is the one by Joel Spolsky (Joel on Software) called The Joel Test: 12 Steps to Better Code.

He goes through a whole set of things that you need to be doing right. The one I want to focus on today is the "Can you make a build in one step", and how it translates to the database world.

What Joel Spolsky focuses on mainly are software products. A build is not a big deal in his world. However, in the world of complex database systems, with all kinds of inputs and outputs from external vendors, things are different. In this world, doing a build, to me, means not just having the database objects set up correctly, but also having data. I'm talking about a large set of data, probably extracted from the production environment - enough to develop and test everything that you might need to.

This is a lot harder than it may seem, and most companies don't do too well at it. The problem is that almost everybody starts out getting database backups from production, and using those in their development environment. This works fine, until the database starts getting too large. Then, obviously, it's not so easy to get backups from production because the files are just so darn huge.

What a lot of companies do is set up the development environment, with all the database objects, and leave the transferring of data up to the individual developer. That's a big mistake. It can take a very long time to specify exactly what data you need, transfer it from the production environment, and then import it into a development environment. Multiply this by every developer, for every project they're working on, and you're talking about a lot of wasted time.

The better approach is to write up some code to get a reasonable subset of data. There's lots of ways to do this, and I may write more about it at some point, but it's absolutely critical to get data into your dev/test database systems. The biggest difficulty is getting management to approve spending time on a project like this.
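As a simple illustration of the subsetting idea, assuming a linked server named ProdServer and hypothetical table names:

```sql
-- Large fact tables: copy only a recent slice from production
insert into dbo.OrderFact
select *
from ProdServer.Warehouse.dbo.OrderFact
where OrderDate >= dateadd(month, -3, getdate())

-- Small lookup/domain tables: copy in full
insert into dbo.OrderStatus
select *
from ProdServer.Warehouse.dbo.OrderStatus
```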

Thursday, February 19, 2009

I hate duplicate code; part I

Just last week I was working on some code modifications, which basically just added another category of sales to a pre-existing column. When I was researching how to do this, I saw that this field was used in numerous reports in a case statement. For instance:

ClassId =
CASE
WHEN ec.CSRID = 19 THEN 34
WHEN he.DomainTypeID = 4 THEN 39
WHEN he.DomainTypeID = 128 THEN 37
ELSE 30
END

There's a lot wrong with this code, especially because it existed in a total of 5 stored procedures. Significantly, there were already bugs because it had been modified in some procedures and not others, even though theoretically it was all supposed to be the same.

This violates one of the basic "best practices" of coding, which is NO DUPLICATE CODE. There's a good write-up on why duplicate code is a bad idea in Wikipedia. Basically, it's harder to understand, and harder to maintain and fix bugs in. For this particular piece of duplicate code we've already seen that there were some bugs in there by the fact that some sections were missing some ClassId categories that had been added.

So, to avoid having this chunk of duplicate code in multiple stored procedures, we create the following function ClassIdGet:

Create function dbo.ClassIdGet ( @CSRID tinyint, @DomainTypeID int )
returns tinyint as
begin
declare @ClassId tinyint
select @ClassId =
case
WHEN @CSRID = 19 THEN 34
WHEN @CSRID = 54 THEN 123
WHEN @DomainTypeID = 4 THEN 39
WHEN @DomainTypeID = 128 THEN 37
ELSE 30
end
return @ClassId
end

Then, in my stored procedure, I call the function like this:

ClassId = dbo.ClassIdGet (ec.CSRID, he.DomainTypeID)

This is a huge improvement. Now, whenever we add a new ClassId, we can just modify the function ClassIdGet.

Is this the ideal way to take care of this issue? No - ideally I'd like to put the mapping of ClassId from CSRID and DomainTypeID into a domain table. Then adding a new ClassId would only involve adding a record to a table, instead of modifying a function. However, moving the code to a function is a world of improvement without a very dramatic change.
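The domain-table approach might be sketched like this (illustrative names; the mapping rows mirror the function above, with CSRID checked before DomainTypeID to preserve the case statement's precedence):

```sql
create table dbo.ClassIdMap
(
    CSRID tinyint null
    ,DomainTypeID int null
    ,ClassId tinyint not null
)

insert into dbo.ClassIdMap (CSRID, DomainTypeID, ClassId) values (19, null, 34)
insert into dbo.ClassIdMap (CSRID, DomainTypeID, ClassId) values (54, null, 123)
insert into dbo.ClassIdMap (CSRID, DomainTypeID, ClassId) values (null, 4, 39)
insert into dbo.ClassIdMap (CSRID, DomainTypeID, ClassId) values (null, 128, 37)

-- The function body then becomes a lookup, and a new ClassId is just an inserted row
declare @CSRID tinyint, @DomainTypeID int
select @CSRID = 19, @DomainTypeID = 4
select ClassId = coalesce
(
    (select ClassId from dbo.ClassIdMap where CSRID = @CSRID)
    ,(select ClassId from dbo.ClassIdMap where DomainTypeID = @DomainTypeID)
    ,30 -- default
)
```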

About the duplicate code above - why does this kind of poor coding practice show up again and again? I'm still thinking about that one. I think partly it's because for most companies working on internal applications, best practices in coding are not even on the radar. It's easy for people to get excited about esoteric coding methodologies, or about heavily promoted new data warehouse platforms. It's just not as easy to get excited about one of the key tenets of good programming - no duplicate code.

Also, it requires having a longer-term view of things. It does take a little longer to do things the right way - in this instance, creating both a stored procedure and a function to encapsulate the code, instead of just a stored procedure. It also takes a programmer who's experienced enough to know that copying and pasting code is usually not a good idea. And it takes an understanding that the initial development of the code is just a small part of the work generated over the code's lifetime.

However, think of the time that would have been saved if it had been done right in the first place - instead of needing a bunch of research and code modifications, whenever we needed a new ClassId, we would have been able to just modify one function.

But without a culture of writing high quality code, most people tend to do what works immediately, rather than take a longer term view.


Thursday, February 12, 2009

Using templates to improve your database code

If you're like me, when it's time to do some coding the first thing you do is look for some code to copy. It's pretty sensible - why reinvent the wheel when there's something out there that may already do what you need?

But instead of just copying whatever existing code seems to suit your needs, extend the concept of copying code, and make a set of templates. For instance, you could have a set of stored procedure templates for the following purposes:

- Loading data feeds, say from an external vendor
- Pushing out data feeds
- Processing exception records
- Loading data warehouse fact tables
- Reporting
- Validating before inserting data

These are oriented to data warehouse needs, but of course you could create a similar set of stored procedure templates for an OLTP database. When you find yourself needing a new type of functionality, you would create a new template.
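To give a flavor of the idea, here's a bare-bones sketch of what a feed-loading template might look like. The procedure name, the ProcessLog table, and its columns are all placeholders, not real objects - each shop would substitute its own logging and error-handling standards:

```sql
-- Hypothetical template for a feed-loading stored procedure.
-- Replace the placeholder names with your own standards.
create procedure dbo.TemplateFeedLoad as
begin
    set nocount on

    begin try
        -- Log the start of the process (ProcessLog is a made-up logging table)
        insert into dbo.ProcessLog (ProcessName, Status, LogTime)
        values ('TemplateFeedLoad', 'Started', getdate())

        -- <<< feed-specific load logic goes here >>>

        insert into dbo.ProcessLog (ProcessName, Status, LogTime)
        values ('TemplateFeedLoad', 'Succeeded', getdate())
    end try
    begin catch
        -- Standardized error trapping: log the failure, then re-raise
        declare @ErrorMessage nvarchar(4000)
        select @ErrorMessage = error_message()

        insert into dbo.ProcessLog (ProcessName, Status, LogTime, ErrorMessage)
        values ('TemplateFeedLoad', 'Failed', getdate(), @ErrorMessage)

        raiserror (@ErrorMessage, 16, 1)
    end catch
end
```

A developer starting a new feed load just copies the template, fills in the load logic, and gets consistent logging and error trapping for free.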

What are the benefits to creating templates like this, and making sure people use them? Well, for one, you're leveraging the capabilities of the most experienced developers to help out the less experienced ones. The more experienced developers would set up the templates, using the standards that have been set for error trapping, logging processes, etc. Then the more junior developers could write up the code for a particular need, using the template, knowing that they're following the correct coding standards (or at least, making a good start at it).

Some other benefits are that your code becomes more readable and maintainable when stored procedures follow templates. And, of course, coding becomes much faster if you have a trusted set of templates that you can start out with.

Thursday, February 5, 2009

Introduction

I've been thinking about creating a database oriented blog for a while now. Gradually a stockpile of article ideas has been accumulating in my notes, and now is the time to start expanding on some of them.

A little about my background - I've been working on databases for 15 years. I started out working for Microsoft in tech support on DOS 7, as a contractor. I had no formal computer background whatsoever, but it turned out that I really enjoyed it, and was pretty good at it.

I switched to tech support for Microsoft Access when it first came out, then started working more on ODBC and MS SQL Server issues. After a year or so of solving problems on the phone for people who were making about 5 times what I was, I decided that doing database consulting/contracting was the way to go. I got my MCDBA certification (I was one of the first!) and have been doing that ever since, focusing on MS SQL Server, but also with a heavy dose of OLAP, Perl, and various other tools.

It's been a great career for me - flexible and lucrative, and often quite interesting. This blog is a way for me to give back to the database community, share some of the strategies that have been useful for me and the companies I've worked at, and also to bitch and moan about some of the problems that always seem to come up, yet are very hard to change.

Unless it's something I'm really excited about, I'm going to stay away from SQL tips and tricks. That kind of information is plentiful, and just a Google search away. I'm planning on articles about topics such as:

- Why is it so important to set up and maintain great development and test environments?
- Databases and bug tracking software - best practices
- Tracking database usage - why bother?

I welcome comments and ideas!