Databases Are Fun

Saturday, September 29, 2012

Develop better database code in one-tenth the time

Okay, maybe one-tenth the time is an exaggeration. But in some cases, it's possible to cut down development time tremendously.

When working on complex ETL processes, or huge stored procedures, the sheer size of the data and processes often make the development process much slower than it needs to be. For instance, if when making a fix you need to run a stored procedure that takes 20 minutes, then every single minor mistake you make in your code will cost you at the very least more than 20 minutes to find and fix.

However, if you're able to cut down your code or data in such a way that it runs in one minute in the development environment, you've just improved the speed of development tremendously, because each iteration of fixes will take much less time.

So, what are some ways to do this? The first would be to, if possible, reduce the size of the data you're working with in your development environment. Can you truncate most of the data, leaving only what's critical to test your fix? This can be a little tricky in a database with a lot of foreign keys set up. But on the other hand, once you have a script to trim your data, you can reuse it. Of course, before actually releasing code tested in an environment with limited data, you'd want to test it in an integration environment with a full data set.

The next best way is to ruthlessly trim the process until you have the chunk of code that matters, and only work with that. For instance, recently I was working on a stored procedure that usually takes about five minutes to run. Five minutes isn't that bad, but I was working on some tricky algorithm changes and I knew I'd need to be iterating a lot before I got it right. So, instead of putting in my changes and running them (and leaving myself 5 minutes each run to get distracted), I just isolated the subset of the process that really mattered, commented out everything else I could, and wrote the data to a global temporary table that I could then examine for problems. So instead of having every single problem show up only after 5 minutes, I could find it after just a few seconds. I ended up getting it done very quickly, because there was no down-time.

I find it's more fun to work this way, too. If you constantly have these 5 or 10 minute chunks when you're just waiting for something to happen, it's easy to get bored and distracted, and forget the details of what you're working on.

Monday, March 19, 2012

How to use a column name as an argument in a TOP clause

I needed to use a column name as an argument in a TOP clause recently in SQL Server. I had to research and experiment quite a bit to find what I was looking for, so I put together some sample code that should help get you started, if you're trying to do the same thing.

Note that in the last result set, with the Top clause, the number of records from each Market corresponds to the associated value in the MarketType.TopCustomersToGet. I'm using a cross apply here, which worked for me. Be sure and test performance, some sources indicate that you might get better performance with a CTE.


Declare @MarketType table (MarketTypeID tinyint, MarketImportance varchar(10), TopCustomersToGet int)
insert into @MarketType  values
(1  , 'High', 4)
,(2 , 'Low' , 2)

Declare @Market Table (MarketName varchar(20), MarketTypeID int) 
insert into @Market values
('Paris'    ,1)
,('London'  ,1)
,('Miami'   ,2)
,('Seattle' ,2)

Declare @Customer table (MarketName varchar(20), CustomerName varchar(20), CustomerRank int)
insert into @Customer  values
('Paris','Wayne',1)
,('Paris','Colleen',2)
,('Paris','Manuel',3)
,('Paris','Michelle',4)
,('Paris','Jesse',5)
,('London','Jenny',1)
,('London','Patrick',2)
,('London','Megan',3)
,('London','Alice',4)
,('London','Olga',5)
,('Miami','Brandon',1)
,('Miami','Alfonso',2)
,('Miami','Benjamin',3)
,('Miami','Harry',4)
,('Miami','Stephen',5)
,('Seattle','Willie',1)
,('Seattle','Allen',2)
,('Seattle','Megan',3)
,('Seattle','Danny',4)
,('Seattle','Manuel',5)

Select * from @MarketType
Select * from @Market
Select * from @Customer

Select 
    MarketType.MarketImportance
    ,Market.MarketName
    ,TopCustomers.CustomerName
From @Market Market
    join @MarketType MarketType
        on Market.MarketTypeID = MarketType.MarketTypeID 
Cross apply (
    Select top 
        (MarketType.TopCustomersToGet)
        CustomerName
    from @Customer Customer
    where 
        Customer.MarketName = Market.MarketName
    order by
        Customer.CustomerRank desc
    ) as TopCustomers

Wednesday, November 16, 2011

Quickly set Identity_Insert off for all tables in a database

In some situations, you may be turning the identity_insert property on and off for tables - for example, when doing an initial setup of data. Since only one table can have this turned on at a time (per session), here's a script to turn the identity_insert property off on all tables, so that you can turn it on for the table that you need.

This only works because SQL Server won't fail if the identity_insert is not turned on. However, it will fail when the table does not have an identity property set (this is why you can't just use sp_msforeachtable).

Please note that this is a quick script - use with caution and test first! A better way to do this would probably be just to turn the identity_insert off right after you use it.

Declare

    @NewLine char(1)

    ,@SQLStatement nvarchar(max)

Set @NewLine = char(13) + char(10)



-- procedures

select @SQLStatement = isnull( @SQLStatement + @NewLine, '' ) +

    'set identity_insert ' + object_name(object_id) + ' off'

from SYS.IDENTITY_COLUMNS



-- exec sp_executesql @SQLStatement

Print @SQLStatement

Wednesday, June 1, 2011

Masking PII (Personally Identifiable Information)

Privacy has been an increasing concern in information technology, and many companies have policies in place that prevent data that could identify a person from being easily accessed. This data is called PII (Personally Identifiable Information). Laws like HIPAA (Health Insurance Portability and Accountability Act) in particular are restrictive of how PII data can be accessed

If these laws are strictly interpreted, it can cause concerns even with staightforward database tasks, such as making a copy of a production database available for development work. Some companies now forbid this common and useful practice.

Using a script such as ##Common_PIIInfoDelete can diminish these concerns. This script will update all fields identified as PII to a neutral value (such as an empty string).

I've included a sample that shows how to call ##Common_PIIInfoDelete. Before you run it, please note the following:

This script must never be run in any production database. There's a validation in the script that checks the server name. This should be configured for your environment.
You'll need to update the values in the ##Settings table to match the fields that you want to mask
The script currently runs all the updates in one chunk. It could be improved by using the primary key of the table to run in chunks.
This script is set up as a temporary stored procedure, which works best in my environment. It could easily be made a regular stored procedure, or a regular sql script without stored procedure.

Create Procedure ##Common_PIIInfoDelete
as
/*
*********************************************************************
Description:

Mask PII info (Personally Identifiable Information).
This temporary stored procedure expects a temp table (##Settings) to exist which
contains the fields that should be masked.

*********************************************************************
*/

set nocount on

-- Declarations -----------------------------------------------------------
declare
    @RC_FAILURE int
    ,@RC_SUCCESS int
    ,@ExitCode int
    ,@ProcedureName sysname
    ,@RC int -- Return code from called SP
    ,@RowCount int
    ,@ERRUNEXPECTED int
    ,@RaiseMessage nvarchar(4000)
    ,@ErrorNumber int
    ,@ErrorSeverity int
    ,@ErrorState int
    ,@ErrorLine int
    ,@UpdateSQL nvarchar(4000)
    ,@SetSQL nvarchar(4000)
    ,@Table_Schema sysname
    ,@Table_Name sysname
    ,@MaskValue varchar(10)
    ,@NewLine char(2)

-- Initializations -----------------------------------------------------------
select
    @RC_FAILURE = -100
    ,@RC_SUCCESS = 0
    ,@RC = 0
    ,@ExitCode = @RC_SUCCESS
    ,@ProcedureName = '##Common_PIIInfoDelete'
    ,@ErrorSeverity = 16
    ,@ErrorState = 1
    ,@ErrorLine = 0
    ,@RC = 0
    ,@NewLine = char(13) + char(10)

begin try

-- Validations -----------------------------------------------------------
if object_id('tempdb..##Settings') is null begin
    Select @RaiseMessage = 'Error - Temporary table ##Settings is required.'
    Goto ErrorHandler
end

If @@ServerName not like '%' begin
    Select @RaiseMessage = '!!! Error - Do NOT run in production !!!'
    Goto ErrorHandler
end

-- Loop through ##Settings table and run update to mask PII for each table
select @RaiseMessage = 'Beginning stored procedure ' + @ProcedureName + ' at ' + convert(varchar(50), getdate(),120)
RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT

Declare Settings cursor local forward_only read_only
For
Select distinct
    Table_Schema
    ,Table_Name
from
##Settings
Open Settings
Fetch next from Settings into @Table_Schema, @Table_Name

While @@fetch_status=0
    begin
    -- Create the sql needed for the main part of the update statement
    Select @UpdateSQL = 'Update ' + @Table_Schema + '.' + @Table_Name + ' Set '
    
    -- Create the sql needed for the Set part of the update statement
    Select @SetSQL = ''
    Select @SetSQL = @SetSQL + ', ' + Column_Name + '=' + MaskValue from ##Settings where Table_Schema = @Table_Schema and Table_Name = @Table_Name
    Select @SetSQL = substring(@SetSQL, 2, 4000) -- Trim the first comma from @SetSQL
    select @UpdateSQL = @UpdateSQL + @SetSQL
    
    -- Execute the sql
    select @RaiseMessage = 'Beginning sql statement: ' + convert(varchar(50), getdate(),120) + ':' + @UpdateSQL
    RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT
    exec sp_executesql @UpdateSQL
    Select @Rowcount = @@Rowcount
    select @RaiseMessage = 'Completed sql statement: ' + convert(varchar(50), getdate(),120) + '; Updated: ' + convert(varchar(10), @Rowcount)
    RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT
    
    Fetch next from Settings into @Table_Schema, @Table_Name

end
Close Settings
Deallocate Settings

-- Catch block -----------------------------------------------------------
end try
begin catch
    Select @RaiseMessage = 'Error on ' + @RaiseMessage
    select 
        @ExitCode = @RC_FAILURE
        , @RaiseMessage = @RaiseMessage + ' : ' + error_message()
        , @ErrorNumber = error_number()
        , @ErrorSeverity = error_severity()
        , @ErrorState = error_state()
        , @ErrorLine = error_line()
    
    goto ErrorHandler
end catch
goto ExitProc

-- Error Handler -----------------------------------------------------------
ErrorHandler:
    RAISERROR (@RaiseMessage, 16, 1) WITH NOWAIT
    goto ExitProc

-- Exit Procedure -----------------------------------------------------------
ExitProc:
    return (@ExitCode)
go

In order to use the stored procedure ##Common_PIIInfoDelete, you need to first create the ##Settings table, populate it, and then call it. Here's an example:

if object_id('tempdb..##Settings') is not null drop table ##Settings
Create Table ##Settings (
    Table_Schema sysname
    ,Table_Name sysname
    ,Column_Name sysname
    ,MaskValue varchar(50)
)

-- Now, insert into the ##Settings table. These values will be used when calling the stored procedure
Insert into ##Settings
    -- (Table_Schema Table_Name           Column_Name               MaskValue
-------------------------------------------------------------------------------------
select 'dbo'          ,'CCDetailLog'      ,'FirstName'              ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'LastName'               ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'Address1'               ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'CustomerEmailAddress'   ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'CreditCardNbr'          ,'convert(varbinary, '''')' union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneAreaCodeNbr'       ,''''''                     union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneLocalNbr'          ,''''''                     union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneExtensionNbr'      ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'LastName'               ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'Address1'               ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'CustomerEmailAddress'   ,''''''

-- Call the stored procedure to mask the PII values
exec ##Common_PIIInfoDelete

Tuesday, February 1, 2011

Using Temporary Stored Procedures

Does it make sense to use temporary stored procedures? My answer before running into this particular situation would be been, "No—why would you, when you can create a regular stored procedure?".

However, I know now that there ARE some cases where it makes sense to use a temporary stored procedure. Specifically, if these 2 conditions exist:

1. The business has very rigid procedural requirements that must be met before a new stored procedure can be put into production, and

2. You need to continually make data changes that are not possible with the regular tools and services that use the database. For instance, manually deactivating or deleting records, adding complex domain data, etc. You frequently have to make the same type of complex change over and over.

Before starting to use temporary stored procedures, I or someone on my team would write up a separate script to do whatever task needed doing. For instance, if I needed to add a set of domain data (including validation, lookups, and logging) we would script it out, save it to the source code control system, and send in a service request to get it run in production. The next time a similar script was needed we would copy and paste the old script (assuming we could find it), change the applicable values, check it in and submit another service request.

However, this type of copy and paste programming is very error prone and time-consuming (see the wikipedia entry here). It's best to avoid it where possible. What I've started doing instead is creating temporary stored procedures for these types of tasks. I script them out and save them to an "OperationalScripts" folder in our source code control tool. Then, whenever I need to do a similar task, I will create a two-step service request.

Step 1 is to create the temporary stored procedure. For instance,

Create Procedure #ConfigurationAddDomainData
(
    @pProductID         int               
    ,@pComponentID      int
    ,@pLastUpdatedBy    varchar(32)   
)

as
-- long script with complex error handling, validation and logging

Step 2 is a call to the temporary stored procedure.

exec #ConfigurationAddDomainData
    @pProductID         = 15
    ,@pComponentID      = 28
    ,@pLastUpdatedBy    = 'Bug 1234'

When time and resources allow, this temporary stored procedure can be incorporated into an administrative tool. But meanwhile, nobody is wasting time and effort rewriting throwaway scripts.

Thursday, November 11, 2010

Playing with SQL Azure; part 3

Well, I think I'm done with SQL Azure. Not because I'm finding any kind of major problems with it, just because—there's not enough that's different between regular SQL Server and SQL Azure!

The last thing I did here was create some additional tables and stored procedures in my SQL Azure database. I inserted to these tables via BCP, that was also fairly straightforward once I figured out a few gotchas (naming for the user and server).

I thought there would be so much to learn, but honestly, for a database developer (as opposed to a DBA) it appears to be 99% the same as SQL Server. Now, if I were doing a full fledged pilot project, with stress tests, investigating privacy issues, size issues etc, it would be different. But for just playing around for my own interest, I'm all done, and it's been fun!

Thursday, November 4, 2010

Playing with SQL Azure; part 2

I'm working my way through the SQL Azure tutorial (the one I found most useful was here).

Ran into some surprises - you can't use the "Use DatabaseName" syntax to switch from one database to another (you have to open a connection in another database).

Also, it appears that you can't reference another database on the same server. For instance, when running the following:

select * from TestSylvia..test1

I got this error:

Msg 40515, Level 16, State 1, Line 1

Reference to database and/or server name in 'TestSylvia..test1' is not supported in this version of SQL Server.

From some searching online, it looks like that is being worked on.

So, there's some thing that are not yet possible. But what really struck me is the fact that overall, working on SQL Azure is so similar to working with an on-premises MS SQL Server database. I look forward to exploring further. And writing up a micro blog post on SQL Azure really motivates me to explore further.