Wednesday, June 1, 2011

Masking PII (Personally Identifiable Information)

Privacy has been an increasing concern in information technology, and many companies have policies in place that prevent data that could identify a person from being easily accessed. This data is called PII (Personally Identifiable Information). Laws like HIPAA (Health Insurance Portability and Accountability Act) in particular are restrictive of how PII data can be accessed

If these laws are strictly interpreted, it can cause concerns even with staightforward database tasks, such as making a copy of a production database available for development work. Some companies now forbid this common and useful practice.

Using a script such as ##Common_PIIInfoDelete can diminish these concerns. This script will update all fields identified as PII to a neutral value (such as an empty string).

I've included a sample that shows how to call ##Common_PIIInfoDelete. Before you run it, please note the following:

  • This script must never be run in any production database. There's a validation in the script that checks the server name. This should be configured for your environment.
  • You'll need to update the values in the ##Settings table to match the fields that you want to mask
  • The script currently runs all the updates in one chunk. It could be improved by using the primary key of the table to run in chunks.
  • This script is set up as a temporary stored procedure, which works best in my environment. It could easily be made a regular stored procedure, or a regular sql script without stored procedure.


Create Procedure ##Common_PIIInfoDelete
as
/*
*********************************************************************
Description:

Mask PII info (Personally Identifiable Information).
This temporary stored procedure expects a temp table (##Settings) to exist which
contains the fields that should be masked.

*********************************************************************
*/

set nocount on

-- Declarations -----------------------------------------------------------
declare
    @RC_FAILURE int
    ,@RC_SUCCESS int
    ,@ExitCode int
    ,@ProcedureName sysname
    ,@RC int -- Return code from called SP
    ,@RowCount int
    ,@ERRUNEXPECTED int
    ,@RaiseMessage nvarchar(4000)
    ,@ErrorNumber int
    ,@ErrorSeverity int
    ,@ErrorState int
    ,@ErrorLine int
    ,@UpdateSQL nvarchar(4000)
    ,@SetSQL nvarchar(4000)
    ,@Table_Schema sysname
    ,@Table_Name sysname
    ,@MaskValue varchar(10)
    ,@NewLine char(2)

-- Initializations -----------------------------------------------------------
select
    @RC_FAILURE = -100
    ,@RC_SUCCESS = 0
    ,@RC = 0
    ,@ExitCode = @RC_SUCCESS
    ,@ProcedureName = '##Common_PIIInfoDelete'
    ,@ErrorSeverity = 16
    ,@ErrorState = 1
    ,@ErrorLine = 0
    ,@RC = 0
    ,@NewLine = char(13) + char(10)

begin try

-- Validations -----------------------------------------------------------
if object_id('tempdb..##Settings') is null begin
    Select @RaiseMessage = 'Error - Temporary table ##Settings is required.'
    Goto ErrorHandler
end

If @@ServerName not like '%' begin
    Select @RaiseMessage = '!!! Error - Do NOT run in production !!!'
    Goto ErrorHandler
end

-- Loop through ##Settings table and run update to mask PII for each table
select @RaiseMessage = 'Beginning stored procedure ' + @ProcedureName + ' at ' + convert(varchar(50), getdate(),120)
RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT

Declare Settings cursor local forward_only read_only
For
Select distinct
    Table_Schema
    ,Table_Name
from
##Settings
Open Settings
Fetch next from Settings into @Table_Schema, @Table_Name

While @@fetch_status=0
    begin
    -- Create the sql needed for the main part of the update statement
    Select @UpdateSQL = 'Update ' + @Table_Schema + '.' + @Table_Name + ' Set '
    
    -- Create the sql needed for the Set part of the update statement
    Select @SetSQL = ''
    Select @SetSQL = @SetSQL + ', ' + Column_Name + '=' + MaskValue from ##Settings where Table_Schema = @Table_Schema and Table_Name = @Table_Name
    Select @SetSQL = substring(@SetSQL, 2, 4000) -- Trim the first comma from @SetSQL
    select @UpdateSQL = @UpdateSQL + @SetSQL
    
    -- Execute the sql
    select @RaiseMessage = 'Beginning sql statement: ' + convert(varchar(50), getdate(),120) + ':' + @UpdateSQL
    RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT
    exec sp_executesql @UpdateSQL
    Select @Rowcount = @@Rowcount
    select @RaiseMessage = 'Completed sql statement: ' + convert(varchar(50), getdate(),120) + '; Updated: ' + convert(varchar(10), @Rowcount)
    RAISERROR (@RaiseMessage, 0, 1) WITH NOWAIT
    
    Fetch next from Settings into @Table_Schema, @Table_Name

end
Close Settings
Deallocate Settings

-- Catch block -----------------------------------------------------------
end try
begin catch
    Select @RaiseMessage = 'Error on ' + @RaiseMessage
    select 
        @ExitCode = @RC_FAILURE
        , @RaiseMessage = @RaiseMessage + ' : ' + error_message()
        , @ErrorNumber = error_number()
        , @ErrorSeverity = error_severity()
        , @ErrorState = error_state()
        , @ErrorLine = error_line()
    
    goto ErrorHandler
end catch
goto ExitProc

-- Error Handler -----------------------------------------------------------
ErrorHandler:
    RAISERROR (@RaiseMessage, 16, 1) WITH NOWAIT
    goto ExitProc

-- Exit Procedure -----------------------------------------------------------
ExitProc:
    return (@ExitCode)
go



In order to use the stored procedure ##Common_PIIInfoDelete, you need to first create the ##Settings table, populate it, and then call it. Here's an example:


if object_id('tempdb..##Settings') is not null drop table ##Settings
Create Table ##Settings (
    Table_Schema sysname
    ,Table_Name sysname
    ,Column_Name sysname
    ,MaskValue varchar(50)
)

-- Now, insert into the ##Settings table. These values will be used when calling the stored procedure
Insert into ##Settings
    -- (Table_Schema Table_Name           Column_Name               MaskValue
-------------------------------------------------------------------------------------
select 'dbo'          ,'CCDetailLog'      ,'FirstName'              ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'LastName'               ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'Address1'               ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'CustomerEmailAddress'   ,''''''                     union all
select 'dbo'          ,'CCDetailLog'      ,'CreditCardNbr'          ,'convert(varbinary, '''')' union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneAreaCodeNbr'       ,''''''                     union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneLocalNbr'          ,''''''                     union all
select 'Payment'      ,'CCDetailLog'      ,'PhoneExtensionNbr'      ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'LastName'               ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'Address1'               ,''''''                     union all
select 'Stage'        ,'CCDetailLogStage' ,'CustomerEmailAddress'   ,''''''

-- Call the stored procedure to mask the PII values
exec ##Common_PIIInfoDelete