Tuesday, April 21, 2020

Efficiently analyzing deadlocks on SQL Server with Extended Events

At my current consulting contract, we were running into a lot of deadlocks. It's been a long time since I analyzed deadlocks. And the system I'm working on had a lot of them, right after we had a very sudden and extreme increase in usage.

There were so many deadlocks that we needed to be efficient in deciding which ones to fix first. So I researched. I hit a lot of dead ends, and ran into a lot of advertising for specific database monitoring tools. But it turns out that there's some very good deadlock reporting, built right into SQL Server, out of the box.

Basically, SQL Server Extended events can be queried for deadlock occurrences. You have to use XML querying, with nodes, but there's plenty of examples online. The script I started out with is the one here: Querying Deadlocks From System_Health XEvent. I made some modifications to be able to do more analysis without constantly rerunning the base query, which takes a while.

So, Step 1 is to run the below, which creates a global temp table called ##DeadlockReporting, which can be queried later.

 if object_id('tempdb..##DeadlockReporting') is not null drop table tempdb..##DeadlockReporting  
      
 DECLARE @SessionName SysName   
 SELECT @SessionName = 'system_health'  
 /*   
 SELECT Session_Name = s.name, s.blocked_event_fire_time, s.dropped_buffer_count, s.dropped_event_count, s.pending_buffers  
 FROM sys.dm_xe_session_targets t  
      INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address  
 WHERE target_name = 'event_file'  
 --*/  
   
 IF OBJECT_ID('tempdb..#Events') IS NOT NULL BEGIN  
      DROP TABLE #Events  
 END  
   
 DECLARE @Target_File NVarChar(1000)  
      , @Target_Dir NVarChar(1000)  
      , @Target_File_WildCard NVarChar(1000)  
   
 SELECT @Target_File = CAST(t.target_data as XML).value('EventFileTarget[1]/File[1]/@name', 'NVARCHAR(256)')  
 FROM sys.dm_xe_session_targets t  
      INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address  
 WHERE s.name = @SessionName  
      AND t.target_name = 'event_file'  
   
 SELECT @Target_Dir = LEFT(@Target_File, Len(@Target_File) - CHARINDEX('\', REVERSE(@Target_File)))   
   
 SELECT @Target_File_WildCard = @Target_Dir + '\' + @SessionName + '_*.xel'  
   
 --Keep this as a separate table because it's called twice in the next query. You don't want this running twice.  
 SELECT DeadlockGraph = CAST(event_data AS XML)  
      , DeadlockID = Row_Number() OVER(ORDER BY file_name, file_offset)  
 INTO #Events  
 FROM sys.fn_xe_file_target_read_file(@Target_File_WildCard, null, null, null) AS F  
 WHERE event_data like '<event name="xml_deadlock_report%'  
   
 ;WITH Victims AS  
 (  
      SELECT VictimID = Deadlock.Victims.value('@id', 'varchar(50)')  
           , e.DeadlockID   
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/victim-list/victimProcess') as Deadlock(Victims)  
 )  
 , DeadlockObjects AS  
 (  
      SELECT DISTINCT e.DeadlockID  
           , ObjectName = Deadlock.Resources.value('@objectname', 'nvarchar(256)')  
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/resource-list/*') as Deadlock(Resources)  
 )  
 SELECT *  
 Into ##DeadlockReporting  
 FROM  
 (  
      SELECT   
     e.DeadlockID  
           ,TransactionTime = Deadlock.Process.value('@lasttranstarted', 'datetime')  
           , DeadlockGraph  
           , DeadlockObjects = substring((SELECT (', ' + o.ObjectName)  
                                    FROM DeadlockObjects o  
                                    WHERE o.DeadlockID = e.DeadlockID  
                                    ORDER BY o.ObjectName  
                                    FOR XML PATH ('')  
                                    ), 3, 4000)  
           , Victim = CASE WHEN v.VictimID IS NOT NULL   
                                    THEN 1   
                               ELSE 0   
                               END  
           , SPID = Deadlock.Process.value('@spid', 'int')  
           , ProcedureName = Deadlock.Process.value('executionStack[1]/frame[1]/@procname[1]', 'varchar(200)')  
           , LockMode = Deadlock.Process.value('@lockMode', 'char(1)')  
           , Code = Deadlock.Process.value('executionStack[1]/frame[1]', 'varchar(1000)')  
     , ClientApp = CASE LEFT(Deadlock.Process.value('@clientapp', 'varchar(100)'), 29)  
                               WHEN 'SQLAgent - TSQL JobStep (Job '  
                                    THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(Deadlock.Process.value('@clientapp', 'varchar(100)'),32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(Deadlock.Process.value('@clientapp', 'varchar(100)'), 67, len(Deadlock.Process.value('@clientapp', 'varchar(100)'))-67)  
                               ELSE Deadlock.Process.value('@clientapp', 'varchar(100)')  
                               END   
           , HostName = Deadlock.Process.value('@hostname', 'varchar(20)')  
           , LoginName = Deadlock.Process.value('@loginname', 'varchar(20)')  
           , InputBuffer = Deadlock.Process.value('inputbuf[1]', 'varchar(1000)')  
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/process-list/process') as Deadlock(Process)  
           LEFT JOIN Victims v ON v.DeadlockID = e.DeadlockID AND v.VictimID = Deadlock.Process.value('@id', 'varchar(50)')  
 ) X --In a subquery to make filtering easier (use column names, not XML parsing), no other reason  
 ORDER BY TransactionTime DESC  
   

Now I have a global temp table called ##DeadlockReporting.

Step 2 is to run queries against it. Here's some that I wrote up. The most useful is the one with the header "Get the worst offenders in past 24 hours". This will give you a good picture of the deadlock—the affected tables, the stored procedures that are being called, the specific problem code. Then, click on the DeadlockXML in order to see the full details of the deadlock.

You may find some really obvious targets for fixes that need to be made. I found some update statements that contained subqueries in the Set clause that needed to be fixed.

 Select top 100 * From ##DeadlockReporting order by TransactionTime desc  
   
 -- Get the worst offenders in past 24 hours  
 -- Click on the DeadlockXML to show details  
 Select   
   DeadlockObjects  
   ,ProcedureName  
   ,Code = min(Code)  
      ,DeadlockXML = convert(xml, min(convert(varchar(max), DeadlockGraph) ))  
      ,FirstDeadlock = Min(TransactionTime)  
      ,LastDeadlock = Max(TransactionTime)  
   ,PresumedTotalDeadlocks = Count(*)    
 from ##DeadlockReporting    
 Where TransactionTime >= dateadd (hh, -24, getdate() )  
 Group by   
   DeadlockObjects  
   ,ProcedureName  
 Order by Count(*)  desc  
   
 -- Get summary past 24 hours, by hour  
 ;with DeadlockReportingFormatted as (  
   Select   
     TransactionHour_UTC = convert(varchar(13), TransactionTime, 120) + ':00'  
     ,TransactionHour_PST =   
       convert(varchar(13), TransactionTime AT TIME ZONE 'UTC' AT TIME ZONE 'Pacific Standard Time', 120)   
       + ':00'  
     ,*  
   from ##DeadlockReporting     
 )  
 Select   
   TransactionHour_UTC  
   ,TransactionHour_PST  
   ,DeadlockObjects  
   ,ProcedureName  
   ,PresumedTotalDeadlocks = Count(*)    
 From DeadlockReportingFormatted  
 Where TransactionTime >= dateadd(hh, -24, getdate())  
 Group by   
   TransactionHour_UTC  
   ,TransactionHour_PST  
   ,DeadlockObjects  
   ,ProcedureName  
 Order by   
   TransactionHour_UTC desc  
   ,PresumedTotalDeadlocks desc