Sunday, June 14, 2020

Analyzing a slow stored procedure using Query Store

I'll be honest - I've never used Query Store before in SQL Server, even though it's been available since SQL Server 2016.

Now that I'm using it, I'm blown away by how useful it is. And the truth is that few people know how to use it well. There isn't much written up on it; it takes quite some time for articles and blog posts (like this one) to appear that tell people when and how to use new features.

Before Query Store was available, here's what I often did when I had to analyze long stored procedures for performance:

UPDATE dbo.HotelFact
SET SalesForceLastUpdateDate = ##SalesForce.LastUpdateDate
FROM
    dbo.HotelFact
    INNER JOIN dbo.DimHotel
        ON DimHotel.HotelKey = HotelFact.HotelKey
    INNER JOIN ##SalesForce
        ON ##SalesForce.SalesForceAccountNumber = DimHotel.SFAccountNumber
WHERE
    isnull(HotelFact.SalesForceLastUpdateDate, 0) <> isnull(##SalesForce.LastUpdateDate, 0)

exec MessageLog @DatabaseName, @ProcedureName, 'Info', 'Update records in HotelFact with SalesForce updates', @@Rowcount


Notice the call at the end, to MessageLog? After every significant statement, I would call it, and it would log the duration to a Message table. Then I'd check the Message table to see what took longest.

That was a hassle. It made the stored procedure harder to read, and of course, if the stored procedure didn't already use MessageLog, you had to add it.
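For context, the logging pattern looked roughly like this. This is only a sketch from memory; the real Message table and MessageLog procedure were specific to that system, and the column names here are hypothetical. Duration per statement comes from the gap between consecutive log rows:

```sql
CREATE TABLE dbo.Message (
    MessageID int IDENTITY(1, 1) PRIMARY KEY
    ,DatabaseName sysname
    ,ProcedureName sysname
    ,MessageType varchar(20)      -- e.g. 'Info', 'Error'
    ,MessageText varchar(1000)
    ,RowsAffected int
    ,LogDate datetime2 DEFAULT sysdatetime()
)
GO

CREATE PROCEDURE dbo.MessageLog
    @DatabaseName sysname
    ,@ProcedureName sysname
    ,@MessageType varchar(20)
    ,@MessageText varchar(1000)
    ,@RowsAffected int = NULL
AS
BEGIN
    -- Each call stamps a row; comparing LogDate between consecutive
    -- rows for a procedure shows how long each statement took
    INSERT INTO dbo.Message (DatabaseName, ProcedureName, MessageType, MessageText, RowsAffected)
    VALUES (@DatabaseName, @ProcedureName, @MessageType, @MessageText, @RowsAffected)
END
GO
```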

But Query Store data blows this away. It gives you all the information you need to troubleshoot stored procedure performance at the statement level, with any metric, such as logical reads or CPU.

Say you have a stored procedure that's hundreds of lines long, and does dozens of reads and writes. And it's slow. You look at it and groan, imagining what a pain it will be to even figure out where you need to focus your attention.

No need to worry. Once you have Query Store turned on, you can analyze it in moments. This query gave me, for the specific procedure that was causing a problem, all the query plans for the last 24 hours, ordered by CPU time.
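If Query Store isn't enabled yet on the database, turning it on is a one-liner (the database name here is a placeholder):

```sql
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);

-- Run in the context of that database to confirm it's collecting data
SELECT actual_state_desc, desired_state_desc
FROM sys.database_query_store_options;
```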

-- Query plan details on one object, perhaps a stored procedure that has a high duration
-- In this case, getting details on SE_SaveBuffer
SELECT qsq.query_id
	,qsp.plan_id
	,OBJECT_NAME(qsq.object_id) AS ObjectName
	,SUM(rs.count_executions) AS TotalExecutions
	,AVG(rs.avg_duration) AS Avg_Duration
	,AVG(rs.avg_cpu_time) AS Avg_CPU
	,AVG(rs.avg_logical_io_reads) AS Avg_LogicalReads
	,MIN(qst.query_sql_text) AS Query
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qst
	ON qsq.query_text_id = qst.query_text_id
JOIN sys.query_store_plan qsp
	ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats rs
	ON qsp.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval rsi
	ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE qsq.object_id = OBJECT_ID(N'dbo.SE_SaveBuffer')
	AND rs.last_execution_time > DATEADD(HOUR, - 24, GETUTCDATE())
	AND rs.execution_type = 0
GROUP BY qsq.query_id
	,qsp.plan_id
	,OBJECT_NAME(qsq.object_id)
ORDER BY
	AVG(rs.avg_cpu_time) DESC

Once I had this, I knew exactly where to look: at the top of the list, at the statements taking the longest. That's where the analysis needs to happen. The following query lets you see the query plan for a specific statement; update it with your plan_id.

SELECT
    ConvertedPlan = TRY_CONVERT(XML, [query_plan])
    ,*
FROM sys.query_store_plan
WHERE
    plan_id IN (1830037)

At this point in my troubleshooting, it was easy. The worst-performing statement, which took most of the CPU time, was missing an index. Adding that index improved the overall performance dramatically.

So there you have it—a new tool that can make performance troubleshooting of long-running stored procedures a piece of cake. 

I haven't gone into all the details of how to actually set up the Query Store. The Microsoft documentation is great, and I also like the articles by Erin Stellato (https://sqlperformance.com/author/erinsqlskills-com).

Sunday, May 10, 2020

Performance analysis on SQL Server—how to find resource hogs

There are many different ways of doing performance analysis in SQL Server, and multiple tools you can use. Or you can simply wait for users to complain and tell you that certain parts of the application are slow.

Of course, the built-in dynamic management view dm_exec_query_stats is a great tool to add to your list. It can be very useful just as it is. But in doing some performance research recently, I was able to get some very targeted comparative information on performance. This information was extremely useful in deciding precisely which stored procedures should be analyzed and improved. Often the stored procedures at the top of the list have a glaringly obvious flaw that can be fixed for a big performance boost.
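To see what "just as it is" looks like, here's a quick ad hoc query against the view that lists the heaviest statements by logical reads; the TOP count and the ORDER BY metric are easy to swap out:

```sql
SELECT TOP 10
    TotalLogicalReads = qs.total_logical_reads
    ,ExecutionCount = qs.execution_count
    ,TotalCPUTimeInSeconds = qs.total_worker_time / 1000000
    ,QueryText = convert(varchar(100), st.text)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC
```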

The below script creates a global temporary table, called ##PerformanceReporting. The table is then populated with information from dm_exec_query_stats. The ObjectName field is parsed out from the query text.

A few caveats need to be given here. The code below is best suited for systems where the database is accessed through stored procedures. Also, stored procedures and functions are listed separately, but functions are most often used within stored procedures, so that can be misleading.

 

if object_id('tempdb..##PerformanceReporting') is not null drop table tempdb..##PerformanceReporting
go

Create table ##PerformanceReporting (
    ObjectName varchar(100)
    ,QueryText varchar(100)
    ,TotalLogicalReads bigint
    ,ExecutionCount int
    ,TotalCPUTimeInSeconds int
    ,LastExecutionTime date
    ,Rank_LogicalReads int
    ,Rank_CPU int
    ,SumTotalLogicalReads bigint
    ,SumTotalCPU bigint
)    

-- Insert high logical reads
Insert into ##PerformanceReporting (
    QueryText
    ,TotalLogicalReads 
    ,ExecutionCount 
    ,TotalCPUTimeInSeconds 
    ,LastExecutionTime 
    ,Rank_LogicalReads 
    ,SumTotalLogicalReads
    ,SumTotalCPU
)
select 
    QueryText = convert(varchar(100), SQLText.text) 
    ,TotalLogicalReads = sum(total_logical_reads )
    ,ExecutionCount = sum(execution_count )
    ,TotalCPUTimeInSeconds = sum(total_worker_time/1000000 )    
    ,LastExecutionTime = max(last_execution_time )
    ,Rank_LogicalReads = ROW_NUMBER() OVER (ORDER BY sum(total_logical_reads) desc)
    ,SumTotalLogicalReads = (select sum(total_logical_reads) from sys.dm_exec_query_stats )
    ,SumTotalCPU = (select sum( total_worker_time)/1000000 from sys.dm_exec_query_stats )    
from 
    (select top 30
        last_execution_time
        ,execution_count
        ,plan_handle
        ,total_worker_time
        ,total_logical_reads
    from sys.dm_exec_query_stats 
    order by total_logical_reads desc
    ) QueryStats
cross apply sys.dm_exec_sql_text(plan_handle) SQLText
Group by convert(varchar(100), SQLText.text) 

-- Insert high CPU 
Insert into ##PerformanceReporting (
    QueryText
    ,TotalLogicalReads 
    ,ExecutionCount 
    ,TotalCPUTimeInSeconds 
    ,LastExecutionTime 
    ,Rank_CPU 
    ,SumTotalLogicalReads    
    ,SumTotalCPU
)
Select 
     QueryText = convert(varchar(100), [text]) 
    ,TotalLogicalReads = sum(total_logical_reads)
    ,ExecutionCount = sum(execution_count )
    ,TotalCPUTimeInSeconds = sum( total_worker_time)/1000000 
    ,LastExecutionTime = max(last_execution_time )
    ,Rank_CPU = ROW_NUMBER() OVER (ORDER BY sum(total_worker_time) desc)
    ,SumTotalLogicalReads = (select sum(total_logical_reads) from sys.dm_exec_query_stats )
    ,SumTotalCPU = (select sum( total_worker_time)/1000000 from sys.dm_exec_query_stats )    
from 
    (select top 30
        last_execution_time
        ,total_logical_reads
        ,execution_count
        ,plan_handle 
        ,total_worker_time
    from sys.dm_exec_query_stats 
    order by total_worker_time desc) QueryStats
cross apply sys.dm_exec_sql_text(plan_handle) 
Group by convert(varchar(100), [text]) 

-- Update ObjectName to clean up the QueryText
;With CleanUpQueryText as (
 Select
        ObjectNameClean = 
   replace (
    QueryText 
    ,'CREATE Function [dbo].['
    ,'Func: ' )
  ,QueryText
 From ##PerformanceReporting
 Where QueryText like '%function%'
 Union all 
 Select
  ObjectNameClean = 
   replace (
    QueryText 
    ,'CREATE Procedure [dbo].['
    ,'Proc: ' )
  ,QueryText
 From ##PerformanceReporting
 Where QueryText like '%Procedure%'
)
Update ##PerformanceReporting
Set ObjectName = Substring(ObjectNameClean, 1, Charindex ( ']', ObjectNameClean)- 1)
From ##PerformanceReporting 
    join CleanUpQueryText
        on ##PerformanceReporting.QueryText = CleanUpQueryText.QueryText

-- Update ObjectName - only if null 
--(which can happen if it's not a procedure or a function)
Update ##PerformanceReporting
Set ObjectName = QueryText 
Where ObjectName is null

Once the ##PerformanceReporting table has been populated, you can query it in a number of useful ways. The first two queries below show the details of the objects with the highest logical reads and the highest CPU usage. The third query shows objects that rank high on both; targeting these should give you the most bang for the buck, in terms of analysis and improvement.
 

-- High Logical Reads
Select 
    ObjectName 
    ,TotalLogicalReads 
    ,ExecutionCount 
    ,TotalCPUTimeInSeconds 
    ,LastExecutionTime 
    ,Rank_LogicalReads
    ,PercentOfTotal = 
        convert(decimal(4, 3), 
            (TotalLogicalReads * 1.0)/SumTotalLogicalReads
            )
From ##PerformanceReporting
Where 
    Rank_LogicalReads  is not null
Order by Rank_LogicalReads 

-- High CPU
Select 
    ObjectName 
    ,TotalLogicalReads 
    ,ExecutionCount 
    ,TotalCPUTimeInSeconds 
    ,LastExecutionTime 
    ,Rank_CPU
    ,SumTotalCPU
    ,PercentOfTotal = 
        convert(decimal(4, 3), 
            ( TotalCPUTimeInSeconds* 1.0)/SumTotalCPU 
            )    
From ##PerformanceReporting
Where 
    Rank_CPU  is not null
Order by Rank_CPU


-- High for BOTH CPU and logical reads
Select 
    ObjectName = 
        isnull(HighLogicalReads.ObjectName, HighCPU.ObjectName)
    ,HighLogicalReads.Rank_LogicalReads
    ,HighCPU.Rank_CPU    
    ,Percent_TotalLogicalReads = 
        convert(decimal(4, 3), 
            (HighLogicalReads.TotalLogicalReads * 1.0)/HighLogicalReads.SumTotalLogicalReads
            )   
    ,Percent_TotalCPU = 
        convert(decimal(4, 3), 
            ( HighCPU.TotalCPUTimeInSeconds* 1.0)/HighCPU.SumTotalCPU 
            )    
From (Select * from ##PerformanceReporting where Rank_LogicalReads is not null) HighLogicalReads
    Full Outer Join (Select * from ##PerformanceReporting where Rank_CPU is not null) HighCPU
        on HighCPU.ObjectName = HighLogicalReads.ObjectName
Order by     
    isnull(HighLogicalReads.Rank_LogicalReads, 0) 
    +
    isnull(HighCPU.Rank_CPU, 0) 

Tuesday, April 21, 2020

Efficiently analyzing deadlocks on SQL Server with Extended Events

At my current consulting contract, we were running into a lot of deadlocks. It had been a long time since I'd analyzed deadlocks, and this system had a flood of them, right after a very sudden and extreme increase in usage.

There were so many deadlocks that we needed to be efficient in deciding which ones to fix first. So I researched. I hit a lot of dead ends, and ran into a lot of advertising for specific database monitoring tools. But it turns out that there's some very good deadlock reporting, built right into SQL Server, out of the box.

Basically, SQL Server Extended Events can be queried for deadlock occurrences. You have to use XML querying, with .nodes(), but there are plenty of examples online. The script I started out with is the one here: Querying Deadlocks From System_Health XEvent. I made some modifications so I could do more analysis without constantly rerunning the base query, which takes a while.
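To give a feel for the .nodes() approach, here's a minimal standalone example that pulls deadlock graphs from the system_health session's ring buffer. (The main script reads the session's .xel files instead, which retains more history; this is just the smallest version of the idea.)

```sql
-- Minimal example: extract xml_deadlock_report events from the
-- system_health ring buffer using XML .nodes()
SELECT DeadlockGraph = x.event_data.query('.')
FROM (
    SELECT CAST(t.target_data AS xml) AS target_data
    FROM sys.dm_xe_session_targets t
    INNER JOIN sys.dm_xe_sessions s
        ON s.address = t.event_session_address
    WHERE s.name = 'system_health'
        AND t.target_name = 'ring_buffer'
) rb
CROSS APPLY rb.target_data.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS x(event_data)
```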

So, Step 1 is to run the below, which creates a global temp table called ##DeadlockReporting, which can be queried later.

 if object_id('tempdb..##DeadlockReporting') is not null drop table tempdb..##DeadlockReporting  
      
 DECLARE @SessionName SysName   
 SELECT @SessionName = 'system_health'  
 /*   
 SELECT Session_Name = s.name, s.blocked_event_fire_time, s.dropped_buffer_count, s.dropped_event_count, s.pending_buffers  
 FROM sys.dm_xe_session_targets t  
      INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address  
 WHERE target_name = 'event_file'  
 --*/  
   
 IF OBJECT_ID('tempdb..#Events') IS NOT NULL BEGIN  
      DROP TABLE #Events  
 END  
   
 DECLARE @Target_File NVarChar(1000)  
      , @Target_Dir NVarChar(1000)  
      , @Target_File_WildCard NVarChar(1000)  
   
 SELECT @Target_File = CAST(t.target_data as XML).value('EventFileTarget[1]/File[1]/@name', 'NVARCHAR(256)')  
 FROM sys.dm_xe_session_targets t  
      INNER JOIN sys.dm_xe_sessions s ON s.address = t.event_session_address  
 WHERE s.name = @SessionName  
      AND t.target_name = 'event_file'  
   
 SELECT @Target_Dir = LEFT(@Target_File, Len(@Target_File) - CHARINDEX('\', REVERSE(@Target_File)))   
   
 SELECT @Target_File_WildCard = @Target_Dir + '\' + @SessionName + '_*.xel'  
   
 --Keep this as a separate table because it's called twice in the next query. You don't want this running twice.  
 SELECT DeadlockGraph = CAST(event_data AS XML)  
      , DeadlockID = Row_Number() OVER(ORDER BY file_name, file_offset)  
 INTO #Events  
 FROM sys.fn_xe_file_target_read_file(@Target_File_WildCard, null, null, null) AS F  
 WHERE event_data like '<event name="xml_deadlock_report%'  
   
 ;WITH Victims AS  
 (  
      SELECT VictimID = Deadlock.Victims.value('@id', 'varchar(50)')  
           , e.DeadlockID   
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/victim-list/victimProcess') as Deadlock(Victims)  
 )  
 , DeadlockObjects AS  
 (  
      SELECT DISTINCT e.DeadlockID  
           , ObjectName = Deadlock.Resources.value('@objectname', 'nvarchar(256)')  
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/resource-list/*') as Deadlock(Resources)  
 )  
 SELECT *  
 Into ##DeadlockReporting  
 FROM  
 (  
      SELECT   
     e.DeadlockID  
           ,TransactionTime = Deadlock.Process.value('@lasttranstarted', 'datetime')  
           , DeadlockGraph  
           , DeadlockObjects = substring((SELECT (', ' + o.ObjectName)  
                                    FROM DeadlockObjects o  
                                    WHERE o.DeadlockID = e.DeadlockID  
                                    ORDER BY o.ObjectName  
                                    FOR XML PATH ('')  
                                    ), 3, 4000)  
           , Victim = CASE WHEN v.VictimID IS NOT NULL   
                                    THEN 1   
                               ELSE 0   
                               END  
           , SPID = Deadlock.Process.value('@spid', 'int')  
           , ProcedureName = Deadlock.Process.value('executionStack[1]/frame[1]/@procname[1]', 'varchar(200)')  
           , LockMode = Deadlock.Process.value('@lockMode', 'char(1)')  
           , Code = Deadlock.Process.value('executionStack[1]/frame[1]', 'varchar(1000)')  
     , ClientApp = CASE LEFT(Deadlock.Process.value('@clientapp', 'varchar(100)'), 29)  
                               WHEN 'SQLAgent - TSQL JobStep (Job '  
                                    THEN 'SQLAgent Job: ' + (SELECT name FROM msdb..sysjobs sj WHERE substring(Deadlock.Process.value('@clientapp', 'varchar(100)'),32,32)=(substring(sys.fn_varbintohexstr(sj.job_id),3,100))) + ' - ' + SUBSTRING(Deadlock.Process.value('@clientapp', 'varchar(100)'), 67, len(Deadlock.Process.value('@clientapp', 'varchar(100)'))-67)  
                               ELSE Deadlock.Process.value('@clientapp', 'varchar(100)')  
                               END   
           , HostName = Deadlock.Process.value('@hostname', 'varchar(20)')  
           , LoginName = Deadlock.Process.value('@loginname', 'varchar(20)')  
           , InputBuffer = Deadlock.Process.value('inputbuf[1]', 'varchar(1000)')  
      FROM #Events e  
           CROSS APPLY e.DeadlockGraph.nodes('/event/data/value/deadlock/process-list/process') as Deadlock(Process)  
           LEFT JOIN Victims v ON v.DeadlockID = e.DeadlockID AND v.VictimID = Deadlock.Process.value('@id', 'varchar(50)')  
 ) X --In a subquery to make filtering easier (use column names, not XML parsing), no other reason  
 ORDER BY TransactionTime DESC  
   

Now I have a global temp table called ##DeadlockReporting.

Step 2 is to run queries against it. Here are some that I wrote. The most useful is the one with the header "Get the worst offenders in past 24 hours". This gives you a good picture of each deadlock: the affected tables, the stored procedures being called, the specific problem code. Then click on the DeadlockXML to see the full details of the deadlock.

You may find some really obvious targets for fixes. I found some update statements that contained subqueries in the SET clause that needed to be fixed.
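The general shape of that problem, and one common way to rework it, looked something like this. The table names here are made up for illustration, not from the actual system:

```sql
-- Deadlock-prone pattern: the correlated subquery in the SET clause
-- reads OrderDetail while update locks are being taken on OrderHeader,
-- widening the window for lock-order conflicts
UPDATE dbo.OrderHeader
SET LineCount =
    (SELECT count(*)
     FROM dbo.OrderDetail
     WHERE OrderDetail.OrderID = OrderHeader.OrderID)

-- Friendlier rewrite: aggregate first, then join to the aggregate
UPDATE oh
SET LineCount = agg.LineCount
FROM dbo.OrderHeader oh
INNER JOIN
    (SELECT OrderID, LineCount = count(*)
     FROM dbo.OrderDetail
     GROUP BY OrderID) agg
    ON agg.OrderID = oh.OrderID
```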

 Select top 100 * From ##DeadlockReporting order by TransactionTime desc  
   
 -- Get the worst offenders in past 24 hours  
 -- Click on the DeadlockXML to show details  
 Select   
   DeadlockObjects  
   ,ProcedureName  
   ,Code = min(Code)  
      ,DeadlockXML = convert(xml, min(convert(varchar(max), DeadlockGraph) ))  
      ,FirstDeadlock = Min(TransactionTime)  
      ,LastDeadlock = Max(TransactionTime)  
   ,PresumedTotalDeadlocks = Count(*)    
 from ##DeadlockReporting    
 Where TransactionTime >= dateadd (hh, -24, getdate() )  
 Group by   
   DeadlockObjects  
   ,ProcedureName  
 Order by Count(*)  desc  
   
 -- Get summary past 24 hours, by hour  
 ;with DeadlockReportingFormatted as (  
   Select   
     TransactionHour_UTC = convert(varchar(13), TransactionTime, 120) + ':00'  
     ,TransactionHour_PST =   
       convert(varchar(13), TransactionTime AT TIME ZONE 'UTC' AT TIME ZONE 'Pacific Standard Time', 120)   
       + ':00'  
     ,*  
   from ##DeadlockReporting     
 )  
 Select   
   TransactionHour_UTC  
   ,TransactionHour_PST  
   ,DeadlockObjects  
   ,ProcedureName  
   ,PresumedTotalDeadlocks = Count(*)    
 From DeadlockReportingFormatted  
 Where TransactionTime >= dateadd(hh, -24, getdate())  
 Group by   
   TransactionHour_UTC  
   ,TransactionHour_PST  
   ,DeadlockObjects  
   ,ProcedureName  
 Order by   
   TransactionHour_UTC desc  
   ,PresumedTotalDeadlocks desc