Sunday, December 12, 2021

UPDATED! Performance tuning a stored procedure with user defined function calls in SQL Server

I had a previous post on the topic of performance tuning a stored procedure with multiple user defined function calls. I thought the method I reviewed was a pretty nifty way of getting very useful performance information. 

However, it turns out there's a much more straightforward way to get a comprehensive performance picture (including user defined functions) when executing a stored procedure.

This method (only to be used on a development server, not production) uses DBCC FREEPROCCACHE to completely clear out sys.dm_exec_procedure_stats and sys.dm_exec_function_stats. It also clears out other DMVs, so be aware of that.

Here's a sample of how it can be used. Basically, for each version of the code, you free the procedure cache, run the code, and then query sys.dm_exec_procedure_stats and sys.dm_exec_function_stats. Then, compare the two outputs. You'll get a comprehensive picture of the resource usage for each version, including functions.

  -- Run this for both versions of the stored procedure  
 DBCC FREEPROCCACHE  
 exec API_GetTiers_Old  @LocaleID= 1033, @CountryCode = 'US',@UserID = 60190667  
 -- exec API_GetTiers_New  @LocaleID= 1033, @CountryCode = 'US',@UserID = 60190667

   
 -- Get stats from functions  
 Select   
      Object_Name = Object_Name(Object_ID)  
      ,execution_count  
      ,Total_Worker_Time  
      ,Total_Logical_Reads   
      ,Total_Elapsed_Time  
 From sys.dm_exec_function_stats  
 Union all  
 -- Get stats from procedures  
 Select   
      Object_Name = Object_Name(Object_ID)  
      ,execution_count  
      ,Total_Worker_Time  
      ,Total_Logical_Reads   
      ,Total_Elapsed_Time  
 From sys.dm_exec_procedure_stats  
 -- Sum everything for convenience  
 Union all  
 Select 'TOTAL', null  
 , Total_Worker_Time =   
      IsNull((Select sum(Total_Worker_Time) From sys.dm_exec_function_stats ) , 0)
      + IsNull((Select sum(Total_Worker_Time) From sys.dm_exec_procedure_stats ) , 0) 
 , Total_Logical_Reads   
      = IsNull((Select sum(total_logical_reads) From sys.dm_exec_function_stats ) , 0)  
      + IsNull((Select sum(total_logical_reads) From sys.dm_exec_procedure_stats ) , 0) 
 , Total_Elapsed_Time   
      = IsNull((Select sum(Total_Elapsed_Time) From sys.dm_exec_function_stats ) , 0)  
      + IsNull((Select sum(Total_Elapsed_Time) From sys.dm_exec_procedure_stats ) , 0)
   


Friday, September 17, 2021

Tracking the total number of stored procedure executions, over time

UPDATE: It turns out there are a number of ways that this can be inaccurate. Use it as a pointer toward what to investigate via other means, not as a final result.

******************************************* *******************************************

I had a situation recently where the performance of a particular stored procedure (previously never a resource hog) changed for the worse. It suddenly jumped to the top of the list of procedures returned by the Top Resource Consuming Queries report in Query Store, in Duration, CPU, and Logical Reads. And it stayed there.

What was going on? I started diving into the indexes, the execution plan, and the logical reads. No big insights. Finally I started looking at the average CPU of the stored procedure over time. This is available only if you write SQL against the Query Store. Here's what I put together:

 SELECT   
   RunDate = convert(varchar(16), first_execution_time,121)  
   ,AverageCPU = round(rs.avg_cpu_time , 0)  
   ,AverageLogicalReads = round(rs.avg_logical_io_reads ,0)  
 FROM sys.query_store_runtime_stats rs  
 JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id  
 JOIN sys.query_store_query q ON q.query_id = p.query_id  
 WHERE   
   Object_Name(q.object_id) = 'API_GetTiers'  
 ORDER BY RunDate desc  

When you run this, you'll notice that you have multiple rows for each day. This has to do with the Statistics Collection Interval set up in Query Store. Since you can't average an average, I just eyeballed the results for AverageCPU and AverageLogicalReads.

And looking at the results for the past couple of weeks, I saw something strange: the AverageCPU and AverageLogicalReads didn't actually go up over time! I poked around a little more, and then added the count of executions.

Bingo. It was the number of executions that had skyrocketed, by about 8 times. It turns out that the API calling this stored procedure had changed, resulting in many more executions. It's a good thing I didn't waste too much time trying to optimize this stored procedure, because the first question to figure out was: why was this stored procedure suddenly being called so many more times, and was it possible to avoid that?
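For reference, here's roughly what the daily query looks like once the execution count is added. This is just a sketch against the same Query Store views (using the same API_GetTiers procedure name as above); it weights the averages by the executions in each interval, so you get one row per day instead of having to eyeball an average of averages.

 -- One row per day: total executions plus execution-weighted averages  
 SELECT  
   RunDate = convert(varchar(10), rs.first_execution_time, 121)  
   ,Executions = sum(rs.count_executions)  
   ,WeightedAvgCPU = round(sum(rs.avg_cpu_time * rs.count_executions) / sum(rs.count_executions), 0)  
   ,WeightedAvgLogicalReads = round(sum(rs.avg_logical_io_reads * rs.count_executions) / sum(rs.count_executions), 0)  
 FROM sys.query_store_runtime_stats rs  
 JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id  
 JOIN sys.query_store_query q ON q.query_id = p.query_id  
 WHERE  
   Object_Name(q.object_id) = 'API_GetTiers'  
 GROUP BY convert(varchar(10), rs.first_execution_time, 121)  
 ORDER BY RunDate desc  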

Here's the spreadsheet, with chart, that I put together to explain the situation to the API developers. It was very helpful in showing where the problem was. 


Monday, August 30, 2021

Testing stored procedures - getting great coverage with zero external tools

I've been working on performance tuning stored procedures recently. These are stored procedures that just return data, with no data modification, mostly for reporting. Often the performance tuning isn't the hard part. The hard part is making sure that the changes I make don't introduce bugs into the output.

Here's a process I use to make the testing process easier for this type of procedure.

Step 1: Create a select statement that outputs an executable script, with a random assortment of IDs

Most of the stored procedures take, as a parameter, a UserID (or some other ID field). In order to get a set of stored procedure calls with a random UserID, you can use a script like the following:

 Select Top 10  
   'exec api_GetTiers '  
   + ''''  
   + convert(varchar(100), UserID) + ''''  
 From Users  
 Order by NEWID()  

When you run the above script, the output is a set of calls to your target stored procedure. The NEWID() function in the Order By clause sorts the rows randomly, which gives you a random set of UserIDs for the parameter.

Step 2: Execute the stored procedure, with random UserIDs

The output of the Step 1 script will look something like this:

 exec api_GetTiers '40273230'  
 exec api_GetTiers '60372087'  
 exec api_GetTiers '30128477'  
 exec api_GetTiers '60008969'  
 exec api_GetTiers '60121799'  
 exec api_GetTiers '00303810'  
 exec api_GetTiers '60466614'  
 exec api_GetTiers '60147429'  
 exec api_GetTiers '70278452'  
 exec api_GetTiers '50542343'  

Copy and paste the output into a new query window in SQL Server Management Studio, and change the output to Text mode instead of Grid Mode (right-click in the query window, click on Results To, then choose Text). 

Next, execute the set of stored procedure calls. 

Copy and paste the output (since the Results are now in text mode, it's easy to do) into a text editing tool. My tool of choice is Notepad++. 


The final step is to update the stored procedure that you're working on, and then follow the above process again. You will now have two chunks of text, one with the output of the old stored procedure and one with the output of the new stored procedure, both called with the same UserIDs. Compare the output from the old and the new versions of the stored procedure with your preferred text comparison tool (I use the Compare plugin in Notepad++).

Any differences that show up are potential bugs.



Monday, August 23, 2021

Performance tuning a stored procedure with user defined function calls in SQL Server

UPDATE: It turns out that there are easier ways of doing this. Take a look at this post instead of following the steps below.

******************************************* *******************************************

Have you ever tried to do performance tuning on a stored procedure that has user defined function calls? It can be very challenging. This is mainly because one of the most basic tools of performance tuning, checking the number of logical reads using Set Statistics IO, is just not accurate in this scenario. The logical reads of any user defined functions called during a stored procedure are not reported by Set Statistics IO, so you get output that doesn't reflect their true cost.

Before I get into the details of how to work around this, let's just get this out of the way: yes, user defined functions have a very mixed reputation because of performance issues. Functions in the Select or Order By clause are likely to be okay. But functions have no place in the Where clause, where they can cause huge performance issues.
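To make that concrete, here's a small made-up example (the table and function names are hypothetical). Wrapping a column in a scalar function in the Where clause forces the function to run for every row and prevents an index seek on that column; rewriting the filter against the bare column avoids both problems.

 -- Hypothetical example: scalar UDF wrapped around a column in the Where clause  
 -- The function runs once per row and the predicate can't use an index on OrderDate  
 Select OrderID, OrderDate  
 From dbo.Orders  
 Where dbo.fn_GetOrderYear(OrderDate) = 2021  
   
 -- The same filter rewritten as a plain, sargable predicate on the column  
 Select OrderID, OrderDate  
 From dbo.Orders  
 Where OrderDate >= '20210101' and OrderDate < '20220101'  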

Yet you may be working on a system that has hundreds of user defined functions scattered everywhere, including the Where clause, within hundreds of stored procedures. That's what I was recently faced with.

It's not possible to fix all these issues quickly, so you need to target the worst performers and figure out how to improve them. I use Query Store (usually the Top Resource Consuming Queries report) to find the worst performers in general. But once you have a stored procedure targeted for performance improvement, how do you get the full performance picture if it contains user defined functions?

Here's a process you can use to get data that accurately reflects performance (including Logical Reads), even when there are user defined functions inside your stored procedure. The key is to use the built-in reporting in Query Store, and not Statistics IO.

Step 1: Purge Query Store data

Make sure Query Store is turned on for your database. Information on how to do this is easy to find online. After ensuring it's turned on, click on the Purge Query Data button. (This is, of course, assuming you're not working on a production server). This will give you a blank slate in the Query Store.
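If you'd rather script this than click through SSMS, something like the following should do the same thing (YourDatabase is a placeholder for the actual database name):

 -- Turn on Query Store for the database, if it isn't on already  
 ALTER DATABASE YourDatabase SET QUERY_STORE = ON;  
   
 -- Wipe out the collected Query Store data (same effect as the Purge Query Data button)  
 ALTER DATABASE YourDatabase SET QUERY_STORE CLEAR;  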



Step 2: Run the stored procedure

Execute the stored procedure in a query window. This will populate the previously empty Query Store with just the performance metrics for this specific call.

Step 3: Check the Query Store: Top Resource Consuming Queries

Next, go to the Top Resource Consuming Queries report in Query Store, and open it. By default it shows the graph view, which doesn't give you numbers. Instead, switch to the simple grid view, and copy and paste the results into a spreadsheet. I usually do this for both Logical Reads and CPU.
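If you'd rather pull the numbers with a query instead of copying from the report, a rough equivalent is to total up the Query Store runtime stats per object right after the run. This is only a sketch; total CPU and reads are reconstructed as average times execution count, which is approximately what the grid in the report shows.

 -- Totals per object (procedure or function) since the Query Store was purged  
 SELECT  
   Object_Name = Object_Name(q.object_id)  
   ,Executions = sum(rs.count_executions)  
   ,Total_CPU = sum(rs.avg_cpu_time * rs.count_executions)  
   ,Total_Logical_Reads = sum(rs.avg_logical_io_reads * rs.count_executions)  
 FROM sys.query_store_runtime_stats rs  
 JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id  
 JOIN sys.query_store_query q ON q.query_id = p.query_id  
 GROUP BY q.object_id  
 ORDER BY Total_Logical_Reads desc  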

Now what? When I first ran through this with my stored procedure, I found that the user defined function calls in the stored procedure were actually not as big of a problem as I had suspected. Instead of focusing on user defined functions, I targeted other areas of the stored procedure and got a very substantial performance improvement.

To validate the improvement, I ran through the same steps as above, but using the updated stored procedure. Comparing Logical Reads and CPU numbers between the old and new versions of the stored procedure gave me the results I was looking for.