Databases Are Fun

Friday, December 20, 2024

Query for getting a quick overview of both index usage and structure, at the same time

In tuning up some indexes, I was always switching back and forth between different screens and resultsets. There's two main sets of information that you need:

- For a particular table, how much are all the indexes being used?

- What is the structure of all the indexes on a table?

Having these two pieces of information at the same time can be really useful.

Here's a quick script that I put together to do that:

 -- Get index usage  
 if object_id('tempdb..#IndexUsage') is not null drop table tempdb..#IndexUsage   
 if object_id('tempdb..#IndexStructure') is not null drop table tempdb..#IndexStructure  
 Declare @TableName varchar(100) = 'Users'  
 SELECT i.name AS IndexName  
      ,ius.user_seeks AS Seeks  
      ,ius.user_scans AS Scans  
      ,ius.user_lookups AS Lookups  
 INTO #IndexUsage  
 FROM sys.indexes i  
 INNER JOIN sys.dm_db_index_usage_stats ius ON ius.index_id = i.index_id  
      AND ius.OBJECT_ID = i.OBJECT_ID  
 INNER JOIN (  
      SELECT sch.name  
           ,sch.schema_id  
           ,o.OBJECT_ID  
           ,o.create_date  
      FROM sys.schemas sch  
      INNER JOIN sys.objects o ON o.schema_id = sch.schema_id  
      ) s ON s.OBJECT_ID = i.OBJECT_ID  
 LEFT JOIN (  
      SELECT OBJECT_ID  
           ,index_id  
           ,SUM(used_page_count) AS usedpages  
           ,SUM(CASE   
                     WHEN (index_id < 2)  
                          THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)  
                     ELSE lob_used_page_count + row_overflow_used_page_count  
                     END) AS pages  
      FROM sys.dm_db_partition_stats  
      GROUP BY object_id  
           ,index_id  
      ) AS ps ON i.object_id = ps.object_id  
      AND i.index_id = ps.index_id  
 WHERE ius.database_id = DB_ID()  
      AND OBJECT_NAME(i.OBJECT_ID) = @TableName  
 ORDER BY i.name  
 -- Get index structure with included fields  
 SELECT   
   OBJECT_NAME(i.[object_id]) TableName ,  
   i.[name] IndexName ,  
   c.[name] ColumnName ,  
   ic.is_included_column ,  
   i.index_id ,  
   i.type_desc ,  
   i.is_unique ,  
   i.data_space_id ,  
   i.ignore_dup_key ,  
   i.is_primary_key ,  
   i.is_unique_constraint  
 INTO #IndexStructure  
 FROM   
   sys.indexes i  
   JOIN sys.index_columns ic ON ic.object_id = i.object_id and i.index_id = ic.index_id  
   JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id  
 Where   
   OBJECT_NAME(i.[object_id]) = @TableName  
 ORDER BY   
    tableName ,  
    ic.index_id ,  
    ic.index_column_id  
 -- Combine into one output   
 ;With   
   IndexStructure as   
     (Select Distinct  
       IndexName  
       , type_desc = case when type_desc = 'nonclustered' then 'no' else 'yes' end  
       , is_unique = case When is_unique = 0 then 'no' else 'yes' end  
       , is_primary_key = case When is_primary_key = 0 then 'no' else 'yes' end  
     From #IndexStructure   
     )  
   ,IndexedFields as   
     (select IndexName, IndexedFields = STRING_AGG (CONVERT(NVARCHAR(max),ColumnName), ', ')   
     from #IndexStructure Where is_included_column = 0 Group by IndexName)  
   ,IncludedFields as   
     (select IndexName, IncludedFields = STRING_AGG (CONVERT(NVARCHAR(max),ColumnName), ', ')   
     from #IndexStructure Where is_included_column = 1 Group by IndexName)  
 Select   
      IndexStructure.IndexName  
   ,IndexUsage.Seeks  
   ,IndexUsage.Scans  
   ,IndexUsage.Lookups  
      ,IsClustered = type_desc  
      ,IsUnique = is_unique  
      ,IsPrimaryKey = is_primary_key  
      ,IndexedFields.IndexedFields  
      ,IncludedFields = IsNull(IncludedFields.IncludedFields, '-')  
 From IndexStructure   
   JOIN #IndexUsage IndexUsage on IndexUsage.IndexName = IndexStructure .IndexName  
      Join IndexedFields on IndexStructure.IndexName = IndexedFields.IndexName  
      Left Join IncludedFields on IndexStructure.IndexName = IncludedFields.IndexName  
 -- Where Seeks < 500 and Scans < 500   
 Order by   
   Seeks asc

Monday, July 31, 2023

Which objects are using those barely used indexes, that you want to get rid of?

Indexes in SQL Server are obviously critical, but easy to overuse - to the point that the indexes themselves cause problems. For instance, you may have tables with indexes that are almost, but not quite, identical. Figuring out which queries are using specific indexes can be extremely helpful in consolidating indexes.

The problem is, there's not an easy way to do this. The standard advice, when figuring out which indexes are used and which are not, is to query against sys.dm_db_index_usage_stats. That advice is great, as far as it goes. However, it only tells you if an index is used, and whether it's used for a scan, or a seek. It doesn't tell you which objects/queries are using the index.

But what if you have two indexes that are very quite similar, and each have some usage? You suspect they could be consolidated. You need to know which queries are using each index.

Unfortunately sys.dm_db_index_usage_stats will leave you hanging here. It'll tell you whether an index is being used, but not which specific object/query is using the index.

In order to do that, you need to use Query Store data. Query Store is a SQL Server component that tracks all kinds of very useful performance information, so it's important to learn about. By default it's turned off, so you'll need to turn it on.

Here's a query you can use to figure out specifically which objects are using an index. It needs to be run on the production server, and can be resource intensive, so consider running it at a time when there's not much activity. Substitute the index name IX_IndexName for your specific index name.

 ;With FilteredPlans as (  
   SELECT plan_id   
   FROM sys.query_store_plan  
   WHERE query_plan like '%IX_IndexName%'  
 )  
 SELECT distinct  
   ObjectName = Object_Name(q.object_id)  
   ,q.query_id   
   ,p.plan_id   
   ,qt.query_sql_text  
 FROM sys.query_store_runtime_stats rs  
   JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id  
   JOIN sys.query_store_query q ON q.query_id = p.query_id  
   JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id  
 WHERE   
      p.plan_id in (Select plan_id from FilteredPlans)

Note that some of the usage of the index that comes up will be when the index is updated, because rows have been modified. This is when the underlying table has been updated, deleted, etc. You can ignore those, because that's just maintenance to keep the index up to date, and not useful for an index that's not being used.

Thursday, June 8, 2023

Clustered indexes—how to make the right decisions

Oftentimes, with database systems, nobody is specifically in charge of the database. It's just seen as one of the many tools that's being used by the front end code, and as long as it functions, it's fine.

But sometimes when the system has been running and growing for a few years, problems become evident. Deadlocks. Slow query plans. Timeouts. By default, the primary key is made into the clustered index, unless it's specifically created as a nonclustered index. This can make sense, because you really want a clustered index on a table, and the primary key is sometimes a good choice. But definitely not always!

I've run into some situations like this. The clustered index is created on a field that is probably rarely used in searches. But how to be sure? In order to more easily research these situations, and figure out which field might be a better potential candidate for clustereding, I wrote the following helpful query.

It queries the system tables to see which objects are referencing the target table, and then returns the objects (usually stored procedures, but sometimes functions) in order of decreasing CPU usage. This way, if you have hundreds of objects that use the table and don't have time to check each one, you can make sure that you at least check the most resource intensive objects.

You need to have Query Store turned on (everybody should, in any case, if you need performance data), because it uses the table sys.query_store_runtime_stats in order to show the results in order of CPU usage.

It returns a handy column that you can click on to easily see the object definition. Thus, you can see the SQL behind the object, and double check whether the clustered index change you're thinking of is useful. If the current field that has the clustered index is rarely used in filtering, and the new field is used heavily, then that's a sign you should switch.

 ;with Main as (  
      -- Get data on everything that's using this table and/or field  
   SELECT   
     ObjectName = Object_Name(Query.object_id)  
     ,TotalExecutions = sum((RuntimeStats.count_executions))  
     ,TotalCPU = sum((RuntimeStats.count_executions * RuntimeStats.avg_cpu_time))  
     ,TotalLogicalReads = sum((RuntimeStats.count_executions * RuntimeStats.avg_logical_io_reads))  
     ,TotalDuration = sum((RuntimeStats.count_executions * RuntimeStats.avg_duration))  
     ,QueryStorePlan.plan_id  
     ,Query.query_id  
   FROM sys.query_store_runtime_stats RuntimeStats  
   JOIN sys.query_store_plan QueryStorePlan ON QueryStorePlan.plan_id = RuntimeStats.plan_id  
   JOIN sys.query_store_query Query ON Query.query_id = QueryStorePlan.query_id  
   JOIN sys.query_store_query_text QueryText ON Query.query_text_id = QueryText.query_text_id  
   WHERE   
     1=1  
     and first_execution_time >= dateadd(d, -2, getdate())  
     and QueryText.query_sql_text like '%' + @TableName + '%'  
           and QueryText.query_sql_text like '%' + @FieldName + '%'  
   GROUP BY Object_Name(Query.object_id), Query.query_id, QueryStorePlan.Plan_id  
 )  
 -- Now return the data, along with resource usage stats and some handy columns you can click on to view the SQL,  
 -- and the query plan  
 Select   
      ObjectName   
      ,TotalExecutions   
      ,TotalCPU   
      ,TotalLogicalReads  
      ,TotalDuration   
   ,SQLTextXML = CAST((SELECT query_sql_text AS [processing-instruction(x)] FOR XML PATH('')) AS XML)    
      ,QueryPlanXML = CAST(query_plan as XML)   
   ,QueryID = Query.query_id  
   ,PlanID = QueryStorePlan.plan_id  
 from Main  
   JOIN sys.query_store_plan QueryStorePlan ON QueryStorePlan.plan_id = Main.plan_id  
   JOIN sys.query_store_query Query ON Query.query_id = QueryStorePlan.query_id  
   JOIN sys.query_store_query_text QueryText ON Query.query_text_id = QueryText.query_text_id  
 Order by TotalCPU desc

Sunday, February 5, 2023

Help! The field I need to query isn't indexed, and I can't change that!

Some things you can change, and some you can't, right? And for whatever reason, you may need to query a huge table, and not be able to index the field that you need to query. I ran into this situation recently.

Well, if it's a log table, that at least has some kind of index (preferably clustered), and you only need to query recent records, you're in luck. Here's a workaround that can be useful:

 Select *   
 From DBA..CommandLog   
 Where   
   ID > (Select Max(ID) from DBA..CommandLog)- 5000  
   And Command like '%ApplicationErrorLogs%'  
   And DatabaseName like 'rds_qt_DB'

How does this work? This is a log table, and there's a clustered index on the ID field of this table. Many, many tables are structured like this.

You use the subquery to get the Max(ID) of the table. Then, subtract 5000 to only query the last 5000 records. That way, the index will be used to do the initial filter, then the other filters will be applied to that limited set. It should be much faster.

Monday, November 14, 2022

Improved and updated, round 2: performance tuning a stored procedure with user defined function calls in SQL Server

Recently I was reviewing performance on a troublesome stored procedure. I used the method detailed in my recent blog post: UPDATED! Performance tuning a stored procedure with user defined function calls in SQL Server.

But in looking at the output, one of the functions just never showed up, as though it were not taking any resources. Why was that? After some research, I found that only scalar functions show up in the system table sys.dm_exec_function_stats, but the function that didn't show up was a table valued function.

So, back to the drawing board. It turns out that another way to get all the performance-related stats for stored procedures and all types of functions, is by using the system table sys.dm_exec_query_stats.

The steps are very similar, in that you're clearing the cache with FREEPROCCACHE, running the stored procedure, and then checking the query statistics. When I checked performance this way, the table valued function showed up properly.

This is what I ran:

 DBCC FREEPROCCACHE   
 Exec API_REPORT_Transcript   
 Select top 20  
   ObjectName = Object_Name(QueryText.objectid, QueryText.dbid)   
   ,Executions = QueryStats.execution_count   
   ,CPUTime = QueryStats.total_worker_time   
   ,LogicalReads = QueryStats.total_logical_reads   
   ,LogicalWrites = QueryStats.total_logical_writes   
   ,Duration = QueryStats.total_elapsed_time   
   ,QueryText =   
     convert(  
       varchar(200), SUBSTRING(QueryText.text,QueryStats.statement_start_offset/2 +1,   
       (CASE WHEN QueryStats.statement_end_offset = -1   
       THEN LEN(CONVERT(NVARCHAR(MAX), QueryText.text)) * 2   
       ELSE QueryStats.statement_end_offset END -  
       QueryStats.statement_start_offset  
       )/2  
       )   
     )   
 From sys.dm_exec_query_stats AS QueryStats   
      CROSS APPLY sys.dm_exec_sql_text(QueryStats.sql_handle) AS QueryText   
 Where   
   DB_NAME(QueryText.dbid) = 'MyDatabaseName'  
 Order by CPUTime desc

Tuesday, July 26, 2022

The magic of Query Store for tracking application timeouts

If I were writing a click-baity title for an article on this, it would be "One Crazy Trick To Find Application Timeouts That Will Blow Your Mind!!!" It really is that useful.

The bottom line is, Query Store is useful for far more than you might think, and specifically researching application timeouts.

A system I've been working on lately has had many application timeouts, cause by the default setting (timeout after 30 seconds). The tool we were using for error tracking had some flaws, and it was very difficult to actually come up with the stored procedure name, that had the timeout.

I had set up the Extended Event session for timeouts (on sqlserver.attention). However, potentially due to certain issues with the front end setup, the sqlserver.sql_text value was almost always blank when there was a timeout. I didn't investigate in depth, and moved on to other tasks.

But once I started spending more time on Query Store and using it more, I discovered the execution_type_desc field in query_store_runtime_stats. What does it contain?

Well, 99.9% of the time it has this string "regular". But occasionally, it contains this one "aborted". And these are your timeouts! It's so easy to find timeouts with this method that I'm surprised it's not more well known.

By using the execution_type_desc field in query_store_runtime_stats, I was able to find clusters of linked timeouts, and then dig deeper (also using query_store_runtime_stats, but that's another story) to find the root cause.

Here's the query I frequently use, commenting out lines when necessary, to get the details I need. This will show 1 row for each QueryID and PlanID. If it's a stored procedure with many separate statements, you may want to filter on one particular QueryID or SQL text.

 SELECT   
   ObjectName = Object_Name(q.object_id)  
   ,RunTimeFirst = convert(varchar(16), rs.first_execution_time,121)  
   ,RunTimeLast = convert(varchar(16), rs.first_execution_time,121)  
   ,TotalExecutions = rs.count_executions  
   ,AverageCPU = round(rs.avg_cpu_time , 0)  
   ,AverageLogicalReads = round(rs.avg_logical_io_reads ,0)  
   ,AverageDuration = round(rs.avg_duration ,0)  
   ,q.query_id   
   ,p.plan_id   
   ,p.Is_Forced_Plan  
   ,rs.runtime_stats_id  
   ,execution_type_desc  
   ,qt.query_sql_text  
   -- Comment out unless necessary...takes a long time.   
   -- ,ExecutionPlan = CAST(p.query_plan as XML)  
 FROM sys.query_store_runtime_stats rs  
 JOIN sys.query_store_plan p   
   ON p.plan_id = rs.plan_id  
 JOIN sys.query_store_query q   
   ON q.query_id = p.query_id  
 JOIN sys.query_store_query_text qt   
   ON q.query_text_id = qt.query_text_id  
 WHERE   
   1=1 -- make it easier to comment/uncomment
   -- just show last 2 days by default  
   and first_execution_time >= dateadd(d, -2, getdate())  
   and Object_Name(q.object_id) = 'API_GetUserLearningPlanObjects'  
   and execution_type_desc = 'Aborted'  
   -- and q.query_id in (4013401)  
   -- and qt.query_sql_text like '%c.CourseID as ID%'  
 Order by RunTimeFirst desc

Tuesday, May 17, 2022

Why is my transaction log growing?

I advise on the management of multiple large databases. One of them had a steadily growing transaction log. Figuring out exactly why the log was growing was not a simple process. Here's the steps that I took to narrow down specifically which processes were causing the growth.

First, the most precise way to figure out what, specifically, is causing log growth, is through extended events. The event database_file_size_change will log all available data about the change (size, time, specifically which SQL cause the size change, etc).

If you've never used extended events before, do a little reading to learn about them first. They're insanely handy and highly recommended, but not necessarily straightforward. The below SQL will create an extended event to track the details.

 CREATE EVENT SESSION [DB_Size_Tracking] ON SERVER   
 ADD EVENT sqlserver.database_file_size_change(  
 ACTION(sqlserver.client_app_name,  
 sqlserver.client_hostname,  
 sqlserver.database_name,  
 sqlserver.nt_username,  
 sqlserver.plan_handle,  
 sqlserver.query_hash,  
 sqlserver.query_plan_hash,  
 sqlserver.server_principal_name,  
 sqlserver.session_id,  
 sqlserver.session_nt_username,  
 sqlserver.sql_text,  
 sqlserver.username))  
 ADD TARGET package0.event_file(SET filename=N'D:\ExtendedEvents\DB_Size_Tracking.xel')  
 WITH (MAX_MEMORY=4096 KB,  
 EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,  
 MAX_DISPATCH_LATENCY=30 SECONDS,  
 MAX_EVENT_SIZE=0 KB,  
 MEMORY_PARTITION_MODE=NONE,  
 TRACK_CAUSALITY=OFF,  
 STARTUP_STATE=ON)  
 GO

Once this extended event has been started, you can begin to see what, specifically is causing the size changes. Here's what I use to return data from the extended event session.

 If object_id('tempdb..#SizeChanges') is not null drop table tempdb..#SizeChanges  
 Select  
   -- Object_Name = n.value ('(data[@name="object_name"]/value)[1]', 'nvarchar(256)')   
   -- ,Duration = n.value ('(data[@name="duration"]/value)[1]', 'int')   
   SQLText = n.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)')      
   ,Size_Change_KB = n.value ('(data[@name="size_change_kb"]/value)[1]', 'int')   
   ,Timestamp = convert(smalldatetime, n.value('(@timestamp)[1]', 'datetime2'))  
   ,database_name = n.value ('(data[@name="database_name"]/value)[1]', 'nvarchar(256)')   
   ,database_name_Action = n.value('(action[@name="database_name"]/value)[1]', 'nvarchar(256)')      
   ,file_name = n.value ('(data[@name="file_name"]/value)[1]', 'nvarchar(256)')   
 Into #SizeChanges  
 From (  
   Select cast(event_data as XML) as event_data  
   -- need to edit this for correct file, then UNCOMMENT  
   From sys.fn_xe_file_target_read_file('D:\ExtendedEvents\DB_Size_Tracking*.xel', null, null, null)  
   ) ed  
 Cross apply ed.event_data.nodes('event') as q(n)  
 Select * From #SizeChanges

Once you have the temp table #SizeChanges created, you can do all kinds of queries and analysis.

In my situation, I found that a regular index maintenance job was causing the problem. Specifically, a large table with a clustered index built on a Unique Identifier field (never a good idea) was causing transaction growth, when the index was maintained. Once this was fixed, the transaction log growth was no longer a problem.