Monday, July 31, 2023

Which objects are using those barely used indexes, that you want to get rid of?

Indexes in SQL Server are obviously critical, but easy to overuse - to the point that the indexes themselves cause problems. For instance, you may have tables with indexes that are almost, but not quite, identical. Figuring out which queries are using specific indexes can be extremely helpful in consolidating indexes.

The problem is, there's not an easy way to do this. The standard advice, when figuring out which indexes are used and which are not, is to query against sys.dm_db_index_usage_stats. That advice is great, as far as it goes. However, it only tells you if an index is used, and whether it's used for a scan, or a seek. It doesn't tell you which objects/queries are using the index.

But what if you have two indexes that are very quite similar, and each have some usage? You suspect they could be consolidated. You need to know which queries are using each index. 

Unfortunately sys.dm_db_index_usage_stats will leave you hanging here. It'll tell you whether an index is being used, but not which specific object/query is using the index.

In order to do that, you need to use Query Store data. Query Store is a SQL Server component that tracks all kinds of very useful performance information, so it's important to learn about. By default it's turned off, so you'll need to turn it on. 

Here's a query you can use to figure out specifically which objects are using an index. It needs to be run on the production server, and can be resource intensive, so consider running it at a time when there's not much activity. Substitute the index name IX_IndexName for your specific index name. 

 ;With FilteredPlans as (  
   SELECT plan_id   
   FROM sys.query_store_plan  
   WHERE query_plan like '%IX_IndexName%'  
 )  
 SELECT distinct  
   ObjectName = Object_Name(q.object_id)  
   ,q.query_id   
   ,p.plan_id   
   ,qt.query_sql_text  
 FROM sys.query_store_runtime_stats rs  
   JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id  
   JOIN sys.query_store_query q ON q.query_id = p.query_id  
   JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id  
 WHERE   
      p.plan_id in (Select plan_id from FilteredPlans)  

Note that some of the usage of the index that comes up will be when the index is updated, because rows have been modified. This is when the underlying table has been updated, deleted, etc. You can ignore those, because that's just maintenance to keep the index up to date, and not useful for an index that's not being used. 





Thursday, June 8, 2023

Clustered indexes—how to make the right decisions

Oftentimes, with database systems, nobody is specifically in charge of the database. It's just seen as one of the many tools that's being used by the front end code, and as long as it functions, it's fine. 

But sometimes when the system has been running and growing for a few years, problems become evident. Deadlocks. Slow query plans. Timeouts. By default, the primary key is made into the clustered index, unless it's specifically created as a nonclustered index. This can make sense, because you really want a clustered index on a table, and the primary key is sometimes a good choice. But definitely not always! 

I've run into some situations like this. The clustered index is created on a field that is probably rarely used in searches. But how to be sure? In order to more easily research these situations, and figure out which field might be a better potential candidate for clustereding, I wrote the following helpful query. 

It queries the system tables to see which objects are referencing the target table, and then returns the objects (usually stored procedures, but sometimes functions) in order of decreasing CPU usage. This way, if you have hundreds of objects that use the table and don't have time to check each one, you can make sure that you at least check the most resource intensive objects. 

You need to have Query Store turned on (everybody should, in any case, if you need performance data), because it uses the table sys.query_store_runtime_stats in order to show the results in order of CPU usage. 

It returns a handy column that you can click on to easily see the object definition. Thus, you can see the SQL behind the object, and double check whether the clustered index change you're thinking of is useful. If the current field that has the clustered index is rarely used in filtering, and the new field is used heavily, then that's a sign you should switch.

 ;with Main as (  
      -- Get data on everything that's using this table and/or field  
   SELECT   
     ObjectName = Object_Name(Query.object_id)  
     ,TotalExecutions = sum((RuntimeStats.count_executions))  
     ,TotalCPU = sum((RuntimeStats.count_executions * RuntimeStats.avg_cpu_time))  
     ,TotalLogicalReads = sum((RuntimeStats.count_executions * RuntimeStats.avg_logical_io_reads))  
     ,TotalDuration = sum((RuntimeStats.count_executions * RuntimeStats.avg_duration))  
     ,QueryStorePlan.plan_id  
     ,Query.query_id  
   FROM sys.query_store_runtime_stats RuntimeStats  
   JOIN sys.query_store_plan QueryStorePlan ON QueryStorePlan.plan_id = RuntimeStats.plan_id  
   JOIN sys.query_store_query Query ON Query.query_id = QueryStorePlan.query_id  
   JOIN sys.query_store_query_text QueryText ON Query.query_text_id = QueryText.query_text_id  
   WHERE   
     1=1  
     and first_execution_time >= dateadd(d, -2, getdate())  
     and QueryText.query_sql_text like '%' + @TableName + '%'  
           and QueryText.query_sql_text like '%' + @FieldName + '%'  
   GROUP BY Object_Name(Query.object_id), Query.query_id, QueryStorePlan.Plan_id  
 )  
 -- Now return the data, along with resource usage stats and some handy columns you can click on to view the SQL,  
 -- and the query plan  
 Select   
      ObjectName   
      ,TotalExecutions   
      ,TotalCPU   
      ,TotalLogicalReads  
      ,TotalDuration   
   ,SQLTextXML = CAST((SELECT query_sql_text AS [processing-instruction(x)] FOR XML PATH('')) AS XML)    
      ,QueryPlanXML = CAST(query_plan as XML)   
   ,QueryID = Query.query_id  
   ,PlanID = QueryStorePlan.plan_id  
 from Main  
   JOIN sys.query_store_plan QueryStorePlan ON QueryStorePlan.plan_id = Main.plan_id  
   JOIN sys.query_store_query Query ON Query.query_id = QueryStorePlan.query_id  
   JOIN sys.query_store_query_text QueryText ON Query.query_text_id = QueryText.query_text_id  
 Order by TotalCPU desc  

Sunday, February 5, 2023

Help! The field I need to query isn't indexed, and I can't change that!

Some things you can change, and some you can't, right? And for whatever reason, you may need to query a huge table, and not be able to index the field that you need to query. I ran into this situation recently.

Well, if it's a log table, that at least has some kind of index (preferably clustered), and you only need to query recent records, you're in luck. Here's a workaround that can be useful:

 Select *   
 From DBA..CommandLog   
 Where   
   ID > (Select Max(ID) from DBA..CommandLog)- 5000  
   And Command like '%ApplicationErrorLogs%'  
   And DatabaseName like 'rds_qt_DB'  

How does this work? This is a log table, and there's a clustered index on the ID field of this table. Many, many tables are structured like this.

You use the subquery to get the Max(ID) of the table. Then, subtract 5000 to only query the last 5000 records. That way, the index will be used to do the initial filter, then the other filters will be applied to that limited set. It should be much faster.