The QDS Toolbox

The QDS Toolbox is set of tools that can help you review and store the performance related data in Query Store. This was released by ChannelAdvisor last September thanks to the hard work of a number of my coworkers.

If you aren’t experienced with Query Store, this can provide a good starting point for getting familiar with data that is available and what you can do with it. If you are experienced with Query Store, this may give you an easy way to set up customizable reports that help you find issues and see trends.

The QDS Toolbox has several components, and I intend to post about each in turn. Two new components were added to this recently by @sqlozano (https://www.sqlozano.com/), bringing the current total to eight.

  1. Server Top Queries
  2. Query Waits
  3. QDS Cache Cleanup
  4. Pivoted Waits Stats
  5. Query Variation
  6. Waits Variation
  7. Statistics Used
  8. Plan Miner

Getting Started

First things first, let’s get it downloaded and installed. I’ve linked the github page above. You can download a ZIP of the current package, or you can clone it with GitHub Desktop or a similar tool.

Once you have the package local, there’s an Installer folder. You could install them a la carte, or the QDSToolBox_Installer script will install all of the components. It will prompt you for the instance of SQL Server you want to use as well as the database you want to install it in. I’m putting mine in it’s own database, as I frequently restore other databases (WideWorldImporters, AdventureWorks2014) when I’m testing things.

Moving to Query Store

In my own experience with performance tuning with SQL Server, I started off using Profiler and PSSDiag constantly when I worked for Microsoft. After a number of years, I moved a lot of my focus to using queries against DMVs (Dynamic Management Views). DMVs allowed me to get most of the same data I reviewed from PSSDiag, but I’m able to get that data with a query that takes a few seconds (typically) instead of having to gather a trace for 30 minutes to feel like I have enough data.

By focusing on a DMV like sys.dm_exec_query_stats, you can easily find which queries on your server have the highest CPU usage, duration, or logical reads. This can make if very easy to identify a problem query, and you can find the query even if it isn’t running currently.

The caveat is that the DMVs only track what is in the cache, and once a query’s plan is no longer cached, it’s gone. No historical data is kept in the DMVs, and that’s why I use Query Store almost exclusively these days.

Server Top Queries

This is a great place to start exploring what is available in the QDS Toolbox, because looking for the top resource consuming queries is a common task.

Once you have the tool installed, you can run the dbo.ServerTopQueries to generate reports based on the metric you choose. That report will remain in the database where the QDS Toolbox was installed, and you can review them whenever. Here’s an example execution of the procedure:

USE QDSToolBox
GO
DECLARE 
	@StartTime DATETIME2,
	@EndTime DATETIME2;

SELECT
	@StartTime = DATEADD(DAY,-1,GETUTCDATE()),
	@EndTime = GETUTCDATE();

EXEC dbo.ServerTopQueries
	@ServerIdentifier	= @@SERVERNAME,
	@DatabaseName	= 'WideWorldImporters',
	@ReportIndex	= 'dbo.ServerTopQueriesIndex',	--provide both Report options to store results
	@ReportTable	= 'dbo.ServerTopQueriesStore',	--provide both Report options to store results
	@StartTime		= @StartTime,
	@EndTime		= @EndTime,
	@Top			= 25,
	@Measurement	= 'cpu_time',	--duration, cpu_time, logical_io_reads, logical_io_writes, 
									--physical_io_reads, clr_time, query_used_memory, log_bytes_used, tempdb_space_used
	@IncludeQueryText	= 1, --default: 0
	@ExcludeAdhoc		= 0,
	@ExcludeInternal	= 0,
	@VerboseMode		= 1,
	@TestMode			= 0;

GO

DECLARE
	@LatestReport INT;

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

This script will run in the QDSToolbox database and store the report there, but we will be gathering the Query Store data on activity in the WideWorldImporters databse. I’ve just run some scripts there to generate activity.

The @StartTime and @EndTime parameters will restrict us to looking at activity in the last day.

The @ReportIndex and @ReportTable parameter define where we will store the data. I’ve left these at the default names for the tables, as defined by the tool.

The report is going to be focusing on high cpu queries, defined by the @Measurement variable being set to ‘cpu_time’. If I were using this, I’d be very likely to use ‘duration’ and ‘logical_io_reads’ often as a matter of course as well.

The @Top parameter is set to 25, and that’s probably reasonable. If I’m troubleshooting an active problem, I’d be unlikely to look at more than the top 5 queries in whatever metric, but a larger view makes sense for historical purposes.

I’m taking the default option for @ExcludeAdhoc and @ExcludeInternal; if it’s at the top of my CPU usage, I want to see it.

I am setting @IncludeQueryText to 1 so I get the query_store_query_text.query_sql_text details in my report. That may help you identify a specific query. Even without this option the query_store_query.query_text_id will be available, so you could look up the exact query text directly in Query Store.

I’ve also enabled @VerboseMode to see the exact statement used to generate the report. If you are less familiar with Query Store, you might want to review this to see where all the data being used here is found.

With all this, generating the report took less than a second. Certainly you could set up a scheduled take to create a report on CPU activity every day, which you could review\aggregate\splice later. You could create tasks to do the same for ‘duration’ and ‘logical_io_reads’ and have a good set of data to review for potential issues.

Reviewing the report

Here’s a short script to view my latest report and all the data in the dbo.vServerTopQueriesStore view:

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

Report Output

Report Details

The result set includes the capture date for the report, which is the same for each row. We also have the server name, database name, and metric used for the report.

The PlanID, QueryID, and QueryTextID are values you can use to get more information directly from Query Store. I frequently use the QueryID in particular with the Tracked Queries interface in SQL Server Management Studio to look at the plan for a query I’ve already identified.

Object Details

We will see the ObjectName, ObjectID, and SchemaName for any queries that are part of a procedure, function, or other object.

The ExecutionTypeDesc here indicates these queries ran successfully. This value could also be ‘Aborted’ (for a timeout) or ‘Exception’ (for an error).

Performance Statistics

And here we have all of our performance statistics along with the query text. I most often look at three of these, but this includes memory usage (in pages), tempdb usage (also in pages), and log bytes.

The execution count is included, so you can use this to calculate averages for any of these measurements. I tend to focus on the queries that have the highest numbers overall, but you may find a query farther down on your list with a longer average duration is more important.

Coming up

These reports will be used as well by other aspects of the QDS Toolkit, so understanding how to create the reports themselves will be necessary for those.

As I’ve said, there are 7 more parts to this tool. I’m planning to do a post on each in the weeks (and months) ahead. I’ll also be doing more “foundation” posts like my last post on key lookups.

I hope you find this post helpful. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion.

You can follow me on twitter (@sqljared) and contact me if you have questions.

USE WideWorldImporters
GO
SELECT TOP 100 
	sol.OrderID, 
	sol.UnitPrice,
	sol.Quantity,
	sol.PickedQuantity,
	sol.LastEditedWhen
FROM Sales.OrderLines sol 
WHERE 
	sol.StockItemID = 20
GO

Key Lookups

In working on my presentation for Data Saturday #8 – Southwest US, I hadn’t realized how many topics come up at least briefly in the talk. I wanted to make a few posts about to go into details on each of these topics and why they are important.

My thanks again to Deborah Melkin for her review and feedback of the presentation.

A key lookup is an operation that occurs when a query has used a nonclustered index on a given table, but needs to access more columns to complete the query. It may need to check columns not in that index for additional filters, or it may just need to return that column as part of its result set.

In the simple query above, we’re retrieving 100 rows from the seek against a nonclustered index, then performing a key lookup against the clustered index. There is a nested loops operator between the two and understanding how that operates is important; for each row we receive from the first table, we perform the second operation once. So, in this query we are seeking 100 rows from the nonclustered index, then performing the key lookup 100 times. We go through the index once for each row we return, and you can see the cost of the key lookup operator is 99% of the query.

Operator Details

Details for the Key Lookup operator

Mouseover

If we mouseover the key lookup, we can see the details of this operation. We actually read 100 rows . The “Estimated Operator Cost” (0.324977) is nearly 100 times that of the index seek (0.0035899).

The “Number of Executions” is 100, so for each row received from the index seek, we traverse the clustered index (its index and leaf pages) once to get that row. And we do 100 separate seeks of that index to get 100 rows. This is a lot more work than we did to get 100 rows with 1 index seek from the nonclustered index.

The estimates match our actuals, but the TOP clause is a very good hint for how many rows we should receive.

If you have a table scan somewhere in your plan is table scanning millions of rows, you should probably address that first. But removing the key lookup by returning fewer columns drops this query from 12.5 milliseconds to 73 microseconds. That’s a 94.16% duration reduction (thank you Query Store).

Resolution

There’s two ways to handle a query like this with a key lookup.

  1. Do we need these columns in our query?
  2. Create a covering index.

Addition by Subtraction

We are doing the key lookup because we want to return columns, or filter\otherwise use columns, that are not in the nonclustered index. Let’s first ask this: do we need these columns in our query?

If we check the code or application that’s retrieving the results, does it actually consume those columns from the result set and use them? If we are filtering on that column, does that filter still make sense? If not, let’s just take it out of the query to simplify matters.

And it is very clear which columns are the issue. If you look at the details of the key lookup in the image above, the Output List for that operator shows which columns we are using the clustered index to retrieve. If you don’t need any of them, you can remove them from the query. Your new execution plan will be missing a key lookup.

CREATE COVERING INDEX

The heading is a joke; there’s no such command, of course. A covering index is a nonclustered index that supplies all the information you need from a given table to complete a given query. So far, we’re doing key lookups for this query because no such an index exists. We could get all these columns from the clustered index, but we would have to scan the whole index because our WHERE clause doesn’t match the sorting of the clustered index.

Normally when we create an index, we want our index to include any columns we are filtering on. So it would include columns in our WHERE clause, or the columns in our JOIN clause if we are joining from another table. In some cases, you might want the index to match an ORDER BY. Here just the section in red.

For a covering index on this query, we need to include the SELECT list (in the green section) in our index. In general, every column for this table referenced in the query needs to be in our index.

The INCLUDE column is a great way to add in the columns in our SELECT list.

We could add those 5 columns to our index normally as key values, but that would unnecessarily bloat all the pages of the index. We aren’t filtering on any of those columns, so we don’t need the columns in the index pages for us to filter properly. If we use the INCLUDE clause, these columns will be present only in the leaf page of our index. This is similar to how the columns from the clustered index are added to all nonclustered indexes.

So a script for the new index would look like this:

CREATE NONCLUSTERED INDEX [IX_Sales_OrderLines_AllocatedStockItems] ON [Sales].[OrderLines](	
	[StockItemID] ASC
) INCLUDE(
	[PickedQuantity],
	[OrderID],
	[UnitPrice],
	[Quantity],
	[LastEditedWhen]	
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = ON, 
	ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO

With the index in place, our original query took 95 microseconds. Slightly longer than the query with the reduced result set, but we did increase the size of the index some.

Conclusion

A key lookup might be an operation you don’t notice often, but I’ve been impressed with the result of removing them when I can.

I’ll be posting other blogs with foundational topics in the near future and more posts in general than I’ve had recently. Maybe this isn’t foundational; it might be on the first floor.

I hope you’ve learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.

So let’s talk about the best metric no one thinks about.

A Good Question

I was working with a client troubleshooting an issue several years back now, and they asked a question that was difficult to answer.

We were troubleshooting a slow query, and the reason it was taking so long was oddly opaque.

  • It wasn’t blocked
  • It wasn’t waiting on CPU or driving the CPU on the server
  • It wasn’t waiting on memory
  • It wasn’t waiting on the disks

So, the usual suspects were out. We didn’t have the plan for this query yet, and were operating without much information yet. So, the client asked the simple question, “If it isn’t waiting on anything, why isn’t it done?”

I thought it a simple question at first, then I really thought about it. With the usual suspects out, we really couldn’t see what SQL Server was doing. What could be taking up that time?

Invisible culprit

It finally occurred to me that just because we weren’t seeing the disks spike didn’t mean we weren’t doing a lot of reads. Logical reads don’t involve the disks; we’re only reading pages that are already in memory. They are faster than physical reads for sure, but that doesn’t mean they are instant.

The logical reads themselves won’t cause waits. They will use some CPU cycles, but in this case we saw no CPU related wait type or a high overall CPU %.

Take the plan above. It took 84 seconds to complete, and only returned 16 rows. But it read millions of rows at the lower levels. If the tables involved are largely in memory, we won’t be waiting on physical IO. It’s a SELECT statement, so it won’t be blocked or blocking if you are using read-committed snapshot isolation level (RCSI, I should blog about that).

With no waits or blocking, how would you see this query is less than optimal unless you are looking for it?

Detecting logical reads

We eventually got the execution plan and it confirmed my suspicion. The plan was reading many, many rows and joining across a number of tables. Once we had the plan, were able to come up with a plan to filter it down faster.

But the lesson here is the effects of logical reads are hard to detect. This makes logical reads a good metric to check for if you want to look at your activity within SQL Server. Here’s an example script for returning the queries with the highest logical reads in the execution cache, centering around sys.dm_exec_query_stats :

SELECT TOP 50 
	qs.creation_time, 
	qs.execution_count, 
	qs.total_logical_reads, 
	qs.total_logical_reads/qs.execution_count AS ave_logical_reads, 
	qs.total_elapsed_time, 
	qs.max_elapsed_time, 
	qs.total_elapsed_time/qs.execution_count as ave_duration, 
	substring(t.text, qs.statement_start_offset/2+1, 
		(CASE WHEN qs.statement_end_offset=-1 THEN (len(t.text)- qs.statement_start_offset)/2
		ELSE (qs.statement_end_offset- qs.statement_start_offset)/2 END )+1) AS statement_text,
	qp.query_plan,
	t.[text], 
	db_name(t.dbid) as db_name, 
	OBJECT_NAME( t.objectid, t.dbid) AS object_name, 
	qs.total_worker_time AS total_cpu_time, 
	qs.total_worker_time/qs.execution_count AS ave_cpu_time, 
	qs.total_physical_reads, 
	qs.total_physical_reads/qs.execution_count AS ave_physical_reads
	--,qs.plan_handle, qs.plan_generation_num
FROM sys.dm_exec_query_stats qs 
OUTER APPLY sys.dm_exec_sql_text(plan_handle) AS t
OUTER APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC;
GO

This will let you look through the query stats for anything currently in the cache, but of course a restart of SQL Server or another action that clears the cache will mean there isn’t much to see.

If you use Query Store, you could see the queries with the most logical reads in the last two hours with the following:

SELECT TOP 10 
	SUM(rs.avg_logical_io_reads) AS sum_logical_io_reads,
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
	rsi.start_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval rsi
	ON rsi.runtime_stats_interval_id = 
		rs.runtime_stats_interval_id
WHERE 
	rsi.start_time > DATEADD(HOUR,-2,GETUTCDATE())
GROUP BY 
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
    rsi.start_time
ORDER BY SUM(rs.avg_logical_io_reads) DESC

Conclusion

Another point to consider, while physical reads take more time, the difference between the two is somewhat random. Whether given pages are in memory or not will affect how long a query takes, but we can’t really control what’s in memory without taking unusual measures.

Logical reads will always occur, and should be a more consistent measure of how much work a query requires.

Based on the plan, we may also end up reading to the same page in memory many times, for example if we have a cross product in our query.

I hope you find this post helpful.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to @ me and make a suggestion.

Please follow me on twitter (@sqljared) or contact me if you have questions.

Happy Thanksgiving.