QDS Toolbox: QDS Cache Cleanup

Posted on 09/20/202109/20/2021 by sqljared

When configuring Query Store, we have a few options for deciding how it retains data but little control over how it cleans up that data. We can set the max size of our query store, the max number of plans to keep per query, and how long to keep query statistics.

The QDS Cache Cleanup component of the QDS Toolbox gives us a number of other options for what data to remove:

Internal queries
Adhoc queries
Orphaned queries (from deleted stored procedure and other dropped objects)
Queries that have not run recently
Target queries with fewer than X executions
Remove only stats associated with targeted queries

Performance

There was a performance issue where I work that related to the QDS size-based cleanup that Mark Wilkinson discusses here. There were a number of symptoms and issues seen at the time and hats off to our DBAs for determining the root cause. This is something I wanted to highlight.

These issues didn’t start until QDS started hitting max size, so that was over a month in some cases, and it didn’t hit all instances and databases at the same time. This means the issue seemed “random” when it was happening.
Mark Wilkinson

Using the QDS Cache Cleanup, you can prevent the typical size-based cleanup from running, while having more control on what gets cleaned up. The procedure provided will identify which queries or stats in Query Store to remove, then calls system procedures to remove them:

sp_query_store_unforce_plan
sp_query_store_remove_query
sp_query_store_reset_exec_stats

Options

Let’s discuss the options and how they interact.

@InstanceIdentifier: You can use this to gather data from another instance of SQL Server.
@DatabaseName: Defaults to the current database. I keep my QDS data in a separate database, so I will use it in my examples.
@CleanAdhocStale: Binary option; default is 0. Setting this to 1 will clean up data related to any stale queries that are ad-hoc queries, i.e. not associated with an object like a stored procedure. Only stale ad-hoc queries are affected, as defined by the @Retention and @MinExecutionCount.
@CleanStale: Binary option; default is 1. Cleans up all stale queries. Also depends on the settings for @Retention and @MinExecutionCount.
@Retention: This setting helps define which queries are considered “stale”. Takes in a number of hours since the query was last run. The default is 168 hours (a week), meaning any query that had run in the last week would not count as stale. If you set this to 0, it will include all queries that match the @CleanStale/@CleanAdhocStale options.
@MinExecutionCount: Queries with fewer than this many executions are stale. Default is 2. So, if we take both default options, this will clean up queries with less than 2 executions and anything that has not run in the last week. Setting this to 0, again, would include all queries or all ad-hoc queries.
@CleanInternal: Binary option; default is 1. Cleans up any internal queries. This includes queries being run by SQL Server itself to do things like update statistics, and is based on a filter on sys.query_store_query.is_internal_query. This isn’t a “stale” option, so @Retention and @MinExecutionCount don’t affect this.
@CleanOrphan: Binary option; default is 1. Cleans up any queries that are associated with an object that no longer exists.
@CleanStatsOnly: Binary option; default is 0. When enabled, this option doesn’t remove the query and plan, but uses sp_query_store_reset_exec_stats to remove the statistics for any targeted queries. When not enabled, the default behavior unforces any forced plans with sp_query_store_unforce_plan, and then removes the query, its plans, and its execution statistics with sp_query_store_remove_query. Applies to any query included by the above options.
@ReportAsText and @ReportAsTable: Both default to 0. You can enable either or both to get details on the amount of space recovered from the cleanup, in whichever format you prefer.
@ReportIndexOutputTable: Default NULL. Setting this allows you to store the report data in a table like dbo.QDSCacheCleanupIndex, created during setup.
@ReportDetailsAsTable: Default 0. When enabled, returns details about each query being deleted from Query Store.
@ReportDetailsOutputTable: Default NULL. allows you to store the report details in a table like dbo.QDSCleanSummary, created during setup.
@TestMode: Default 0. Doesn’t actually delete data, but provides output as though it does. You could use this to see what the effect of a given set or parameters would be before actually taking a destructive action.
@VerboseMode: Default 0. Provides the queries being used in the messages tab.

One point for clarity, if the @Retention or @MinExecutionCount is 0,

QDS Cache Cleanup examples:

EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@CleanAdhocStale = 0
	,@CleanStale = 1
	,@Retention = 24
	,@MinExecutionCount = 2
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1;
GO

This is a example execution given in the comments of the procedure. Since this is run in test mode, nothing is actually deleted; the reports provided give information on what would be deleted if we ran this process normally.

CleanAdhocStale is not used, but CleanStale is a superset of it; so stale ad-hoc queries are included with all stale queries. Anything that has been executed at least twice in the last day is not considered stale.

This process does include any queries from dropped objects and any internal queries, regardless of when they were executed.

The report from the QDS Cache Cleanup gives one line per type of query affected. The output includes when this was generated and where, along with the count of queries and plans cleaned up and the space that would be recovered.

The report details shows the object name (where possible), Query ID, LastExecutionTime, ExecutionCount and QueryText for all queries included in the cleanup.

In my case, it found queries from a procedure I was testing recently (though I had to up the execution count so there would be some stale queries). Most of what the QDS Cache Cleanup flagged for deletion were internal queries involved in updating stats. But nothing was removed, since this was still in TestMode.

There were also some stale ad-hoc queries that look like statistics activity (you can see StatMan in the QueryText). These are internal queries but had executed few times and not recently, so they were also flagged as stale.

This raises an important point. Based on the @Retention and @MinExecutionCount options, orphaned or internal queries can count as stale and be included in your cleanup. If either setting is 0, all queries will be included by the QDS Cache Cleanup; essentially a full wipe of Query Store.

Here’s a few more examples with comments:

USE QDSToolBox
GO
	-- Test Mode; no deletion
	-- Includes Stale queries (24 hours or &lt; 20 executions)
	-- Also includes orphaned and internal queries
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@Retention = 24
	,@MinExecutionCount = 20
	,@CleanStale = 1
	,@CleanAdhocStale = 0
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- Only includes orphaned and internal queries
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@CleanStale = 0
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- Disables most default options
	-- Only includes ad-hoc queries only executed once
	--		and not run in the last 24 hours.
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@Retention = 24
	,@MinExecutionCount = 2
	,@CleanStale = 0
	,@CleanAdhocStale = 1
	,@CleanOrphan = 0
	,@CleanInternal = 0
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- @CleanStale = 1, other clean uptions disabled
	-- Setting @Retention or @MinExecutionCount to 0 means all queries are stale
	-- Output table options included but commented.
	-- Provides report, details, and verbose output (Messages tab)
DECLARE
	@ReportID BIGINT;

	EXECUTE [dbo].[QDSCacheCleanup]
		@DatabaseName = 'WideWorldImporters',
		@CleanAdhocStale = 0,
		@CleanStale = 1,
		@Retention = 0,			--	All Queries Stale
		@MinExecutionCount = 0,	--	All Queries Stale
		@CleanOrphan = 0,
		@CleanInternal = 0,
		@CleanStatsOnly	= 0,
		@ReportAsText = 1,
		@ReportAsTable = 1,
		@ReportDetailsAsTable = 1,
		--@ReportIndexOutputTable	= '[dbo].[QDSCacheCleanupIndex]',
		--@ReportDetailsOutputTable	= '[dbo].[QDSCacheCleanupDetails]',
		@TestMode = 1,
		@VerboseMode = 1,
		@ReportID = @ReportID OUTPUT;

SELECT @ReportID;
GO

More to come

Some of the options in the QDS Cache Cleanup didn’t function like I expected at first, so I think this post should be helpful. I’ll continue to post on the QDS Dashboard, but I’ll likely include a few on other topics in the weeks to come.

I will be speaking at PASS Summit, which is free and virtual this year, so please sign up if you haven’t already.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

QDS Toolbox and Server Top Queries:

Posted on 07/05/202108/14/2021 by sqljared

The QDS Toolbox

The QDS Toolbox is set of tools that can help you review and store the performance related data in Query Store. This was released by ChannelAdvisor last September thanks to the hard work of a number of my coworkers.

If you aren’t experienced with Query Store, this can provide a good starting point for getting familiar with data that is available and what you can do with it. If you are experienced with Query Store, this may give you an easy way to set up customizable reports that help you find issues and see trends.

The QDS Toolbox has several components, and I intend to post about each in turn. Two new components were added to this recently by @sqlozano (https://www.sqlozano.com/), bringing the current total to eight.

Server Top Queries
Query Waits
QDS Cache Cleanup
Pivoted Waits Stats
Query Variation
Waits Variation
Statistics Used
Plan Miner

Getting Started

First things first, let’s get it downloaded and installed. I’ve linked the github page above. You can download a ZIP of the current package, or you can clone it with GitHub Desktop or a similar tool.

Once you have the package local, there’s an Installer folder. You could install them a la carte, or the QDSToolBox_Installer script will install all of the components. It will prompt you for the instance of SQL Server you want to use as well as the database you want to install it in. I’m putting mine in it’s own database, as I frequently restore other databases (WideWorldImporters, AdventureWorks2014) when I’m testing things.

Moving to Query Store

In my own experience with performance tuning with SQL Server, I started off using Profiler and PSSDiag constantly when I worked for Microsoft. After a number of years, I moved a lot of my focus to using queries against DMVs (Dynamic Management Views). DMVs allowed me to get most of the same data I reviewed from PSSDiag, but I’m able to get that data with a query that takes a few seconds (typically) instead of having to gather a trace for 30 minutes to feel like I have enough data.

By focusing on a DMV like sys.dm_exec_query_stats, you can easily find which queries on your server have the highest CPU usage, duration, or logical reads. This can make if very easy to identify a problem query, and you can find the query even if it isn’t running currently.

The caveat is that the DMVs only track what is in the cache, and once a query’s plan is no longer cached, it’s gone. No historical data is kept in the DMVs, and that’s why I use Query Store almost exclusively these days.

Server Top Queries

This is a great place to start exploring what is available in the QDS Toolbox, because looking for the top resource consuming queries is a common task.

Once you have the tool installed, you can run the dbo.ServerTopQueries to generate reports based on the metric you choose. That report will remain in the database where the QDS Toolbox was installed, and you can review them whenever. Here’s an example execution of the procedure:

USE QDSToolBox
GO
DECLARE 
	@StartTime DATETIME2,
	@EndTime DATETIME2;

SELECT
	@StartTime = DATEADD(DAY,-1,GETUTCDATE()),
	@EndTime = GETUTCDATE();

EXEC dbo.ServerTopQueries
	@ServerIdentifier	= @@SERVERNAME,
	@DatabaseName	= 'WideWorldImporters',
	@ReportIndex	= 'dbo.ServerTopQueriesIndex',	--provide both Report options to store results
	@ReportTable	= 'dbo.ServerTopQueriesStore',	--provide both Report options to store results
	@StartTime		= @StartTime,
	@EndTime		= @EndTime,
	@Top			= 25,
	@Measurement	= 'cpu_time',	--duration, cpu_time, logical_io_reads, logical_io_writes, 
									--physical_io_reads, clr_time, query_used_memory, log_bytes_used, tempdb_space_used
	@IncludeQueryText	= 1, --default: 0
	@ExcludeAdhoc		= 0,
	@ExcludeInternal	= 0,
	@VerboseMode		= 1,
	@TestMode			= 0;

GO

DECLARE
	@LatestReport INT;

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

This script will run in the QDSToolbox database and store the report there, but we will be gathering the Query Store data on activity in the WideWorldImporters databse. I’ve just run some scripts there to generate activity.

The @StartTime and @EndTime parameters will restrict us to looking at activity in the last day.

The @ReportIndex and @ReportTable parameter define where we will store the data. I’ve left these at the default names for the tables, as defined by the tool.

The report is going to be focusing on high cpu queries, defined by the @Measurement variable being set to ‘cpu_time’. If I were using this, I’d be very likely to use ‘duration’ and ‘logical_io_reads’ often as a matter of course as well.

The @Top parameter is set to 25, and that’s probably reasonable. If I’m troubleshooting an active problem, I’d be unlikely to look at more than the top 5 queries in whatever metric, but a larger view makes sense for historical purposes.

I’m taking the default option for @ExcludeAdhoc and @ExcludeInternal; if it’s at the top of my CPU usage, I want to see it.

I am setting @IncludeQueryText to 1 so I get the query_store_query_text.query_sql_text details in my report. That may help you identify a specific query. Even without this option the query_store_query.query_text_id will be available, so you could look up the exact query text directly in Query Store.

I’ve also enabled @VerboseMode to see the exact statement used to generate the report. If you are less familiar with Query Store, you might want to review this to see where all the data being used here is found.

With all this, generating the report took less than a second. Certainly you could set up a scheduled take to create a report on CPU activity every day, which you could review\aggregate\splice later. You could create tasks to do the same for ‘duration’ and ‘logical_io_reads’ and have a good set of data to review for potential issues.

Reviewing the report

Here’s a short script to view my latest report and all the data in the dbo.vServerTopQueriesStore view:

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

Report Output

The result set includes the capture date for the report, which is the same for each row. We also have the server name, database name, and metric used for the report.

The PlanID, QueryID, and QueryTextID are values you can use to get more information directly from Query Store. I frequently use the QueryID in particular with the Tracked Queries interface in SQL Server Management Studio to look at the plan for a query I’ve already identified.

We will see the ObjectName, ObjectID, and SchemaName for any queries that are part of a procedure, function, or other object.

The ExecutionTypeDesc here indicates these queries ran successfully. This value could also be ‘Aborted’ (for a timeout) or ‘Exception’ (for an error).

And here we have all of our performance statistics along with the query text. I most often look at three of these, but this includes memory usage (in pages), tempdb usage (also in pages), and log bytes.

The execution count is included, so you can use this to calculate averages for any of these measurements. I tend to focus on the queries that have the highest numbers overall, but you may find a query farther down on your list with a longer average duration is more important.

Coming up

These reports will be used as well by other aspects of the QDS Toolkit, so understanding how to create the reports themselves will be necessary for those.

As I’ve said, there are 7 more parts to this tool. I’m planning to do a post on each in the weeks (and months) ahead. I’ll also be doing more “foundation” posts like my last post on key lookups.

I hope you find this post helpful. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion.

You can follow me on twitter (@sqljared) and contact me if you have questions.

Key Lookups

Posted on 05/31/202109/24/2021 by sqljared

USE WideWorldImporters
GO
SELECT TOP 100 
	sol.OrderID, 
	sol.UnitPrice,
	sol.Quantity,
	sol.PickedQuantity,
	sol.LastEditedWhen
FROM Sales.OrderLines sol 
WHERE 
	sol.StockItemID = 20
GO

Key Lookups

In working on my presentation for Data Saturday #8 – Southwest US, I hadn’t realized how many topics come up at least briefly in the talk. I wanted to make a few posts about to go into details on each of these topics and why they are important.

My thanks again to Deborah Melkin for her review and feedback of the presentation.

A key lookup is an operation that occurs when a query has used a nonclustered index on a given table, but needs to access more columns to complete the query. It may need to check columns not in that index for additional filters, or it may just need to return that column as part of its result set.

In the simple query above, we’re retrieving 100 rows from the seek against a nonclustered index, then performing a key lookup against the clustered index. There is a nested loops operator between the two and understanding how that operates is important; for each row we receive from the first table, we perform the second operation once. So, in this query we are seeking 100 rows from the nonclustered index, then performing the key lookup 100 times. We go through the index once for each row we return, and you can see the cost of the key lookup operator is 99% of the query.

Operator Details

Mouseover

If we mouseover the key lookup, we can see the details of this operation. We actually read 100 rows . The “Estimated Operator Cost” (0.324977) is nearly 100 times that of the index seek (0.0035899).

The “Number of Executions” is 100, so for each row received from the index seek, we traverse the clustered index (its index and leaf pages) once to get that row. And we do 100 separate seeks of that index to get 100 rows. This is a lot more work than we did to get 100 rows with 1 index seek from the nonclustered index.

The estimates match our actuals, but the TOP clause is a very good hint for how many rows we should receive.

If you have a table scan somewhere in your plan is table scanning millions of rows, you should probably address that first. But removing the key lookup by returning fewer columns drops this query from 12.5 milliseconds to 73 microseconds. That’s a 94.16% duration reduction (thank you Query Store).

Resolution

There’s two ways to handle a query like this with a key lookup.

Do we need these columns in our query?
Create a covering index.

Addition by Subtraction

We are doing the key lookup because we want to return columns, or filter\otherwise use columns, that are not in the nonclustered index. Let’s first ask this: do we need these columns in our query?

If we check the code or application that’s retrieving the results, does it actually consume those columns from the result set and use them? If we are filtering on that column, does that filter still make sense? If not, let’s just take it out of the query to simplify matters.

And it is very clear which columns are the issue. If you look at the details of the key lookup in the image above, the Output List for that operator shows which columns we are using the clustered index to retrieve. If you don’t need any of them, you can remove them from the query. Your new execution plan will be missing a key lookup.

CREATE COVERING INDEX

The heading is a joke; there’s no such command, of course. A covering index is a nonclustered index that supplies all the information you need from a given table to complete a given query. So far, we’re doing key lookups for this query because no such an index exists. We could get all these columns from the clustered index, but we would have to scan the whole index because our WHERE clause doesn’t match the sorting of the clustered index.

Normally when we create an index, we want our index to include any columns we are filtering on. So it would include columns in our WHERE clause, or the columns in our JOIN clause if we are joining from another table. In some cases, you might want the index to match an ORDER BY. Here just the section in red.

For a covering index on this query, we need to include the SELECT list (in the green section) in our index. In general, every column for this table referenced in the query needs to be in our index.

The INCLUDE column is a great way to add in the columns in our SELECT list.

We could add those 5 columns to our index normally as key values, but that would unnecessarily bloat all the pages of the index. We aren’t filtering on any of those columns, so we don’t need the columns in the index pages for us to filter properly. If we use the INCLUDE clause, these columns will be present only in the leaf page of our index. This is similar to how the columns from the clustered index are added to all nonclustered indexes.

So a script for the new index would look like this:

CREATE NONCLUSTERED INDEX [IX_Sales_OrderLines_AllocatedStockItems] ON [Sales].[OrderLines](	
	[StockItemID] ASC
) INCLUDE(
	[PickedQuantity],
	[OrderID],
	[UnitPrice],
	[Quantity],
	[LastEditedWhen]	
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = ON, 
	ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO

With the index in place, our original query took 95 microseconds. Slightly longer than the query with the reduced result set, but we did increase the size of the index some.

Conclusion

A key lookup might be an operation you don’t notice often, but I’ve been impressed with the result of removing them when I can.

I’ll be posting other blogs with foundational topics in the near future and more posts in general than I’ve had recently. Maybe this isn’t foundational; it might be on the first floor.

I hope you’ve learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.