Millisecond Improvements

Posted on 03/23/202006/17/2020 by sqljared

In my last post, I spoke about optimizing a procedure that was being executed hundreds of millions of times per day, and yes, that is expected behavior.

The difficult thing about trying to optimize this procedure is that it only takes 2.5ms on average to run. Tuning this isn’t a matter of changing a scan to a seek; we’ll have to look hard to find the opportunities here. A one millisecond Improvement on a procedure running 100 million times a day would save 100,000 seconds every day.

Well, I’ve found a few more options since my last post, and wanted to share my findings.

Setup

The procedure has some complex logic but only runs a few queries.

There are a few simple SELECT statements to populate some variables. These take a small percentage of the overall runtime.
There are two UPDATE statements, and we will run one or the other. Both join a table to a table variable; one has a second CTE doing some aggregation the other lacks. The majority of our time is spent running these UPDATEs.
An INSERT statement that takes place every time. This is to ensure that if we didn’t update a record because it didn’t exist, we make sure we insert the row. It’s very likely on a given run we will INSERT 0 rows.

Brainstorming

Since we 80% of our time in the UPDATE (I love Query Store), that’s the place with the most potential for gain.

But, on first look (first couple) it seems difficult to see room for improvement here. We’re doing index seeks with small row counts. The index scans are against memory optimized table variables, and you may notice they are cheaper than than the index seeks.

But, looking at plan one thing did draw my attention:

There’s a table spool, and following that I see the plan is updating an indexed view. Which we would do every time there’s an UPDATE. Hundreds of millions of times a day…huh. So, removing the index on that view would eliminate this entire middle branch from the plan.

The view is based on two columns that are the first two columns of the clustered PK of the underlying table. The view does some aggregation, but the difference between querying the view or the table is reading 1 row versus maybe 2 or 3 rows, most of the time. Dropping that index seems like a good thing to try.

And I did mention this in my last blog post, but we perform the INSERT statement every execution of the procedure, and we run the trigger on this table even if we inserted 0 rows. So, if we can detect whether the INSERT is needed, we can potentially skip the majority of the executions of the statement and the trigger.

The logic for the procedure uses a TVP and a couple of table variables, which isn’t optimal. SQL Server doesn’t have statistics on table variables, so it’s not able to make good estimates of how many rows are going to be returned (unless you are using table variable deferred compilation in SQL Server 2019). We could change these to temp tables, and see if we have better results. Hopefully, we’ll have a more stable plan across the many databases running it.

Both of the UPDATE statements have a bookmark lookup. We’ll be looking up only a few rows, but this could be a significant improvement for a query that takes so little time. Also, one of the UPDATE statements references the main table an additional time in its CTE. So we have two index seeks plus the key lookup. How much of our time is spent in the second access and the bookmark lookup?

Results

first change

I’ve been working on releasing these changes individually, and the first one is complete. Removing the index on the view, resulted in a 17% reduction in the duration of the procedure (from 2.34ms to 1.94ms), and a 20% reduction in CPU. Come to think of it, not having to update that index would have helped with our INSERT statement as well.

I’ll update this post once I have details on the other changes.

If you liked this post, please follow me on twitter and contact me if you have questions.

Updates

Second change

So, the second idea I’ve tried on this issue, is replacing the table variables with temp tables. The idea was that temp tables have statistics and table variables don’t, so we should tend to have better execution plans using temp tables. However, it didn’t work out that was in this case because it also caused pagelatch contention in tempdb.

Tempdb pagelatch contention is a very well documented issue. There are things you can do to mitigate this issue, but at a certain point you just need to create fewer temp tables. I’m not sure where the line is in the case of my environment, but it was clear that creating 3 temp tables in a proc running this often crosses that line.

This change has been reverted, but I’ll update this post again shortly. I should be making the change to skip unnecessary INSERT statements this week.

Skipping Inserts

This change has been made, and the results are fairly minor. The first measure I took showed the average duration of the proc dropping from 1.95 to 1.91 ms, but this varies enough from day to day that I could cherry pick numbers and get a different, even negative improvement.

This confirms something interesting. The cost of the procedure overall was skewed very heavily to the UPDATE statements, with the INSERT being significantly less. The most expensive operator in a DML operation is typically the Insert\Update\Delete operator itself; the step where we actually change data. My expectation all along has been that there were very few rows actually being inserted by our INSERT statement. It seems this is true and since the statement doesn’t actually insert data, the Insert operator doesn’t do work or take any significant amount of time. So we gained very little by this change, at least from the procedure.

This change also prevented us from calling the trigger with empty inserts, so we saw the trigger drop from executing ~350 million times per day to 6 million. This saves us about 14,000 seconds per day. I don’t think about context switching in the context of SQL Server very often, but in addition to the time benefit, there’s a small benefit in just not having to set up and switch to the context for the trigger itself.

Overall a small victory, but I’ll take it.

Pending

I should have my final update on this topic in the next week. This will change the logic of one of the UPDATE statements to remove an unnecessary CTE which references the main table, and adds an index hint to use the primary key, removing a bookmark\key lookup from both statements. The first update hits the table twice (once for the nonclustered, another for the key lookup) and the second hits it thrice (because of the CTE), but this change will drop both to 1 access of the PK, and should only query a few rows.

I’m very interested to see how this affects performance, as this should affect the largest part of work done outside of the actual Update operator in those statements.

Finale

So my change to the UPDATE statement is out, and the results are pretty good. First, The runtime of the procedure has dropped to 1.399 ms. If you recall, it originally took about 2.5 ms, so we’ve dropped this overall by more than 1 millisecond, which would be 40% of its original runtime.

Second, I love the simplicity of the new plan. The previous anonymized plan is at the top of this post; it took 2 screenshots to cover most of it. In particular, I noticed we were hitting the central table 3 times. Once in a CTE that was doing aggregation that was unnecessary, I removed that entirely. The second reference was in the main query, and it caused a key lookup which was the third access. I hinted our query to use the clustered primary key, which leads with the same two columns that are in the nonclustered index the optimizer seems to prefer.

The index scans are both in memory optimized table variables, and you can see the estimates are lower than accessing the main table in the clustered index seek. I also love seeing that the actual update operator on our left is such a large amount of the effort; that’s what I expect to see on a DML operation that’s tuned well.

In summary

I had 4 ideas originally to tune this query. I wanted to see if we could drop it by 1 millisecond, and we gained 1.1.

Removing the index from an indexed view referencing this table had a significant effect. This required a significant amount of research, as I had to find all the places where we referenced the view and determine if any would experience significantly worse performance without the aggregation from that view. I didn’t see any red flags, and this changed dropped the procedure’s duration from 2.5ms to 1.94ms (22% reduction).
The procedure uses a few in memory table variables with temp tables. The idea is that temp tables have statistics. That could lead to SQL Server making better plans, because it can estimate how many rows are in a given operation. This would work in other cases, but not for a proc that runs this often. Performance was actually slowed because of significant PAGELATCH waits. Every execution we were creating multiple temp tables, and at this pace our threads were constantly waiting on key pages in tempdb. This change was reverted.
Reducing the INSERTs was a gain, but a minimal one. The INSERT statement itself took very little of our time in the procedure. We also got to skip running the INSERT trigger, but that also did not take long to run. It’s possible we ended up with less waiting in the main table, or the table our trigger was updating, but if so the gains were too small to quantify.
Simplifying the logic of the two UPDATE statements that were taking most of the time in the procedure was a success. We went from 3 operations on the only permanent table in our query to 1, and removed an aggregation step we didn’t need. This dropped our runtime from 1.94ms to 1.399ms (27.8%). Every operation counts.

Hopefully you’ve learned something from this post and its updates. If so, please follow me on twitter or contact me if you have questions.

Reducing Trigger Executions

Posted on 03/05/202006/22/2020 by sqljared

Reducing Trigger Executions

I’ve never been a fan of triggers. I don’t like the idea of them adding an additional tax on every operation. It’s easy to forget they are even there, consuming your cycles. I’ve even seen a few nasty death-by-a-thousand-cuts scenarios with them. But I found something out this week that makes me like them even less.

I was tuning a procedure that runs 284 million times a day.

Over a number of servers and databases, but yes, that number is correct. It takes 2.5ms to run on average, with 1.0ms of CPU time. I’ll spare you the math, but that means over 3 cores of SQL Server are doing nothing but running this procedure 24/7/365. Anything we can do to improve this will be significant, even if we just shave off half a millisecond.

Small Improvements

The procedure is attempting to update or insert a few records based on a TVP input. Sometimes we run an UPDATE statement, but update the column to its current value based on one input parameter. Based on another parameter, we may filter out all rows and update nothing. But in both cases, even though we haven’t changed any data we still took the time to run the UPDATE statement. I’m unsure how often those two parameters match one of those two checks, but if we check them first we can entirely skip the UPDATE.

Like I said, every little bit helps.

After the UPDATE, there is an INSERT to make sure if we didn’t UPDATE the row because it doesn’t exist, we INSERT it. There are conditionals around the UPDATE, but not the INSERT. So, we run it every time we run the proc. 284 million times a day. Even if there’s nothing to insert.

Trigger Happy

The revelation is this:

“These triggers fire when any valid event fires, whether table rows are affected or not.”

Ouch. And I really wasn’t aware of this. So, we pay for the INSERT execution every time, even if we don’t need to insert anything. And the INSERT trigger on that table fires as well, trying to update another table even though the INSERTED and DELETED tables are empty. Charming.

But if our TVP had x items to update\insert and our UPDATE statement handled y of them, we only need to try the INSERT if y<x.

We won’t even need to query the underlying table or the TVP; just check the row counts.

We’re likely to update these records many times, and we’ll obviously only insert them once. Checking the execution counts for the INSERT and UPDATE triggers and subtracting the number of executions from this proc, it seems we UPDATE the table several hundred times for each INSERT. So, we should be able to reduce trigger executions by a huge amount.

Well then. I’ll look forward to seeing those executions drop shortly.

If you liked this post, please follow me on twitter or contact me if you have questions.

Postscript

This change reduced trigger executions and INSERT statement executions from ~350 million per day, to ~5 million.

Graphs like this make me happy. 🙂

Case Study: Slow Garbage Collection

Posted on 02/17/202007/25/2020 by sqljared

Slow Garbage Collection

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it. We’re scanning and returning 1.4 million rows from the first table, doing a key lookup on all of them. We scan another table to look up the retention period for the related account, as this database has information from multiple accounts, and return 8.4 million rows there. We join two massive record sets, then have a filter operator that gets us down to a more reasonable size.

In general, when running a complex query you want your most effective filter, that results in the fewest rows, to happen first. We want to get the result set small early, because every operation after becomes less expensive. Here’s part of our anonymized query:

 WHERE
   Object5.Column10 = ?
   AND Object5.Column6 < Variable4 - COALESCE(Object12.Column1,Variable1,?)

So, Object5 is our main table. Column10 is a status column, and we compare that to a scalar value to make sure this record is ready for GC. That’s fine. The second is checking if the record is old enough to GC, and this is where things get interesting. The code here is odd; we’re subtracting a COALESCE of three values from a DATETIME variable, relying on behavior that subtracting an INT from a DATETIME subtracts that number of days from the DATETIME. There’s an implicit conversion, but it is on the variable side of the inequality. And the plan above appears to be our result.

So, Bad Date Math Code?

Seems like a good root cause, and I don’t like this lazy coding pattern, but let’s test it out with tables we can all look at. Would this be any better if we had written this to use the DATEADD function to do the math properly?


USE AdventureWorks2014
GO
--CREATE INDEX IX_SalesOrderHeader_ModifiedDate ON Sales.SalesOrderHeader (ModifiedDate);

SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh
WHERE
	soh.ModifiedDate < DATEADD(day, -1, GETUTCDATE());
	
SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh
WHERE
	soh.ModifiedDate < GETUTCDATE() - 1;

So, if we had an index on the OrderDate column, would it perform differently?

Apparently not. Same plan, same cost. But when you think about it, this tests the date math by subtracting an integer from the DATETIME provided by GETUTCDATE(). The original was subtracting a COALESCE of three values. One of those values was a float. Could the COALESCE or the resulting data type have made this more complicated?

Testing COALESCE


USE AdventureWorks2014
GO
DECLARE
	@AccoutSetting FLOAT = 8.0,
	@Default INT = 365;
	
SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh WITH(INDEX(IX_SalesOrderHeader_ModifiedDate))
WHERE
	soh.ModifiedDate < DATEADD(DAY, -COALESCE(@AccoutSetting, @Default), GETUTCDATE());

Running this, again we see a nice seek that reads and returns 100 rows. So the different data type and the COALESCE makes no difference.

Looking at the original query again, the first value isn’t a variable, it’s a column from a different table. We can’t filter by this column until we’ve read the other table, which affects our join order. But we have no criteria to seek the second table with.

Joined Filtering

One more test. Let’s see what the behavior looks like if we join to the Customers table to look for the RetentionPeriod. First, I’ll create and populate some data in that column:


USE AdventureWorks2014
GO
IF NOT EXISTS(
	SELECT 1
	FROM INFORMATION_SCHEMA.COLUMNS isc
	WHERE
		isc.TABLE_NAME = 'Customer'
		AND isc.TABLE_SCHEMA = 'Sales'
		AND isc.COLUMN_NAME = 'RetentionPeriod'
)
BEGIN
	ALTER TABLE Sales.Customer
		ADD RetentionPeriod INT NULL;
END;
GO
UPDATE TOP (100) sc 
	SET sc.RetentionPeriod = (sc.CustomerID%4+1)*90
FROM Sales.Customer sc;
GO
CREATE INDEX IX_Customer_RetentionPeriod ON Sales.Customer (RetentionPeriod);
GO

I only populated a few of the records to better match the production issue; only some accounts have the value I’m looking for, hence the COALESCE.


USE AdventureWorks2014
GO
DECLARE
	@Default INT = 365;

SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID,
	sc.RetentionPeriod
FROM Sales.SalesOrderHeader soh
LEFT JOIN Sales.Customer sc
	ON sc.CustomerID = soh.CustomerID
	AND sc.RetentionPeriod IS NOT NULL
WHERE
	soh.ModifiedDate < DATEADD(DAY, -COALESCE(sc.RetentionPeriod, @Default), GETUTCDATE());

Now we’re trying to filter records in SalesOrderHeader, based on the Customer’s RetentionPeriod. How does this perform?

Well, the row counts aren’t terrible, but we are scanning both tables. The optimizer opted to start with Customer table, which is much smaller. We’re not filtering on the date until the filter operator, after the merge join.

I’d be worried that with a larger batch size or tables that don’t line up for a merge join, we’d just end up doing a hash match. That would force us to scan the first table, and without any filter criteria that would be a lot of reads.

Solution

The solution I applied to my production query was to create a temp table that had the account ID and the related retention period. I joined this to my primary table, and the query that was taking 84 seconds was replaced with one that took around 20 milliseconds.


USE AdventureWorks2014
GO
IF OBJECT_ID('tempdb..#AccountDates') IS NULL
BEGIN
	CREATE TABLE #AccountDates(
		CustomerID INT,
		GCDate DATETIME
	);
END
GO
DECLARE
	@Default INT = 365;

INSERT INTO #AccountDates
SELECT 
	sc.CustomerID,
	DATEADD(DAY, -COALESCE(sc.RetentionPeriod, @Default), GETUTCDATE()) AS GCDate
FROM Sales.Customer sc
WHERE
	sc.RetentionPeriod IS NOT NULL;

SELECT 
	soh.SalesOrderID
FROM #AccountDates ad
JOIN Sales.SalesOrderHeader soh
	ON soh.CustomerID = ad.CustomerID
	AND soh.ModifiedDate < ad.GCDate;

TRUNCATE TABLE #AccountDates;
GO

Here’s how I’d apply that thought to our example. I’ve created a temp table with CustomerID and its associated RetentionDate, so we could use that to search Sales.SalesOrderHeader.

You may have noticed my filter on the Sales.Customer table. In the live issue, the temp table had dozens of rows for dozens of accounts, not the tens of thousands I’d get from using all rows Sales.Customer in my example. So, I filtered it to get a similar size in my temp table.

Infinitely better. Literally in this case, since the ElapsedTime and ActualElapsedms indicators in the plan XML are all 0 milliseconds. I’m declaring victory.

If you liked this post, please follow me on twitter or contact me if you have questions.