Estimates and statistics are often discussed in our community, but I doubt the average DBA knows how they flow. So I wanted to write a post with examples showing how SQL Server estimates the rows for a specific operation.

Statistics

The SQL Server optimizer will estimate how many rows it expects to return for a query using statistics. This is part of how it determines the cost of plans and decides which plan to use.

Let’s take a look at the statistics for the Sales.Orders table in the WideWorldImporters database. There is an index on the CustomerID column, so we’ll look at the statistics for that object.

DBCC SHOW_STATISTICS('Sales.Orders', FK_Sales_Orders_CustomerID);

Here’s most of the result from the DBCC command:

You can also get the same information by double-clicking the statistic in SSMS, and going to Detail on the popup:
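
If you’d rather query this information than read DBCC output or click through SSMS, sys.dm_db_stats_properties returns most of the header details. This is just a quick sketch (it doesn’t include the density vector):

--Statistics header via DMF
SELECT
	st.name AS statistics_name,
	sp.last_updated,
	sp.rows,
	sp.rows_sampled,
	sp.steps,
	sp.modification_counter
FROM sys.stats st
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp
WHERE
	st.object_id = OBJECT_ID('Sales.Orders')
	AND st.name = 'FK_Sales_Orders_CustomerID';
GO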

Let’s look at the most important elements.

  • Updated: Very important. How often you should update statistics depends on several factors, but it’s good to know how old your statistics are when troubleshooting a bad plan. This date is from 2016 if you’ve just restored the WideWorldImporters database, but I just rebuilt that index (which updates the stats with a 100% sampling rate for free).
  • Rows and Rows Sampled: How many rows are in the index, and how many were sampled for the statistic object. This is a 100% sampling rate, but when stats are updated automatically this number can easily be less than 1%. The lower the sampling rate, the less accurate the statistics will be. This can lead to bad decisions by the optimizer. I prefer a higher rate, but we have to decide how long we want this to take on a large table.
  • Steps: Each step contains information about a range of key values for the index. Each step is defined by the last range value included. The final output shows each step as a row. 200 is the maximum number of steps.
  • All Density: This number is the inverse of the number of distinct values for the column(s), that is, 1/distinct values. The value of 0.001508296 corresponds to 663 unique CustomerID values in the index. Because this index has multiple columns, the second result set has multiple rows, and the All Density values show how much more selective the index becomes as each additional column is included. You can sanity-check the CustomerID figure with the query just after this list.
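
Here’s a quick way to check that All Density math against the live table (assuming nothing has changed since the stats were last updated):

--All Density check
SELECT
	COUNT(DISTINCT CustomerID) AS Distinct_CustomerIDs,	--663
	1.0 / COUNT(DISTINCT CustomerID) AS All_Density		--0.001508...
FROM Sales.Orders;
GO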

The third result set is a bit more involved, so I wanted to discuss how we can use it.

Histogram

  • RANGE_HI_KEY: Indicates the highest value for this range of values; each row in this result set represents one of the steps I referenced earlier. There is a row with a RANGE_HI_KEY of 3, and the next row is 6. This means CustomerID 4 and 5 are in the same step as 6, which is the RANGE_HI_KEY.
  • RANGE_ROWS: Gives the number of rows for all values in this step/range, excluding the RANGE_HI_KEY.
  • EQ_ROWS: Gives the number of rows equal to the RANGE_HI_KEY.
  • DISTINCT_RANGE_ROWS: This gives how many distinct values there are in the RANGE_ROWS, excluding the RANGE_HI_KEY again.
  • AVG_RANGE_ROWS: On average, how many rows there are for each key value in this range (RANGE_ROWS / DISTINCT_RANGE_ROWS).

Looking at the third row where the RANGE_HI_KEY is 6, the EQ_ROWS are 106. So if we query for CustomerID = 6, this histogram tells us we should return 106 rows.

The DISTINCT_RANGE_ROWS is 2, so both CustomerID 4 and 5 have rows in this range. RANGE_ROWS is 214 across those two values, but the histogram can’t tell us how many rows belong to 4 versus 5; the only exact count it stores is the EQ_ROWS for the RANGE_HI_KEY. So if we look up a range value that isn’t the RANGE_HI_KEY, we’ll estimate based on AVG_RANGE_ROWS.
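
If you want the histogram in a queryable form rather than the DBCC output, sys.dm_db_stats_histogram (available since SQL Server 2016 SP1 CU2) returns the same steps:

--Histogram via DMF
SELECT
	sh.step_number,
	sh.range_high_key,
	sh.range_rows,
	sh.equal_rows,
	sh.distinct_range_rows,
	sh.average_range_rows
FROM sys.stats st
CROSS APPLY sys.dm_db_stats_histogram(st.object_id, st.stats_id) sh
WHERE
	st.object_id = OBJECT_ID('Sales.Orders')
	AND st.name = 'FK_Sales_Orders_CustomerID'
ORDER BY sh.step_number;
GO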

Example #1: RANGE_HI_KEY

To see how these numbers are used, let’s look at a simple query against the Sales.Orders table.

--RANGE_HI_KEY and EQ_ROWS
SELECT 
	OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID = 99;
GO

Since 99 is the RANGE_HI_KEY for its step, we just need to check the EQ_ROWS. This query should return 122 rows.

So let’s look at the plan to confirm.

That’s the estimate the plan displays, and it’s accurate because I just updated my stats.

Example #2: AVG_RANGE_ROWS

Things are slightly different if we search for a value that is not the RANGE_HI_KEY for its step.

--RANGE_HI_KEY and AVG_RANGE_ROWS
SELECT 
	OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID = 98;
GO

Each step is defined by its highest value. So if there is a step for 96 and a step for 99, 98 falls in the same step as 99. We don’t know the number of rows for any value in the step except the RANGE_HI_KEY, so the estimate should be the value stored in AVG_RANGE_ROWS, 125.5.

The operator indicates an estimate of 126, but if we hover over it we can see the exact value of 125.5. We returned 127 rows. There are limitations here, but this is a pretty close estimate.
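
The optimizer’s lookup amounts to finding the first step whose RANGE_HI_KEY is at or above the value we searched for and reading that step’s AVG_RANGE_ROWS. You can mimic that with the histogram DMF from earlier; this is a sketch, so the exact values depend on your copy of the stats:

--Which step covers CustomerID 98?
SELECT TOP (1)
	sh.range_high_key,
	sh.equal_rows,
	sh.average_range_rows	--125.5 in my copy of the stats
FROM sys.stats st
CROSS APPLY sys.dm_db_stats_histogram(st.object_id, st.stats_id) sh
WHERE
	st.object_id = OBJECT_ID('Sales.Orders')
	AND st.name = 'FK_Sales_Orders_CustomerID'
	AND CAST(sh.range_high_key AS int) >= 98
ORDER BY CAST(sh.range_high_key AS int);
GO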

Example #3: Variables

We see a different behavior if we use a variable in the query.

--Variable Estimate
DECLARE @CustomerID INT = 99;

SELECT 
	OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID = @CustomerID;
GO

The estimate is off. When we had the value inline, the optimizer “sniffed” that value to see how many rows to expect. It didn’t do the same for the local variable.

Since the optimizer doesn’t know what the value inside that variable is, it can’t use the histogram. It has to make an estimate based on the details at the table level instead of the step level.

I mentioned the All Density value earlier. It’s a measure of how unique a given column is. If you multiply that value by the number of rows in the table, you should get an average number of rows that would be returned for a given CustomerID.

--Calculated Estimate
DECLARE
	@density numeric(12,12) = 0.009183228,		--not used in the calculation below
	@all_density numeric(12,12) = 0.001508296,	--All Density for the CustomerID column
	@rows int = 73595;							--rows in the table

SELECT
	1.0/@all_density AS Distinct_CustomerID,
	@all_density * @rows AS Estimated_Rows;
GO

This matches the value of 111.003 from the execution plan. So this is a somewhat blind estimate for any CustomerID when the optimizer doesn’t know the value before compiling.

Sidenote on Example #3:

This didn’t calculate correctly for me at first. It caused much consternation as I did more reading to try to understand why the numbers didn’t match. I likely would have posted this blog some time ago without this issue.

Then I realized WideWorldImporters was created for SQL Server 2016. These statistics were created in SQL Server 2016, and were restored with the rest of the database on my instance of SQL Server 2022.

Maybe I should rebuild the index? Then I did. And now it works perfectly.

Something to keep in mind for your endeavors.

Example #4: RECOMPILE

If we add OPTION(RECOMPILE) to our query, the optimizer will sniff the local variable and use the histogram to make its estimate, just as it did with the inline value.

--Recompile to Sniff Local Variable
DECLARE @CustomerID INT = 99;

SELECT 
	OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID = @CustomerID
OPTION(RECOMPILE);
GO

And now our estimate is accurate again. It’s good to understand this point because you may see a large difference if you use OPTION(RECOMPILE) with a query that depends on a local variable.

Range Estimates

With range seeks (inequality comparisons), things are a bit different. We’re not looking for a specific value, and we’ll use our statistics differently. And there is one issue I’d like to point out. Consider this query:

SELECT so.OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID >= 1001;
GO

When I execute this, the estimate is very accurate.

We’re only off by 1 row, and I will happily accept that. But how did the optimizer arrive at this number? Let’s look at the last group of steps in this statistic.

The RANGE_ROWS value is what’s useful for this range seek. It tells the optimizer how many rows to expect from each step, but it excludes the EQ_ROWS, which is the number of rows for the RANGE_HI_KEY itself. So, to build this estimate we need to include both the EQ_ROWS and RANGE_ROWS for each of these steps. If you add the highlighted numbers, you’ll end up with 4204.

Of course, the query would exclude CustomerID 1000, which should be in the step for CustomerID 1003. We don’t know the exact row count for CustomerID 1000; it’s one of the range values within that step. So our best estimate for it is the AVG_RANGE_ROWS for that step, 121. If we subtract the 121 rows we estimate for CustomerID 1000, we have a final estimate of 4083.
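
If you’d rather not add the numbers by hand, here’s a rough way to reproduce the optimizer’s math with the histogram DMF. I’m assuming the step covering 1001 has a RANGE_HI_KEY of 1003, as it does in my copy of the stats:

--Range estimate: EQ_ROWS + RANGE_ROWS for every step at or above 1003,
--minus one AVG_RANGE_ROWS for CustomerID 1000, which the predicate excludes
SELECT
	SUM(sh.equal_rows + sh.range_rows)
		- MAX(CASE WHEN CAST(sh.range_high_key AS int) = 1003
				THEN sh.average_range_rows END) AS Estimated_Rows	--4204 - 121 = 4083
FROM sys.stats st
CROSS APPLY sys.dm_db_stats_histogram(st.object_id, st.stats_id) sh
WHERE
	st.object_id = OBJECT_ID('Sales.Orders')
	AND st.name = 'FK_Sales_Orders_CustomerID'
	AND CAST(sh.range_high_key AS int) >= 1003;
GO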

Inequalities and Variables

But what if you used a variable with that same query?

DECLARE @CustomerID INT = 1001;

SELECT so.OrderID
FROM Sales.Orders so
WHERE
	so.CustomerID >= @CustomerID;
GO

The estimate is off by 82%. What just happened?

The optimizer can’t probe the variable, so it has nothing to base its estimate on. This estimate of 22078 is a default: the optimizer guesses our query will return 30% of the rows in the table. This guess is present in both the legacy cardinality estimator and the current CE in SQL Server 2022.
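
The math is simple enough to check by hand:

--The 30% guess for an inequality against an unknown value
SELECT 73595 * 0.30 AS Estimated_Rows;	--22078.5
GO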

The first time I heard of this estimate was at a talk at PASS Summit. There are some references to it on other blogs, but there aren’t many of them.

These two blogs by Erik Darling and Andrew Pruski refer to the 30% estimate, and if you are reading this blog you might find them enlightening.

This blog by @sqlscotsman has more details about estimates and guesses involving other operators including LIKEs and BETWEENs.

In a more complex query where the range seek isn’t the only option to filter on, SQL Server can use other fields to make a more accurate estimate.

In Summary

Why does accuracy matter? When estimates are inaccurate, SQL Server will compare plans with incorrect costs. We’re more likely to end up with bad plans that over- or under-allocate resources, or use the wrong join type or join order for a query.

The wrong plan can multiply the runtime of a query many times over and massively increase blocking.

Knowing how our statistics function can help us write better procedures that are less likely to leave the optimizer in the dark.

My social media links are above. Please contact me if you have questions, and I’m happy to consult with you if you have a more complex performance issue. Also, let me know if you have any suggestions for a topic for a new blog post.

Reducing waits is a great way to improve the performance of your SQL Servers. Minimizing the PAGELATCH_EX and PAGELATCH_SH wait types is more involved than most. There are generally two causes: one is largely solved in recent versions, and the other requires real thought and planning to resolve.

Tempdb Contention

Tempdb contention occurs when there is a high rate of object creation in tempdb. There are specific pages (the GAM, SGAM, and PFS pages) that are latched exclusively when allocating space for the new objects. This creates a bottleneck by serializing part of the process.

This shows up as a PAGELATCH_xx wait type, but to differentiate this from the other major cause of page latch contention, check the wait description. If you see three numbers separated by colons and the first number is a 2, you’re seeing tempdb contention. Consider this wait description: 2:1:1.

  • This three-part number indicates the db_id, file_id, and page_id this thread is waiting for access to.
  • db_id 2 corresponds to tempdb.
  • The file_id of 1 indicates the first data file for that database. If there are additional tempdb files any of them could be referenced. Adding data files reduces the contention by providing another set of key pages, so 1 is the most likely number you will see.
  • Most often the page_id will be 1 or 3 (the PFS or SGAM pages). As a file grows, additional PFS, GAM, and SGAM pages will be created at regular intervals, so if you see much larger numbers, it’s still the same issue.
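
If you want to catch this in the act, something like the following will show sessions currently waiting on tempdb pages. This is a sketch; on a busy system you’d probably fold it into your regular monitoring:

--Sessions currently waiting on tempdb page latches
SELECT
	er.session_id,
	er.wait_type,
	er.wait_resource,
	er.wait_time
FROM sys.dm_exec_requests er
WHERE
	er.wait_type LIKE 'PAGELATCH%'
	AND er.wait_resource LIKE '2:%';
GO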

There were a few options to mitigate this in older versions, but there have been changes since SQL Server 2016 that made this easier or automatic. This has been essentially solved in SQL Server 2022.

For my full post on this topic including a list of the changes in each version, see https://www.sqljared.com/blog/tempdb-contention-in-2023/.

Page Latch and the Hot Page Issue

If you see page latch waits that are not tempdb related, you are likely experiencing the hot page issue. This happens when many different connections are attempting to update the same page in memory.

This can happen with any write operation, but it’s probably easiest to understand when inserting new rows into a table. Imagine inserting new records one at a time into a table with a clustered index based on an IDENTITY(1,1) column.

The IDENTITY value for the next row will be the highest in the table, and the row will be placed in the last page. But that will be the case for each new row. If many different threads are trying to insert into the table simultaneously, they will block each other since they need exclusive access to the last page.

Even though the page is in memory and access is faster, it is not instantaneous.
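
For reference, here’s the shape of the table I have in mind for the rest of this section. The table and column names are hypothetical, but the pattern (a clustered primary key on an IDENTITY column) is the classic hot-page setup:

--Hypothetical table with the classic hot-page pattern
CREATE TABLE dbo.OrderEvents
(
	OrderEventID int IDENTITY(1,1) NOT NULL,
	OrderID int NOT NULL,
	EventTime datetime2 NOT NULL,
	CONSTRAINT PK_OrderEvents PRIMARY KEY CLUSTERED (OrderEventID)
);
GO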

Most documentation on the subject will focus on the clustered index, and I will also focus on it here. You can see page latch waits when there is a nonclustered index on an IDENTITY (or other sequential) column, but it will typically require an order of magnitude more inserts before you will see the contention on a nonclustered (and typically smaller) index.

So, how do we address this? There are several possibilities.

Change the clustered index

The issue here is the clustered index is based on our IDENTITY column. If you made this a nonclustered index instead, you could create a clustered index on another column. The order of the table would be based on the column(s) of the new clustered index.

This could be a composite index, which could still include the IDENTITY column. However the first column will determine the order of the rows in the table, and the goal would be for us to insert new rows randomly throughout the table.

I would prefer to use a small column for this, as the clustered index’s columns get added to all other indexes in the table, duplicating that data. I would also choose a column that has many different values. If you chose a BIT field, you would go from having one hot page to having two. That may not improve matters.

You should also be careful not to use another column that would be, in practice, sequential. For example, a column holding the DATETIME when the row was created would not be an IDENTITY column, but its values would be sequential, and you would always insert into the final page of the index.
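
Here’s a sketch of the change using the hypothetical dbo.OrderEvents table from above, assuming inserts arrive for many different OrderID values at once so the new clustered key spreads them out:

--Keep the IDENTITY as a nonclustered primary key and cluster on a different column
ALTER TABLE dbo.OrderEvents DROP CONSTRAINT PK_OrderEvents;

ALTER TABLE dbo.OrderEvents
	ADD CONSTRAINT PK_OrderEvents PRIMARY KEY NONCLUSTERED (OrderEventID);

CREATE CLUSTERED INDEX CX_OrderEvents
	ON dbo.OrderEvents (OrderID, OrderEventID);
GO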

I should also mention that randomizing the location of our inserts also means increasing our fragmentation. We will be getting page splits where we are inserting. That doesn’t happen when we insert at the end of the index; we just allocate a new, empty page. This isn’t a huge concern for me, but it is a valid point.

You could also have no clustered index at all, and the table would be stored as a heap with no order.

Use a computed column

If you don’t see a column you want to use for your clustered index, you can always create a new one.

ALTER TABLE Sales.OrderLines
ADD HashValue AS (CONVERT(tinyint, (OrderLineID%10))) PERSISTED NOT NULL;

This creates a new column that uses the last digit from the OrderLineID in the same table, so a number from 0-9. If you make this the first column of your clustered index, each new row will be inserted based on the HashValue.

Effectively, we would have 10 pages where we insert instead of one. So there will be less contention on each of those pages. You could change the mod to be higher to spread out the inserts more, but that would also lead to more fragmentation.
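
Here’s roughly what that clustered index could look like. You would need to drop or convert the existing clustered primary key on Sales.OrderLines first, so treat this as a sketch rather than a script to run as-is:

--Cluster on the hash first so inserts spread across ten ranges
CREATE CLUSTERED INDEX CX_OrderLines_HashValue
	ON Sales.OrderLines (HashValue, OrderLineID);
GO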

Partition the table

If you partitioned the table, you effectively have multiple b-tree structures for the clustered index. If there are 10 partitions, each of them has a hot page. This is another way of going from one hot page where all new rows are inserted to using multiple pages.

Partitioning wouldn’t be my first approach. Unless I want the table to be partitioned for other purposes, like switching or truncating partitions, I wouldn’t want to introduce the complications of partitioning.

If your queries are not aligned with the partitioning (that is, they don’t filter on the partitioning key), they will suddenly be reading every partition’s b-tree for that index. This could be a substantial increase in reads and can increase duration.
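
If you did go this route, here’s the rough shape of it, reusing the HashValue column from the computed column example. This is a sketch with hypothetical object names, not a recommendation, and it assumes the existing clustered primary key has already been dropped or converted:

--Ten partitions, one per hash value
CREATE PARTITION FUNCTION pf_HashValue (tinyint)
	AS RANGE RIGHT FOR VALUES (1,2,3,4,5,6,7,8,9);

CREATE PARTITION SCHEME ps_HashValue
	AS PARTITION pf_HashValue ALL TO ([PRIMARY]);

--The partitioning column must be part of a unique clustered index's key
CREATE UNIQUE CLUSTERED INDEX CX_OrderLines_Partitioned
	ON Sales.OrderLines (OrderLineID, HashValue)
	ON ps_HashValue (HashValue);
GO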

OPTIMIZE_FOR_SEQUENTIAL_KEY

This is an option you can use when creating or rebuilding your indexes. This attempts to increase throughput by limiting the number of threads that can request the latch and favoring threads that are likely to complete their work in a single quantum.
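
Enabling it on an existing index is a one-liner (SQL Server 2019 and later; the index and table names here are the hypothetical ones from earlier):

--Turn on the option for an existing index
ALTER INDEX PK_OrderEvents ON dbo.OrderEvents
	SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
GO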

With this enabled, you will see a new wait type, BTREE_INSERT_FLOW_CONTROL. You may see an increase in overall waits and some threads may see more latency if they are delayed for favored threads, but the result should be an increase in throughput.

For more details, see this blog from Pam Lahoud.

Reduce transaction size / Tune your writes

My initial thought was to encourage you to look at the entire transaction containing the write that is running into page latch contention. The idea would be to shrink the transaction as a whole to allow its latches to be released sooner, but a little research showed a problem.

Latches are not held for the entire transaction, only for the operation that requires them. So if you are inserting a new row as part of a larger transaction, you will hold locks related to that INSERT for the entire transaction, but the latches are released as soon as the INSERT completes. For reference, please see the comparison of latches and locks in this Microsoft article.

Still, anything we can do to speed up that statement will release the latch sooner and allow greater throughput. If your operation reads a lot of data, optimize your execution plan as best you can. Add hints if that will make performance more consistent. Perhaps you can read the data in a separate statement to get the IDs of the rows to write so that your write operation can be simplified (using Manual Halloween).

Read Committed Snapshot

SELECT statements also require latches while accessing data, and they can add to the contention when running an INSERT or UPDATE on a hot page. If you see your write operations waiting on PAGELATCH_SH, you may want to consider using READ_COMMITTED_SNAPSHOT.

This is a database-level setting, so you will need to consider its impact on all operations in the database. But the benefit is that your read and write processes no longer block each other. If you are experiencing hot page contention, your reads will at least stop contributing to the problem.
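
Enabling it looks like this; the WITH ROLLBACK IMMEDIATE clause rolls back other sessions’ open transactions so the change can take effect:

--Enable read committed snapshot for the database
ALTER DATABASE WideWorldImporters
	SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
GO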

Batch your operations

I’ve saved this for last because it’s probably the most effective solution and one requiring the most work. Instead of having 50 connections each calling the same procedure to insert one row each, why not make one call using a TVP input to insert all 50 rows?

If you work in a batch, you reduce the number of threads operating on a given table while processing the same number of rows. Reducing threads reduces contention.

The difficulty is the change to your application. Does it receive a large amount of data to insert all at once? If so, changing to a new procedure would be relatively simple. Do encourage your app developers to keep the batch size reasonable, given the size of one row in your TVP.

If the application only receives individual values to insert, you could queue them to insert a batch at once. This may be more complicated on the code side.
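
Here’s a minimal sketch of the batched approach, using hypothetical names and the dbo.OrderEvents table from earlier. My demo used a memory-optimized table type, which needs a bit more setup; a plain table type like this still shows the pattern:

--Table type for the batch
CREATE TYPE dbo.OrderEventInput AS TABLE
(
	OrderID int NOT NULL,
	EventTime datetime2 NOT NULL
);
GO

--One call inserts the whole batch
CREATE PROCEDURE dbo.OrderEvents_InsertBatch
	@Events dbo.OrderEventInput READONLY
AS
BEGIN
	SET NOCOUNT ON;

	INSERT dbo.OrderEvents (OrderID, EventTime)
	SELECT OrderID, EventTime
	FROM @Events;
END;
GO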

I have an example from a recent presentation where I inserted 500 rows of data into a table in the WideWorldImporters database in three different ways. The results are below.

The first method used a simple procedure to insert 1 row into the table. I ran this from SSMS 500 times. Of course, this meant I was inserting the values serially, and there was no contention from multiple threads running at once. This took 23 microseconds per row.

The second method used the same single-insert procedure, but SQLQueryStress allowed me to run 50 executions across 10 threads. I included sys.query_store_wait_stats in my query against Query Store so we can see “Buffer Latch” waits. Each row took 119 microseconds on average, and interestingly our CPU time for each insert was significantly higher (about 3x).

The third method used a different stored procedure taking a memory-optimized TVP as input and inserting all 500 rows in one call. The batched approach took 4.2 microseconds per row.

Conclusion

It’s been a while since my last post. I was focusing my attention on my presentation for the PASS Data Community Summit in November, which went very well and was a great experience.

And then I changed jobs at the start of December. This has been a good move for me, but it involved a lot of thought and emotion as it meant leaving a team with great people.

Things have settled now, and hopefully, I’ll get back to writing on a more regular cadence.

My social media links are above. Please contact me if you have questions, and I’m happy to consult with you if you have a more complex performance issue. Also, let me know if you have any suggestions for a topic for a new blog post.

I’ve learned a few more items of interest relating to Parameter Sensitive Plan Optimization that I wanted to pass on.

Parameter Sensitive Plan Optimization: Known issue resolved

First, I wanted to point out that there are fixes to issues specific to Parameter Sensitive Plan Optimization in SQL Server 2022. The issue itself is documented at the link below.

https://learn.microsoft.com/en-us/sql/relational-databases/performance/parameter-sensitive-plan-optimization?view=sql-server-ver16#known-issues

When I heard about the issue, I heard that it could cause SQL Server 2022 to produce stack dumps every 15 minutes. The time interval made me think it was related to the default interval for Query Store, and that seems to be the case.

There’s only one known issue documented, but there appear to be updates in CU5 and CU6 for this issue (or some variation of this issue). I’ll link to both below:

https://learn.microsoft.com/en-us/troubleshoot/sql/releases/sqlserver-2022/cumulativeupdate5#2357623

https://learn.microsoft.com/en-us/troubleshoot/sql/releases/sqlserver-2022/cumulativeupdate6#2470755

If you are using PSPO but not encountering this issue, that’s great. But I’d still update as soon as possible.

Don’t be a statistic

I was looking back at a previous blog post and trying to decide why I had not seen PSPO used when I tried testing using the WideWorldImporters database. I had deliberately skewed the data on the Sales.Orders table, and I tried running a procedure with different parameter values that should return a range of row counts.

Nothing.

But the point of PSPO is to take the variable passed in, determine how many rows we should expect it to return, then choose the correct plan based on that cardinality. So, its function depends on checking the histogram for that parameter.

And I hadn’t updated my statistics after I skewed the data…

It worked much better after I updated the stats.
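
If you’re following along and have skewed your own copy of the data, something as simple as this is all it took (FULLSCAN is my preference here, not a requirement):

--Refresh the stats after skewing the data
UPDATE STATISTICS Sales.Orders WITH FULLSCAN;
GO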

Just thought I would drop that off to save someone else some time.

Odd relationship

With that resolved, I ran a query to see my plans for this procedure. It has only one statement, but I wanted to see the base query, any variants, and their plans. Here’s the query:

SELECT 
	qsq.query_id,
	qsp.plan_id,
	qsq.query_text_id,
	qsp.plan_type,
	qsp.plan_type_desc,
	OBJECT_NAME(qsq.object_id),
	qv.query_id,
	qvp.plan_id,
	CAST(qvp.query_plan AS XML),
	qv.query_text_id,
	qvp.plan_type,
	qvp.plan_type_desc
FROM sys.query_store_query qsq
INNER JOIN sys.query_store_plan qsp
	ON qsp.query_id = qsq.query_id
LEFT JOIN sys.query_store_query_variant vr
	ON vr.parent_query_id = qsq.query_id
LEFT JOIN sys.query_store_query qv
	ON qv.query_id = vr.query_variant_query_id
LEFT JOIN sys.query_store_plan qvp
	ON qvp.query_id = qv.query_id
WHERE
	qsq.object_id = OBJECT_ID('Sales.GetOrders')
ORDER BY
	qsq.query_id,
	qsp.plan_id,
	qv.query_id,
	qvp.plan_id;

Here are the results and they were a little surprising.

The parent query is the first column, with the dispatcher plan next.

It’s probably not immediately obvious what is odd here. The plan_type and plan_type_desc for the parent query are included in the result set in columns 4 and 5. The first 12 rows show “Compiled Plan” as their type, but that’s the type for a “normal” query that isn’t using PSPO. So why are all the variant columns populated?

Because the query_id didn’t change once we started using a new dispatcher plan. Since we only joined to sys.query_store_query_variant based on the parent_query_id, we related our variant queries to all plans for the parent_query_id.

Joining on the dispatcher_plan_id as well seems appropriate.

LEFT JOIN sys.query_store_query_variant vr
	ON vr.parent_query_id = qsq.query_id
	AND vr.dispatcher_plan_id = qsp.plan_id

Yeah, that’s a lot better.

Multiple variables?

A skewed parameter will return a low (fewer than 100 rows), middle, or high (more than 100,000 rows, or more than 1 million) number of rows. When the optimizer sees the value of the parameter passed in, it gets the cardinality of that value to decide which plan to use.

But SQL Server can create variants based on up to three different parameters for a query. Each parameter has its own low, middle, and high thresholds, and plans are created for the various combinations of those parameters.

I recently saw an example of a query using PSPO with multiple parameters. Here’s the dispatcher block of the XML from the anonymized plan for that query:

          <Dispatcher>
            <ParameterSensitivePredicate LowBoundary="100" HighBoundary="100000">
              <StatisticsInfo LastUpdate="2023-02-09T03:47:24.27" ModificationCount="26303" SamplingPercent="86.3454" Statistics="[_statistic_1]" Table="Object1" Schema="Schema1" Database="Database1" />
              <Predicate>
                <ScalarOperator ScalarString="ScalarString1">
                  <Compare CompareOp="EQ">
                    <ScalarOperator>
                      <Identifier>
                        <ColumnReference Database="Database1" Schema="Schema1" Table="Object1" Alias="Object3" Column="Column1" />
                      </Identifier>
                    </ScalarOperator>
                    <ScalarOperator>
                      <Identifier>
                        <ColumnReference Column="Column2" />
                      </Identifier>
                    </ScalarOperator>
                  </Compare>
                </ScalarOperator>
              </Predicate>
            </ParameterSensitivePredicate>
            <ParameterSensitivePredicate LowBoundary="100" HighBoundary="100000">
              <StatisticsInfo LastUpdate="2023-02-09T03:47:24.27" ModificationCount="7831" SamplingPercent="100" Statistics="[index_nonclustered_1]" Table="Object2" Schema="Schema1" Database="Database1" />
              <Predicate>
                <ScalarOperator ScalarString="ScalarString2">
                  <Compare CompareOp="EQ">
                    <ScalarOperator>
                      <Identifier>
                        <ColumnReference Database="Database1" Schema="Schema1" Table="Object2" Alias="Object4" Column="Column3" />
                      </Identifier>
                    </ScalarOperator>
                    <ScalarOperator>
                      <Identifier>
                        <ColumnReference Column="Column4" />
                      </Identifier>
                    </ScalarOperator>
                  </Compare>
                </ScalarOperator>
              </Predicate>
            </ParameterSensitivePredicate>
          </Dispatcher>

The Dispatcher block has the details for this plan that relate to PSPO.

  • The ParameterSensitivePredicate block shows the boundary values for the parameter, and there are two of these blocks in this plan. Both predicates have boundary values of 100 and 100,000.
  • The StatisticsInfo block is very useful; it shows the statistic being used to make estimates and its sampling rate. The object and database are also indicated.
  • The columns that relate to the parameters we are using for PSPO are in the ColumnReference block.

In my case, I saw two plans for the query in question, and if there were any differences they were trivial. Here’s one of the last blocks in the XML plan:

            </RelOp>
            <ParameterList>
              <ColumnReference Column="Column4" ParameterDataType="int" ParameterCompiledValue="Value1" />
              <ColumnReference Column="Column8" ParameterDataType="int" ParameterCompiledValue="Value2" />
              <ColumnReference Column="Column6" ParameterDataType="bit" ParameterCompiledValue="Value3" />
              <ColumnReference Column="Column12" ParameterDataType="int" ParameterCompiledValue="Value4" />
              <ColumnReference Column="Column2" ParameterDataType="int" ParameterCompiledValue="Value5" />
              <ColumnReference Column="Column10" ParameterDataType="bigint" ParameterCompiledValue="Value6" />
            </ParameterList>
          </QueryPlan>

Column2 and Column4 are the two related to our PSPO parameters. But there are several other columns that are important to our plan. The ParameterCompiledValue shows the actual value provided for each column when the plan was created. We see “Value1” because this is an anonymized plan.

Using these values, I was able to see how many records we should expect for each using the histogram from our statistics. While the cardinality of Column2 and Column4 may vary a lot depending on the inputs, other columns would not vary as much. Two, in particular, indicated that we would return 1 or a handful of rows at most.

So while the optimizer does the extra work of generating extra plans based on the parameters that have significant skew, the plan is really being driven by other parameters with much lower cardinality.

In Summary

I’ll post more on this subject as I come across anything new. I will also be presenting a session on Parameter Sensitive Plan Optimization at PASS Data Community Summit in Seattle this November. I hope you’ll consider attending. Last year was my first as a presenter, and I had a wonderful time.


You can follow me on Twitter (@sqljared) and contact me if you have questions. My other social media links are at the top of the page. Also, let me know if you have any suggestions for a topic for a new blog post.