Today I ran into a performance/scalability problem with a group of stored procs I’d written earlier this week. All the procs passed basic unit-testing, but once I increased the load from processing 100 records to 100,000 records…you know, to make sure it scales and stuff…the test went from < 3 seconds to over 35 minutes!! Houston, we have a problem!
I love running into problems (when I’m not under the gun) because I know it means I’m about to learning something new. Now the first step in any proper database-related troubleshooting effort is to? That’s right, grab a cup of coffee. Not only does this give you some time to think about the problem and start forming a logical plan of attack, but it also gets you hopped up on caffeine which is helpful is staving off gumption traps.
Ok, all jacked up? Good. Now, usually when I’m confronted with a poorly performing query, my second step is to pop it in SSMS and capture the execution plan. However, in this case, the performance problem is with a statement (or statements) buried somewhere in a set of stored procs. Just the thought of plucking out each statement 1 by 1 and running it in SSMS is enough to make me bleed from the ears. Thankfully, there’s an easy way to capture them all…SET STATISTICS XML (ON | OFF)
All I have to do is wrap the top-level stored proc like so…
[sql]SET STATISTICS XML ON
SET STATISTICS XML OFF[/sql]
..and as it executes, the actual execution plan for every statement that runs (whether its in that top level stored proc or in a nested lower-level stored proc) gets captured and a link is provided in the results pane…
It’s somewhat annoying to have to click through the links to find the problem statement, but the annoyance is marginal compared to the option of extracting each statement and executing it one-by-one in SSMS.
After a few minutes, I came across the suspect plan…Notice anything that sticks out?
(depending on your screen resolution, you may need to click the image to see the full size)
How about now?
Yeah, that’s a pretty fat pipe going into the Nested Loops join. Also, don’t read too much into the difference between Actual Rows and Estimated Rows…what’s important is the sheer magnitude of the Actual rows…which is just a cross-product of the outer and inner tables involved. Regardless of the system (which happens to be a local instance of SQL 2012 running on the VM I’m using for development at the client site)…that’s way too many rows for a Nested-Loops join. Instead, I’d like to see a hash-join. Btw, here’s some good info on the various join types: Advanced Query Tuning Concepts.
Why was a bad plan generated?
Probably the most common reason for the optimizer choosing a bad plan is due to statistics being out of date. However, I was pretty confident that wasn’t the case here since this was a controlled test scenario after a full-redeploy of the entire database…and I could always confirm this suspicion by updating all statistics just before the test.
However, another common reason for the optimizer choosing a bad plan…especially when stored procedures are involved…is bad parameter sniffing. Greg Larson wrote an excellent article covering parameter sniffing…
Parameter sniffing is the process whereby SQL Server creates an optimal plan for a stored procedure by using the calling parameters that are passed the first time a stored procedure is executed. By “first time”, I really mean whenever SQL Server is forced to compile or recompile a stored procedures because it is not in the procedure cache. Every subsequent call to the same store procedure with the same parameters will also get an optimal plan, whereas calls with different parameter values may not always get an optimal plan
How do you determine “bad” parameter sniffing from “good” parameter sniffing?
First you have to look determine the parameter values used during plan compilation and compare them to the values used during runtime. If you’re using SQL Sentry Plan Explorer (which is absolutely free and absolutely awesome!) there’s a parameters tab which shows compiled vs runtime values for parameters used in the statement…
As you can see, there’s a difference…the plan was compiled using a value of 4 for @RulePriority, but during the poorly performing test, a value of 1 was used. Note, that just because there’s a difference doesn’t automatically imply a problem. There’s a bit more to it than that…for example, if the execution/processing logic changes drastically for various parameter values, that could be a sign of a problem. In this case, when a value of 4 is passed into the procedure, only a small set of data actually being worked on. On the other hand, when a 1 is passed in, the load is much bigger.
How do we correct this?
In the article mentioned earlier on parameter sniffing, Greg proposes 3 workarounds…
- creating local variables in the stored proc and copying the values from the input parameters
- using WITH RECOMPILE option in the procedure declaration
- creating a separate stored proc for each potential call
…all are valid solutions, but besure you understand the pros/cons for each before making a choice.
In this particular case, and due largely to the nature of the proc (batch processing) I chose to go with option 4…wait, what? Yep, option 4…adding a query hint. OPTIMIZE FOR UNKNOWN is a query hint that was added back in SQL 2008:
This hint directs the query optimizer to use the standard algorithms it has always used if no parameters values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the queryplan should be, instead of looking at the specific parameter values that were passed to the query by the application.
This is essentially the same as using WITH RECOMPILE in the stored proc declaration, except it only applies to a specific statement in the procedure.
After adding the query hint and testing again, we see a much better execution plan is generated…
You can ignore the warnings/cautions on the sorts and second hash-match. They just indcate that data spilled to disk (tempdb) during execution…which should go away once I deploy to a more appropriately sized system. The main thing is that the nested loop join running across 8.3 billion records is gone and in its place is a hash match. Furthermore, the execution time plummeted from 35 minutes down to ~18 seconds. Pretty cool stuff.