This past weekend, after one of my presentations on Analysis Services at SQL Saturday 520, an attendee described a performance issue in his environment that he thought could be related to locking and blocking and asked if there was a way to confirm if this was the cause. The short answer is yes, absolutely!
While I wouldn’t say this is a common issue across *all* Analysis Services environments, it’s definitely common amongst the subset of environments that do intra-day processing – that is, when the SSAS database is being processed during hours of the day when users might be simultaneously querying it.
Note: this issue has been well documented in this excellent post by Jason Howell over on the MSDN blogs… go read it if you think this is happening in your environment.
Here’s a checklist of risk factors and common symptoms:
- Intra-day processing
- Long-running queries
- Users reporting that Excel/PowerBI/etc hangs or takes a long time to connect to the cube or tabular model
- Queries failing with the following error message: “The operation was cancelled because of locking conflicts”
- Queries that usually run fast occasionally take a really long time.
If you’re wondering how to check for items 4 and 5, the short answer is that you’ll need some performance-monitoring software… or, if you’re handy and don’t mind free, open-source utilities without formal support, you can check out the SSAS Events Analyzer over on GitHub (and let us know your feedback)!
Now, if you suspect this is occurring in your environment but aren’t sure, you can use the following Extended Events trace, which will capture everything you need to analyze a blocking issue in your environment:
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ObjectDefinition>
    <Trace>
      <ID>LockingBlocking</ID>
      <Name>LockingBlocking</Name>
      <XEvent xmlns="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
        <event_session name="LockingBlocking" dispatchLatency="0" maxEventSize="0" maxMemory="4" memoryPartition="none" eventRetentionMode="AllowSingleEventLoss" trackCausality="true" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <event package="AS" name="CommandEnd" />
          <event package="AS" name="QueryEnd" />
          <event package="AS" name="LockTimeout" />
          <event package="AS" name="LockWaiting" />
          <target package="package0" name="event_file">
            <parameter name="filename" value="C:\Program Files\Microsoft SQL Server\MSAS13.SSAS_MD\OLAP\Log\LockingBlocking.xel" />
            <parameter name="max_file_size" value="32" />
          </target>
        </event_session>
      </XEvent>
    </Trace>
  </ObjectDefinition>
</Create>
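When you’ve captured what you need, the trace can be stopped by deleting it with a matching XMLA Delete command – a sketch, assuming the same TraceID used in the definition above:

```xml
<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <!-- Must match the <ID> element from the Create command -->
    <TraceID>LockingBlocking</TraceID>
  </Object>
</Delete>
```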
Once you think the issue has occurred, stop the trace and load the xEvent trace data into a table for analysis.
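One way to load the .xel file is from a SQL Server relational instance using sys.fn_xe_file_target_read_file – a sketch, assuming the file path from the trace definition above and a staging table name of your choosing (shredding the raw event XML into the column layout used by the analysis query further down is a separate step):

```sql
-- Read the SSAS .xel file(s) and persist the raw event XML into a
-- staging table; the wildcard picks up any rollover files as well.
SELECT  object_name              AS event_name
       ,CAST(event_data AS XML)  AS event_data
       ,file_name
       ,file_offset
INTO    dbo.stg_LockingBlocking_xel  /* hypothetical staging table name */
FROM    sys.fn_xe_file_target_read_file(
            'C:\Program Files\Microsoft SQL Server\MSAS13.SSAS_MD\OLAP\Log\LockingBlocking*.xel',
            NULL, NULL, NULL);
```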
In the most basic analysis, you can simply query the trace data to see if it contains any “Lock Waiting” events (EventClass = 54)… the presence of these events in a trace indicates blocking!
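That basic check can be as simple as the following – a sketch, assuming the trace data has already been loaded into a table with the same name and column layout used by the full analysis query later in this post:

```sql
-- Any rows counted here mean at least one session had to wait on a lock.
SELECT COUNT(*) AS lock_waiting_events
FROM   dbo.trc_Query_DemoLockingBlocking  /* change to your trace table */
WHERE  EventClass = 54;  /* Lock Waiting */
```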
However, in order to get the details needed to properly diagnose the issue *and* come up with a solution, you’ll want to gather a bit more detail.
The magic is in the blocking chain information contained in the [textdata] field of the lock waiting event:
The blocking chain contains all the details of the holdlist, waitlist, and locklist for the current transaction…which are explained in this KB article.
- HoldList: transactions that hold a lock that the current transaction tries to obtain
- WaitList: transactions waiting on locks in line ahead of the current transaction
- LockList: list of locks associated with the current transaction… either acquired or requested and waiting
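To make the shredding logic in the analysis query easier to follow, here is a purely illustrative fragment of the blocking-chain XML shape implied by the XPath expressions used below (element values and SPIDs are invented for the example; consult the KB article for the authoritative structure):

```xml
<!-- Illustrative only: the paths ./HoldList/Type, ./WaitList/Transaction/SPID,
     and ./LockList/Lock from the analysis query imply this general shape. -->
<HoldList>
  <Type>Read</Type>
  <Transaction>
    <SPID>21417</SPID>
  </Transaction>
</HoldList>
<WaitList>
  <Transaction>
    <SPID>21990</SPID>
  </Transaction>
</WaitList>
<LockList>
  <Lock>
    <Type>CommitWrite</Type>
  </Lock>
</LockList>
```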
Here’s the query I use to analyze blocking issues… warning: it’s not pretty, but it works.
WITH CTE_trc AS (
    SELECT *
    FROM dbo.trc_Query_DemoLockingBlocking /* <-- change this table name */
)
,CTE_process AS (
    SELECT  ssas_db     = trc.DatabaseName
           ,spid        = trc.SPID
           ,request_id  = trc.RequestId
           ,cmd_text    = trc.textdata
           ,cmd_type    = IIF(trc.EventSubclass = 3, 'XMLA', 'JSON')
           ,start_time  = trc.StartTime
           ,end_time    = trc.EndTime
           ,duration_ms = trc.Duration
    FROM CTE_trc trc
    WHERE trc.EventClass = 16 /* Command End */
      AND trc.EventSubclass IN (3, 38) /* 3 = Process; 38 = Tabular JSON Command */
)
,CTE_query AS (
    SELECT  ssas_db     = trc.DatabaseName
           ,spid        = trc.SPID
           ,request_id  = trc.RequestId
           ,cmd_text    = trc.textdata
           ,cmd_type    = IIF(trc.EventSubclass = 3, 'DAX', 'MDX')
           ,start_time  = trc.StartTime
           ,end_time    = trc.EndTime
           ,duration_ms = trc.Duration
    FROM CTE_trc trc
    WHERE trc.EventClass = 10 /* Query End */
      AND trc.EventSubclass IN (3, 0) /* DAXQuery = 3, MDXQuery = 0 */
)
,CTE_lock_waits AS (
    SELECT  spid           = trc.SPID
           ,request_id     = trc.RequestID
           ,xml_lock_chain = TRY_CAST(textdata AS XML)
           ,trc.EventClass
           ,trc.EventSubclass
    FROM CTE_trc trc
    WHERE trc.EventClass = 54 /* Lock Waiting */
)
,CTE_lock_chain_analysis AS (
    SELECT  lc.spid
           ,lc.request_id
           ,lock_request_type = xml_shred.val.value('./HoldList/Type','VARCHAR(128)')
           ,blocked_by_spid   = COALESCE(
                xml_shred.val.value('./WaitList/Transaction/SPID','VARCHAR(128)'),
                xml_shred.val.value('./HoldList/Transaction/SPID','VARCHAR(128)')
            )
    FROM CTE_lock_waits lc
    CROSS APPLY lc.xml_lock_chain.nodes('.') xml_shred(val)
    CROSS APPLY lc.xml_lock_chain.nodes('./LockList/Lock[position()=(last())]') xml_locklist(val)
)
,CTE_lock_chain_summary AS (
    SELECT  lca.spid
           ,lca.request_id
           ,lca.lock_request_type
           ,lca.blocked_by_spid
           ,lock_waits = COUNT(*)
    FROM CTE_lock_chain_analysis lca
    GROUP BY lca.spid
            ,lca.request_id
            ,lca.lock_request_type
            ,lca.blocked_by_spid
)
SELECT  p.ssas_db
       ,p.spid
       ,lcs.blocked_by_spid
       ,p.cmd_type
       ,p.cmd_text
       ,p.start_time
       ,p.end_time
       ,p.duration_ms
       ,lcs.lock_request_type
       ,lcs.lock_waits
FROM CTE_process p
LEFT OUTER JOIN CTE_lock_chain_summary lcs
    ON lcs.request_id = p.request_id

UNION ALL

SELECT  q.ssas_db
       ,q.spid
       ,lcs.blocked_by_spid
       ,q.cmd_type
       ,q.cmd_text
       ,q.start_time
       ,q.end_time
       ,q.duration_ms
       ,lcs.lock_request_type
       ,lcs.lock_waits
FROM CTE_query q
LEFT OUTER JOIN CTE_lock_chain_summary lcs
    ON lcs.request_id = q.request_id
ORDER BY start_time;
Here’s what the output looks like:
In this example, there’s a long-running query (row 1, SPID 21417) that is blocking a processing command (row 2, SPID 21990) from being granted the exclusive write-lock it needs to complete the commit. After 30 seconds (the default value for the ForceCommitTimeout setting), the long-running query is cancelled and the processing command is granted the lock it needs to complete the commit.
Keep in mind that from the moment the processing command requests the exclusive write-lock, all new queries against the database are blocked until it completes… which is why the last query (row 3, SPID 21450), which typically finishes in several milliseconds, takes 29.1 seconds.
Bottom line: if you’re processing cubes or tabular models during business hours, while users are running reports and queries, this is a potential issue you need to be aware of. In this post, you’ve seen a very easy way to detect and analyze blocking issues… another option is to implement a proper monitoring solution (regardless of whether it’s off-the-shelf or custom-rolled).
If you do find blocking issues in your environment, there are a handful of options… ranging from adjustments to the (msmdsrv.ini) configuration file to scaling out the SSAS environment and off-loading processing to a separate server… however, the optimal solution depends on a lot of other factors.
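As one example of a configuration-file adjustment, the ForceCommitTimeout setting mentioned earlier lives in msmdsrv.ini. A sketch of the relevant fragment – values are in milliseconds, and 30000 is the documented default, shown here purely for illustration; test any change carefully before relying on it:

```xml
<ConfigurationSettings>
  <!-- How long (ms) a processing commit waits before cancelling the
       queries that are blocking it; 0 disables the forced cancellation -->
  <ForceCommitTimeout>30000</ForceCommitTimeout>
</ConfigurationSettings>
```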
Not Enough? Interested in a more hands-on approach?
I’ve got you covered. I’m now taking on short-term consulting gigs focused on helping clients get a handle on their Analysis Services environments. Let’s have a conversation about your needs.