Lately I’ve been doing a bit of MDX performance tuning and during the process I learned a new “trick” that blew my mind. Instead of just outright telling you what it is, I’d like to show you through an example.
Original Query
Take the following query shown below – which admittedly is a bit contrived but serves as both a fair representation of the calculations required at a client and as a nice example to highlight the performance improvement achieved with this new trick…so bear with me
WITH /* Calculate the customer age as of the 1/1/2007 */ MEMBER [Measures].[CustomerAge] AS DateDiff( "yyyy" ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED) ,Cdate("1/1/2007") ) /* Set of Customers between 18 and 35 yo as of 1/1/2007 */ SET Customers18to35 AS Filter( [Customer].[Customer].[Customer] ,[Measures].[CustomerAge] >= 18 AND [Measures].[CustomerAge] <= 35 ) /* Set of Customers18-35 who purchased a bike in 2007 */ SET CustomerBikePurchase AS NonEmpty( (Customers18to35,[Date].[Calendar].[Date]) ,( [Product].[Product Categories].[Category].&[1] // Bike Category ,[Date].[Calendar].[Calendar Year].&[2007] ,[Measures].[Internet Sales Amount] ) ) SET CustomerHelmetPurchase AS NonEmpty( (Customers18to35,[Date].[Calendar].[Date]) ,( [Product].[Product Categories].[Subcategory].&[31] // Helmet Subcategory ,[Date].[Calendar].[Date].&[20070101]:[Date].[Calendar].[Date].&[20071231].Lead(60) ,[Measures].[Internet Sales Amount] ) ) SET CustomerTirePurchase AS NonEmpty( (Customers18to35,[Date].[Calendar].[Date]) ,( [Product].[Product Categories].[Subcategory].&[37] // Tires and Tube Subcategory ,[Date].[Calendar].[Date].&[20070101]:[Date].[Calendar].[Date].&[20071231].Lead(90) ,[Measures].[Internet Sales Amount] ) ) SET CustomerBikeHelmetPurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerHelmetPurchase AS alias_Helmet , /* Customer = Customer */ alias_Helmet.Current.Item(0).MemberValue = alias_Bike.Current.Item(0).MemberValue AND /* Date of Helmet <= 60 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Helmet.Current.Item(1).MemberValue ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Helmet.Current.Item(1).MemberValue ) <= 60 ) ) SET CustomerBikeTirePurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerTirePurchase AS alias_Tire , /* Customer = Customer */ alias_Tire.Current.Item(0).MemberValue = alias_Bike.Current.Item(0).MemberValue AND /* Date of Tire <= 90 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Tire.Current.Item(1).MemberValue ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Tire.Current.Item(1).MemberValue ) <= 90 ) ) /* Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */ SET DenominatorSet AS Extract(CustomerBikeHelmetPurchase,[Customer].[Customer]) /* Customers between 18-35 who purchased a helmet within 60 days after a bike purchase and who purchased a tire within 90 days after bike purchase */ SET NumeratorSet AS Intersect( DenominatorSet ,Extract(CustomerBikeTirePurchase,[Customer].[Customer]) ) MEMBER [Measures].[Denominator] AS Count(DenominatorSet) MEMBER [Measures].[Numerator] AS Count(NumeratorSet) MEMBER [Measures].[Performance] AS IIF( [Measures].[Denominator] = 0 ,Null ,[Measures].[Numerator]/[Measures].[Denominator] ) ,FORMAT_STRING = "percent" MEMBER [Measures].[Missing Customers] AS Generate( Except(DenominatorSet,NumeratorSet) ,[Customer].[Customer].CurrentMember.Name ,", " ) SELECT { [Measures].[Denominator] ,[Measures].[Numerator] ,[Measures].[Performance] ,[Measures].[Missing Customers] } ON 0 FROM [Adventure Works]
The query above calculates a metric defined as follows:
Denominator: Number of customers who bought a bike in 2007 and then bought a helmet within 60 days after the initial bike purchase.
Numerator: Number of customers from the denominator who also purchased a tire within 90 days after the initial bike purchase.
Performance: [Numerator] / [Denominator]
Missing Customers: List of customers from [Denominator] who did not qualify for the [Numerator]
Running it against the AdventureWorksDW2012 dataset returns the following:
As you can see in the result set above, there are 5 customers who purchased a helmet within 60 days after purchasing a bike in 2007. Of those 5 customers, 2 also purchased a tire within 90 days of the bike purchase.
That’s great, but there’s just one problem…the performance SUCKS!
Below are the stats from MDXStudio showing that it took the query ~42 seconds to execute which is completely unacceptable…
Cold Cache
Warm Cache
The major bottleneck in this query are the 2 sets shown below which are defined using a combination of the Generate and Filter functions…or what I like to refer to as the “double-iterator”:
SET CustomerBikeHelmetPurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerHelmetPurchase AS alias_Helmet , /* Customer = Customer */ alias_Helmet.Current.Item(0).MemberValue = alias_Bike.Current.Item(0).MemberValue AND /* Date of Helmet <= 60 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Helmet.Current.Item(1).MemberValue ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Helmet.Current.Item(1).MemberValue ) <= 60 ) ) SET CustomerBikeTirePurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerTirePurchase AS alias_Tire , /* Customer = Customer */ alias_Tire.Current.Item(0).MemberValue = alias_Bike.Current.Item(0).MemberValue AND /* Date of Tire <= 90 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Tire.Current.Item(1).MemberValue ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).MemberValue ,alias_Tire.Current.Item(1).MemberValue ) <= 90 ) )
For those who might not be familiar, Generate and Filter are iterator functions (check it) so this is essentially a double loop construct which is a terrible design pattern for set-based tools like MDX. That said, given the complexity of the requirements for this particular project, there were a instances where it could not be avoided.
CustomerBikeHelmetPurchase – this set is constructed by looping through the entire set of (<customer>,<date of bike purchase>) tuples in [CustomerBikePurchase]. Then for each tuple in [CustomerBikePurchase], we are looping through the entire set of (<customer>,<date of helmet purchase>) tuples in [CustomerHelmetPurchase] and returning only the tuples where the <date of the helmet purchase> occurs between 1 and 60 days after the <date of the bike purchase> for the customer.
CustomerBikeTirePurchase – this set is constructed in the same manner as CustomerBikeHelmetPurchase described above…substituting [CustomerTirePurchase] for [CustomerHelmetPurchase] and the acceptable date range is from 1 to 90 days after the date of the bike purchase.
Tip 1: Use Member_Value instead of MemberValue
Now, what if we simply change the code for CustomerBikeHelmetPurchase and CustomerBikeTirePurchase to use Member_Value instead of MemberValue? Seems, simple enough, let’s give it a shot…
SET CustomerBikeHelmetPurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerHelmetPurchase AS alias_Helmet , /* Customer = Customer */ alias_Helmet.Current.Item(0).Member_Value = alias_Bike.Current.Item(0).Member_Value AND /* Date of Helmet <= 60 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Helmet.Current.Item(1).Member_Value ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Helmet.Current.Item(1).Member_Value ) <= 60 ) ) SET CustomerBikeTirePurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerTirePurchase AS alias_Tire , /* Customer = Customer */ alias_Tire.Current.Item(0).Member_Value = alias_Bike.Current.Item(0).Member_Value AND /* Date of Tire <= 90 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Tire.Current.Item(1).Member_Value ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Tire.Current.Item(1).Member_Value ) <= 90 ) )
Cold Cache
Warm Cache
…and BOOM goes the dynamite! Down to ~22-23 seconds from ~42 seconds…almost a 50% performance improvement!
This one was new to me and recommended by one of the architects on the project, Hans Esquivel (b | t). Even days later, I still haven’t really been able to fully wrap my head around the reasons why this improves performance…so if you know the answer or just have a theory, please be sure to leave a comment.
Tip 2: Compare Members Directly
In general, accessing member properties is expensive…therefore, we should avoid it whenever possible.
Below is another version of the “double-iterator” sets where we’re now comparing members directly for the customer-portion of the comparison criteria of the filter-function. Note the use of the “IS” comparison operator (as opposed to “=”) which is required when comparing members directly.
SET CustomerBikeHelmetPurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerHelmetPurchase AS alias_Helmet , /* Customer = Customer */ alias_Helmet.Current.Item(0) IS alias_Bike.Current.Item(0) AND /* Date of Helmet <= 60 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Helmet.Current.Item(1).Member_Value ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Helmet.Current.Item(1).Member_Value ) <= 60 ) ) SET CustomerBikeTirePurchase AS GENERATE( CustomerBikePurchase AS alias_Bike ,Filter( CustomerTirePurchase AS alias_Tire , /* Customer = Customer */ alias_Tire.Current.Item(0) IS alias_Bike.Current.Item(0) AND /* Date of Tire <= 90 after date of bike */ DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Tire.Current.Item(1).Member_Value ) >= 1 AND DateDiff( "d" ,alias_Bike.Current.Item(1).Member_Value ,alias_Tire.Current.Item(1).Member_Value ) <= 90 ) )
Cold Cache
Warm Cache
Another solid performance boost! We’ve now shaved off > 85% of the query duration by tweaking the use of member properties.
Wrap Up
Hopefully, this post has provided a brief glimpse into into the dark-art of MDX performance tuning. The key take-aways are:
- the “double-iterator” pattern is terrible and should be avoided at all costs
- avoid accessing member properties if at all possible
- use Member_Value (instead of MemberValue)
Despite what appears to be identical functionality, Member_Value outperforms MemberValue by a non-negligible factor and despite some fairly deep tracing (i.e. xEvent, PerfMon, ProcMon, and DiskMon) I’m still unable to understand the mechanism that explains the performance difference. So if you know, please leave a comment.
For what it’s worth, below is the same query rewritten in a way that avoids the “double-iterator” pattern. Unfortunately, a similar approach was not possible for the client’s query.
WITH MEMBER [Measures].[CustomerAge] AS DateDiff( "yyyy" ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED) ,Cdate("1/1/2007") ) SET Customers18to35 AS Filter( [Customer].[Customer].[Customer] ,[Measures].[CustomerAge] >= 18 AND [Measures].[CustomerAge] <= 35 ) SET CustomerBikePurchase AS NonEmpty( (Customers18to35,[Date].[Calendar].[Date]) ,( [Product].[Product Categories].[Category].&[1] // Bike Category ,[Date].[Calendar].[Calendar Year].&[2007] ,[Measures].[Internet Sales Amount] ) ) MEMBER [Measures].[SalesAmountHelmet] AS SUM( ( [Date].[Calendar].CurrentMember.Lead(1):[Date].[Calendar].CurrentMember.Lead(60) ,[Product].[Product Categories].[Subcategory].&[31] // Helmet Subcategory ) ,[Measures].[Internet Sales Amount] ) SET CustomerHelmetPurchase AS Filter( CustomerBikePurchase ,[Measures].[SalesAmountHelmet] > 0 ) MEMBER [Measures].[SalesAmountTire] AS SUM( ( [Date].[Calendar].CurrentMember.Lead(1):[Date].[Calendar].CurrentMember.Lead(90) ,[Product].[Product Categories].[Subcategory].&[37] // Helmet Subcategory ) ,[Measures].[Internet Sales Amount] ) SET CustomerTirePurchase AS Filter( CustomerBikePurchase ,[Measures].[SalesAmountTire] > 0 ) /* Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */ SET DenominatorSet AS CustomerHelmetPurchase /* Customers between 18-35 who purchased a helmet within 60 days after a bike purchase and who purchased a tire within 90 days after bike purchase */ SET NumeratorSet AS Intersect( DenominatorSet ,CustomerTirePurchase ) MEMBER [Measures].[Denominator] AS Count(DenominatorSet) MEMBER [Measures].[Numerator] AS Count(NumeratorSet) MEMBER [Measures].[Performance] AS IIF( [Measures].[Denominator] = 0 ,Null ,[Measures].[Numerator]/[Measures].[Denominator] ) ,FORMAT_STRING = "percent" MEMBER [Measures].[Missing Customers] AS Generate( Except(DenominatorSet,NumeratorSet) ,[Customer].[Customer].CurrentMember.Name ,", " ) SELECT { [Measures].[Denominator] ,[Measures].[Numerator] ,[Measures].[Performance] ,[Measures].[Missing Customers] } ON 0 FROM [Adventure Works]
Cold Cache
Warm Cache
3 replies on “MDX: MemberValue vs Member_Value”
[…] function to get the typed date value associated with each date, as Bill Anton’s recent blog post points out that it is marginally […]
LikeLike
[…] function to get the typed date value associated with each date, as Bill Anton’s recent blog post points out that it is marginally […]
LikeLike
[…] into the cube via the MDXScript, let’s revisit one of the queries from a post last year. In that post, we started with a really nasty query that took ~42 sec to run on a cold-cache and ~41 sec on a […]
LikeLike