Categories
Uncategorized

MDX: MemberValue vs Member_Value

Lately I’ve been doing a bit of MDX performance tuning and during the process I learned a new “trick” that blew my mind. Instead of just outright telling you what it is, I’d like to show you through an example.

Original Query

Take the following query shown below – which admittedly is a bit contrived but serves as both a fair representation of the calculations required at a client and as a nice example to highlight the performance improvement achieved with this new trick…so bear with me Winking smile

WITH
    /* Calculate the customer age as of the 1/1/2007 */
    MEMBER [Measures].[CustomerAge] AS
        DateDiff(
             "yyyy"
            ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED)
            ,Cdate("1/1/2007")
        )
    /* Set of Customers between 18 and 35 yo as of 1/1/2007 */
    SET Customers18to35 AS 
        Filter(
             [Customer].[Customer].[Customer]
            ,[Measures].[CustomerAge] >= 18 AND
             [Measures].[CustomerAge] <= 35
        )
    /* Set of Customers18-35 who purchased a bike in 2007 */
    SET CustomerBikePurchase AS
        NonEmpty(
             (Customers18to35,[Date].[Calendar].[Date])
            ,(
                 [Product].[Product Categories].[Category].&[1] // Bike Category
                ,[Date].[Calendar].[Calendar Year].&[2007]
                ,[Measures].[Internet Sales Amount]
             )
        )
    SET CustomerHelmetPurchase AS
        NonEmpty(
             (Customers18to35,[Date].[Calendar].[Date])
            ,(
                 [Product].[Product Categories].[Subcategory].&[31] // Helmet Subcategory
                ,[Date].[Calendar].[Date].&[20070101]:[Date].[Calendar].[Date].&[20071231].Lead(60)
                ,[Measures].[Internet Sales Amount]
             )
        )
    SET CustomerTirePurchase AS
        NonEmpty(
             (Customers18to35,[Date].[Calendar].[Date])
            ,(
                 [Product].[Product Categories].[Subcategory].&[37] // Tires and Tube Subcategory
                ,[Date].[Calendar].[Date].&[20070101]:[Date].[Calendar].[Date].&[20071231].Lead(90)
                ,[Measures].[Internet Sales Amount]
             )
        )

    SET CustomerBikeHelmetPurchase AS
        GENERATE(
             CustomerBikePurchase AS alias_Bike
            ,Filter(
                 CustomerHelmetPurchase AS alias_Helmet
                ,    /* Customer = Customer */
                    alias_Helmet.Current.Item(0).MemberValue 
                        = alias_Bike.Current.Item(0).MemberValue 
                AND
                    /* Date of Helmet <= 60 after date of bike */
                    DateDiff(
                         "d"
                        ,alias_Bike.Current.Item(1).MemberValue
                        ,alias_Helmet.Current.Item(1).MemberValue
                    ) >= 1
                    AND
                    DateDiff(
                         "d"
                        ,alias_Bike.Current.Item(1).MemberValue
                        ,alias_Helmet.Current.Item(1).MemberValue
                    ) <= 60
             )
        )
    SET CustomerBikeTirePurchase AS
        GENERATE(
             CustomerBikePurchase AS alias_Bike
            ,Filter(
                 CustomerTirePurchase AS alias_Tire
                ,    /* Customer = Customer */
                    alias_Tire.Current.Item(0).MemberValue 
                        = alias_Bike.Current.Item(0).MemberValue 
                AND
                    /* Date of Tire <= 90 after date of bike */
                    DateDiff(
                         "d"
                        ,alias_Bike.Current.Item(1).MemberValue
                        ,alias_Tire.Current.Item(1).MemberValue
                    ) >= 1
                    AND
                    DateDiff(
                         "d"
                        ,alias_Bike.Current.Item(1).MemberValue
                        ,alias_Tire.Current.Item(1).MemberValue
                    ) <= 90
             )
        )

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */
    SET DenominatorSet AS Extract(CustomerBikeHelmetPurchase,[Customer].[Customer])

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase 
        and who purchased a tire within 90 days after bike purchase
    */
    SET NumeratorSet AS 
        Intersect(
             DenominatorSet
            ,Extract(CustomerBikeTirePurchase,[Customer].[Customer])
        )

    MEMBER [Measures].[Denominator] AS Count(DenominatorSet)

    MEMBER [Measures].[Numerator] AS Count(NumeratorSet)

    MEMBER [Measures].[Performance] AS
        IIF(
             [Measures].[Denominator] = 0
            ,Null
            ,[Measures].[Numerator]/[Measures].[Denominator]
        )
    ,FORMAT_STRING = "percent"

    MEMBER [Measures].[Missing Customers] AS
        Generate(
             Except(DenominatorSet,NumeratorSet)
            ,[Customer].[Customer].CurrentMember.Name
            ,", "
        )

SELECT    {
             [Measures].[Denominator]
            ,[Measures].[Numerator]
            ,[Measures].[Performance]
            ,[Measures].[Missing Customers]
        } ON 0
FROM    [Adventure Works]

The query above calculates a metric defined as follows:

Denominator: Number of customers who bought a bike in 2007 and then bought a helmet within 60 days after the initial bike purchase.

Numerator: Number of customers from the denominator who also purchased a tire within 90 days after the initial bike purchase.

Performance: [Numerator] / [Denominator]

Missing Customers: List of customers from [Denominator] who did not qualify for the [Numerator]

Running it against the AdventureWorksDW2012 dataset returns the following:

image

As you can see in the result set above, there are 5 customers who purchased a helmet within 60 days after purchasing a bike in 2007. Of those 5 customers, 2 also purchased a tire within 90 days of the bike purchase.

That’s great, but there’s just one problem…the performance SUCKS!

Below are the stats from MDXStudio showing that it took the query ~42 seconds to execute which is completely unacceptable…

Cold Cache

image

Warm Cache

image

The major bottleneck in this query are the 2 sets shown below which are defined using a combination of the Generate and Filter functions…or what I like to refer to as the “double-iterator”:

SET CustomerBikeHelmetPurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerHelmetPurchase AS alias_Helmet
            ,    /* Customer = Customer */
                alias_Helmet.Current.Item(0).MemberValue 
                    = alias_Bike.Current.Item(0).MemberValue 
            AND
                /* Date of Helmet <= 60 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).MemberValue
                    ,alias_Helmet.Current.Item(1).MemberValue
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).MemberValue
                    ,alias_Helmet.Current.Item(1).MemberValue
                ) <= 60
         )
    )
SET CustomerBikeTirePurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerTirePurchase AS alias_Tire
            ,    /* Customer = Customer */
                alias_Tire.Current.Item(0).MemberValue 
                    = alias_Bike.Current.Item(0).MemberValue 
            AND
                /* Date of Tire <= 90 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).MemberValue
                    ,alias_Tire.Current.Item(1).MemberValue
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).MemberValue
                    ,alias_Tire.Current.Item(1).MemberValue
                ) <= 90
         )
    )

For those who might not be familiar, Generate and Filter are iterator functions (check it) so this is essentially a double loop construct which is a terrible design pattern for set-based tools like MDX. That said, given the complexity of the requirements for this particular project, there were a instances where it could not be avoided.

CustomerBikeHelmetPurchase – this set is constructed by looping through the entire set of (<customer>,<date of bike purchase>) tuples in [CustomerBikePurchase]. Then for each tuple in [CustomerBikePurchase], we are looping through the entire set of (<customer>,<date of helmet purchase>) tuples in [CustomerHelmetPurchase] and returning only the tuples where the <date of the helmet purchase> occurs between 1 and 60 days after the <date of the bike purchase> for the customer.

CustomerBikeTirePurchase – this set is constructed in the same manner as CustomerBikeHelmetPurchase described above…substituting [CustomerTirePurchase] for [CustomerHelmetPurchase] and the acceptable date range is from 1 to 90 days after the date of the bike purchase.

Tip 1: Use Member_Value instead of MemberValue

Now, what if we simply change the code for CustomerBikeHelmetPurchase and CustomerBikeTirePurchase to use Member_Value instead of MemberValue? Seems, simple enough, let’s give it a shot…

SET CustomerBikeHelmetPurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerHelmetPurchase AS alias_Helmet
            ,    /* Customer = Customer */
                alias_Helmet.Current.Item(0).Member_Value
                    = alias_Bike.Current.Item(0).Member_Value
            AND
                /* Date of Helmet <= 60 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Helmet.Current.Item(1).Member_Value
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Helmet.Current.Item(1).Member_Value
                ) <= 60
         )
    )
SET CustomerBikeTirePurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerTirePurchase AS alias_Tire
            ,    /* Customer = Customer */
                alias_Tire.Current.Item(0).Member_Value
                    = alias_Bike.Current.Item(0).Member_Value
            AND
                /* Date of Tire <= 90 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Tire.Current.Item(1).Member_Value
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Tire.Current.Item(1).Member_Value
                ) <= 90
         )
    )

Cold Cache

image

Warm Cache

image

…and BOOM goes the dynamite! Down to ~22-23 seconds from ~42 seconds…almost a 50% performance improvement!

This one was new to me and recommended by one of the architects on the project, Hans Esquivel (b | t). Even days later, I still haven’t really been able to fully wrap my head around the reasons why this improves performance…so if you know the answer or just have a theory, please be sure to leave a comment.

Tip 2: Compare Members Directly

In general, accessing member properties is expensive…therefore, we should avoid it whenever possible.

Below is another version of the “double-iterator” sets where we’re now comparing members directly for the customer-portion of the comparison criteria of the filter-function. Note the use of the “IS” comparison operator (as opposed to “=”) which is required when comparing members directly.

SET CustomerBikeHelmetPurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerHelmetPurchase AS alias_Helmet
            ,    /* Customer = Customer */
                alias_Helmet.Current.Item(0) IS alias_Bike.Current.Item(0)
            AND
                /* Date of Helmet <= 60 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Helmet.Current.Item(1).Member_Value
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Helmet.Current.Item(1).Member_Value
                ) <= 60
         )
    )
SET CustomerBikeTirePurchase AS
    GENERATE(
         CustomerBikePurchase AS alias_Bike
        ,Filter(
             CustomerTirePurchase AS alias_Tire
            ,    /* Customer = Customer */
                alias_Tire.Current.Item(0) IS alias_Bike.Current.Item(0)
            AND
                /* Date of Tire <= 90 after date of bike */
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Tire.Current.Item(1).Member_Value
                ) >= 1
                AND
                DateDiff(
                     "d"
                    ,alias_Bike.Current.Item(1).Member_Value
                    ,alias_Tire.Current.Item(1).Member_Value
                ) <= 90
         )
    )

Cold Cache

image

Warm Cache

image

Another solid performance boost! We’ve now shaved off > 85% of the query duration by tweaking the use of member properties.

Wrap Up

Hopefully, this post has provided a brief glimpse into into the dark-art of MDX performance tuning. The key take-aways are:

  1. the “double-iterator” pattern is terrible and should be avoided at all costs
  2. avoid accessing member properties if at all possible
  3. use Member_Value (instead of MemberValue)

Despite what appears to be identical functionality, Member_Value outperforms MemberValue by a non-negligible factor and despite some fairly deep tracing (i.e. xEvent, PerfMon, ProcMon, and DiskMon) I’m still unable to understand the mechanism that explains the performance difference. So if you know, please leave a comment.

 

For what it’s worth, below is the same query rewritten in a way that avoids the “double-iterator” pattern. Unfortunately, a similar approach was not possible for the client’s query.

WITH
    MEMBER [Measures].[CustomerAge] AS
        DateDiff(
             "yyyy"
            ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED)
            ,Cdate("1/1/2007")
        )
    SET Customers18to35 AS 
        Filter(
             [Customer].[Customer].[Customer]
            ,[Measures].[CustomerAge] >= 18 AND
             [Measures].[CustomerAge] <= 35
        )
    SET CustomerBikePurchase AS
        NonEmpty(
             (Customers18to35,[Date].[Calendar].[Date])
            ,(
                 [Product].[Product Categories].[Category].&[1] // Bike Category
                ,[Date].[Calendar].[Calendar Year].&[2007]
                ,[Measures].[Internet Sales Amount]
             )
        )

    MEMBER [Measures].[SalesAmountHelmet] AS 
        SUM(
            (
                 [Date].[Calendar].CurrentMember.Lead(1):[Date].[Calendar].CurrentMember.Lead(60)
                ,[Product].[Product Categories].[Subcategory].&[31] // Helmet Subcategory
            )
            ,[Measures].[Internet Sales Amount]
        )
    SET CustomerHelmetPurchase AS
        Filter(
             CustomerBikePurchase
            ,[Measures].[SalesAmountHelmet] > 0
        )            
    MEMBER [Measures].[SalesAmountTire] AS 
        SUM(
            (
                 [Date].[Calendar].CurrentMember.Lead(1):[Date].[Calendar].CurrentMember.Lead(90)
                ,[Product].[Product Categories].[Subcategory].&[37] // Helmet Subcategory
            )
            ,[Measures].[Internet Sales Amount]
        )
    SET CustomerTirePurchase AS
        Filter(
             CustomerBikePurchase
            ,[Measures].[SalesAmountTire] > 0
        )

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */
    SET DenominatorSet AS CustomerHelmetPurchase

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase 
        and who purchased a tire within 90 days after bike purchase
    */
    SET NumeratorSet AS 
        Intersect(
             DenominatorSet
            ,CustomerTirePurchase
        )

    MEMBER [Measures].[Denominator] AS Count(DenominatorSet)

    MEMBER [Measures].[Numerator] AS Count(NumeratorSet)

    MEMBER [Measures].[Performance] AS
        IIF(
             [Measures].[Denominator] = 0
            ,Null
            ,[Measures].[Numerator]/[Measures].[Denominator]
        )
    ,FORMAT_STRING = "percent"

    MEMBER [Measures].[Missing Customers] AS
        Generate(
             Except(DenominatorSet,NumeratorSet)
            ,[Customer].[Customer].CurrentMember.Name
            ,", "
        )

SELECT    {
             [Measures].[Denominator]
            ,[Measures].[Numerator]
            ,[Measures].[Performance]
            ,[Measures].[Missing Customers]
        } ON 0
FROM    [Adventure Works]

Cold Cache

image

Warm Cache

image

3 replies on “MDX: MemberValue vs Member_Value”

Leave a comment