This is the fifth post in a series of posts covering the Aggregation Design Life-Cycle (ADLC).
- Create Initial Aggregation Designs
- Remove Ineffective Aggregations
- Capture Query Workload
- Run Usage-Based Optimization (UBO) Wizard
- Create Custom Aggregations
- Final Thoughts
In the previous post, we discussed how to configure the Query Log table and generate a query workload that fully represents production activity that the cube will need to handle. The result is a special table in a relational source containing the specifics for each query…information that will be used by the Usage-Based Optimization (UBO) Wizard when designing custom aggregations. In this post, we are going to cover the process of actually running the UBO Wizard while discussing some of the considerations to keep in mind throughout the process.
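To make the workload concrete, here's a minimal sketch of the kind of per-query information the query log holds and how it rolls up into the two statistics the UBO Wizard cares about (occurrence count and average duration). The column names follow the default OlapQueryLog schema; the sample rows and values are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample rows mimicking the default OlapQueryLog schema:
# Dataset is the subcube bit-vector identifying the attributes queried,
# Duration is the storage engine request duration in milliseconds.
query_log = [
    {"MSOLAP_User": "DOMAIN\\alice", "Dataset": "0101,001", "Duration": 2340},
    {"MSOLAP_User": "DOMAIN\\bob",   "Dataset": "0101,001", "Duration": 1980},
    {"MSOLAP_User": "DOMAIN\\alice", "Dataset": "1000,010", "Duration": 120},
]

# Summarize per subcube: occurrence count and total duration -- the same
# raw material the UBO Wizard uses to rank candidate aggregations.
summary = defaultdict(lambda: {"count": 0, "total_ms": 0})
for row in query_log:
    s = summary[row["Dataset"]]
    s["count"] += 1
    s["total_ms"] += row["Duration"]

for dataset, s in summary.items():
    print(dataset, s["count"], s["total_ms"] / s["count"])
```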
UBO Wizard Overview
This was covered in the last post…but here’s an overview from the MSDN page for those who missed it:
The Usage-Based Optimization Wizard is similar in output to the Aggregation Design Wizard, and is used to design aggregations for a partition. However, the Usage-Based Optimization Wizard designs aggregations based on the specific usage patterns of queries recorded in the query log of an Analysis Services instance.
Running the UBO Wizard
The Usage-Based Optimization (UBO) Wizard has the same look and feel as the Aggregation Design Wizard, which we used to create our initial set of aggregation designs in the first step of the ADLC. If you’re only interested in the basic 1-2-3 how-to for this wizard, then read this MSDN article. Otherwise, stick with me and I’ll show you how I approach the UBO Wizard…
- Create a backup copy of the base aggregation design for the partition(s) for which you are about to run the UBO Wizard. The reason you want to do this is because we will be merging the aggregations created by the UBO Wizard into the base aggregation design at the very end of the process…and having a backup copy makes it easier to roll back if we decide we don’t like the updated aggregation design.
- Select the measure group(s) for which you want to design usage-based aggregations and click the Usage-Based Optimization button shown below:
- Select the partition(s) for which you want to design usage-based aggregations and click next:
- In the Specify Query Criteria step (shown below), note the number of queries in the top half of the window. If there are more than a few hundred queries, then use the filter options in the bottom half of the window to narrow that down a bit. Unfortunately, the stats in the top part of the window don’t dynamically update as you adjust the filters down below…so you’ll have to eyeball it by toggling back and forth between this screen and the next.
- This next screen, one of the most important steps in the process, provides you with a list of all the queries that will be considered for aggregations. It also includes the number of occurrences and the average duration for each query.
If the list isn’t outrageously long, go ahead and select everything and move on. If, on the other hand, the list is long and/or you know you’re working within tight constraints (e.g. processing window, storage space), then you can sort the list by average duration and then by occurrence (frequency) to get a feel for the distribution of queries/SE requests. Use the quadrant diagram below to help guide your decisions as to which queries to include and which to exclude:
Note: some folks argue that the frequency of a query (or, more specifically, a subcube request) isn’t necessarily indicative of a good aggregation candidate. The argument is that frequent queries/subcube requests will likely already be in cache – which is a completely legitimate argument. However, there are a few additional factors that need to be included in the debate…so please stick around for the final post in this series, where we’ll cover it in more detail.
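One way to think about the include/exclude decision is a rough bucketing by the two columns the wizard gives you. The cutoffs below are arbitrary placeholders (every workload is different), and this is just my reading of the quadrant idea, not an official rule:

```python
def quadrant(avg_ms, occurrences, ms_cutoff=1000, freq_cutoff=10):
    """Rough quadrant bucketing for UBO candidate queries.

    ms_cutoff and freq_cutoff are placeholder thresholds -- tune them
    to your own workload's distribution of durations and frequencies.
    """
    slow = avg_ms >= ms_cutoff
    frequent = occurrences >= freq_cutoff
    if slow and frequent:
        return "include: slow and frequent -- strongest candidates"
    if slow:
        return "include: slow but rare -- still worth covering"
    if frequent:
        return "maybe: fast but frequent -- likely already served from cache"
    return "exclude: fast and rare -- little to gain"
```

For example, a subcube request averaging 5 seconds across 50 occurrences lands squarely in the include bucket, while a 200 ms request seen twice gets excluded.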
- This next screen, Specify Object Counts, collects the distinct counts. For this step, the counts should already be populated because we used the BIDS Helper feature to update them…right?…right!
Note: As far as I’m aware, there is no automated method (including BIDS Helper) for updating the Partition Count values. See the Tips/Considerations/Recommendations section at the end of this post, where I share a quote from Chris Webb regarding the purpose of this value and how to set it.
- This next screen should look familiar…we saw it when creating the initial set of aggregations with the Aggregation Design Wizard. And just like we did with the Aggregation Design Wizard, we’re going to use the Performance gain reaches option…but instead of 30%, we’ll use 100%. The reason we want to use 100% this time is that we are limited to a specific set of queries…in the Aggregation Design Wizard we were only limited by the number of possible combinations (which is a lot – even with the dimension attribute properties we set).
- At this point, the aggregations have been designed and now we just need to merge them in with the base aggregation design we created via the Aggregation Design Wizard.
- Now repeat steps 1-8 for the next measure group partition(s).
Deploy, Test, Measure
At this point, we’ve updated all our aggregation designs with new “usage-based” aggregations that should help improve the average query response time thus improving the overall user experience. Right?
In theory, yes…in practice, we still need to confirm and, if time permits, measure the improvement. There are a lot of different ways to accomplish this task. One of the ways I’m currently using to measure average query performance over time is recurring scheduled Perfmon tracing and analysis (more on this in a later post – still working through the kinks 😉 ).
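Whatever tooling you use, the measurement itself boils down to tracking average duration over time and comparing before/after the new aggregations are deployed. A minimal sketch, assuming you've exported (day, duration) samples from a Perfmon trace or the query log (the sample values below are made up):

```python
from collections import defaultdict
from datetime import date

# Hypothetical duration samples (day, milliseconds) exported from a
# recurring trace -- day 1 is before deploying the new aggregations,
# day 2 is after.
samples = [
    (date(2013, 6, 1), 900), (date(2013, 6, 1), 1100),
    (date(2013, 6, 2), 400), (date(2013, 6, 2), 600),
]

# Group durations by day and compute the daily average.
daily = defaultdict(list)
for day, ms in samples:
    daily[day].append(ms)

averages = {day: sum(ms) / len(ms) for day, ms in daily.items()}
# A falling daily average after deployment is the signal we're after.
```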
In addition to reduced query response time, we also want to check the increase in processing duration. There’s no free lunch…even with aggregations…so be sure to measure the processing duration and make sure you are still within the scheduled window…if not, you will need to go back and make some changes.
Tips/Considerations/Recommendations
- For large cubes where partitioning and an incremental processing strategy have been implemented, consider creating a heavier aggregation design (one that contains more aggregations) for historical partitions that don’t require frequent reprocessing. This advice also applies to the initial run through the Aggregation Design Wizard.
- On the Specify Object Counts step of the UBO Wizard, be sure to manually update the partition count (for measure groups that have been partitioned). As far as I’m aware, BIDS Helper does not handle this automagically. This recommendation also applies to the Aggregation Design Wizard (Step 1 of the ADLC). Here’s a quote from Chris Webb (b | t) regarding this setting:
More importantly, if you’re partitioning your cube, you need to make sure that the values in the Partition Count column reflect the count of the numbers of members on each attribute that will exist in each partition: for example, if you’re partitioning by month, you will want to tell AS that there’s only one month in each partition. These counts are, I believe, used during the aggregation design process to estimate the size of an aggregation, and if you don’t specify correct values then the algorithm may incorrectly think a useful aggregation is much bigger than it actually would be and not build it as a result.
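To illustrate the point with back-of-the-envelope arithmetic (this is not the actual SSAS cost algorithm, just the rough intuition: estimated aggregation size scales with the product of the member counts of the attributes involved):

```python
# Rough illustration of why a wrong Partition Count inflates the
# wizard's size estimate -- NOT the actual SSAS estimation algorithm.
def estimated_rows(attribute_counts):
    """Estimate aggregation rows as the product of attribute member counts."""
    rows = 1
    for count in attribute_counts:
        rows *= count
    return rows

# Hypothetical aggregation on (Month, Product Category) in a cube that
# holds 36 months total but is partitioned so each partition holds one.
wrong = estimated_rows([36, 40])  # Partition Count left at the cube-wide 36
right = estimated_rows([1, 40])   # corrected: one month per partition
# wrong is 36x larger than right -- big enough that the algorithm might
# skip an aggregation it would otherwise have built.
```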
- When working with a large cube, you may want to Deploy and Test (see the Deploy, Test, Measure section above) after running through the UBO Wizard for each measure group…this will allow you to keep an eye on increases in cube size and/or processing duration. If processing duration with the new aggregations now exceeds the designated window, we’ll need to make a few decisions about which aggregations to keep and which to get rid of. Depending on how much processing time you need to cut, it might be easier to roll back to the base aggregation design (remember that backup you made? 😉 ) and run through the UBO Wizard again with more aggressive filtering options in order to produce fewer aggregations.
- It may seem strange that there’s no “duration” filter option on the Specify Query Criteria step of the UBO Wizard…after all, it is these long-duration storage engine (SE) requests that we are targeting. To accomplish the equivalent, please take a look at the last post, where we discussed the recommendation to set up a recurring job that automatically deletes records from the query log table where the duration < 100ms. This will ensure that only those SE requests with durations over 100ms are fed into the UBO Wizard.
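The effect of that cleanup job is simple to picture. The sketch below simulates the filter in-process on made-up rows; the equivalent relational-side statement would be along the lines of `DELETE FROM OlapQueryLog WHERE Duration < 100` (table and column names per the default query log schema, so verify yours before scheduling anything):

```python
# Simulated equivalent of the recurring cleanup job from the previous
# post: drop query-log rows under 100 ms so only the slower storage
# engine requests remain to feed the UBO Wizard. Sample data is made up.
THRESHOLD_MS = 100

query_log = [
    {"Dataset": "0101,001", "Duration": 2340},
    {"Dataset": "1000,010", "Duration": 45},   # sub-100ms -- removed
    {"Dataset": "0011,100", "Duration": 180},
]

query_log = [row for row in query_log if row["Duration"] >= THRESHOLD_MS]
```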