Just a few days ago, a case study was released by one of the top Microsoft BI folks in the world – Alberto Ferrari (b | t) – outlining the evaluation process of using SSAS Tabular in a large-scale commercial solution.
This paper is a must-read for those considering SSAS Tabular for a production solution, as well as for those who have already attempted to implement SSAS Tabular but failed for one reason or another. It walks you through the complete process, discussing some of the design and configuration choices that should be considered in order to implement this technology successfully.
Download it from SQLBI.com
Below is a quick summary of the notes I made while reading through the whitepaper:
- Distinct Count Performance
The challenges of implementing metrics based on distinct count are fairly well known and documented, which is why Alberto doesn't spend much time on the topic. Using an in-memory technology such as SSAS Tabular is one of the very few ways to maintain outstanding distinct count performance as data sizes grow, and arguably one of the only ways to do so for ad-hoc reporting and analysis.
While the exact algorithm remains proprietary, Alberto does an excellent job of describing how compression appears to be implemented in SSAS Tabular, along with some of the “levers” that can be used to tweak compression ratio and query performance. The use of diagrams really helps convey the various stages of the process, from dictionary encoding to data segment compression. The latter is most likely based on a form of RLE, though the sorting mechanism (if there is one, which there likely is) remains a primary ingredient in the secret sauce.
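To make the stages concrete, here is a minimal sketch of dictionary encoding followed by run-length encoding. The actual VertiPaq algorithm is proprietary, so everything below (function names, encoding details) is illustrative only; it also shows why distinct counts come almost for free from an encoded column, and why sort order is one of the compression "levers":

```python
# Illustrative sketch only -- the real SSAS Tabular (VertiPaq) algorithm
# is proprietary; this just demonstrates the general two-stage idea.

def dictionary_encode(column):
    """Map each distinct value to an integer id; store ids instead of values."""
    dictionary = {}
    encoded = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded

def rle_compress(encoded):
    """Collapse runs of repeated ids into [id, run_length] pairs."""
    runs = []
    for value in encoded:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

column = ["red", "red", "red", "blue", "blue", "red", "green", "green"]
dictionary, encoded = dictionary_encode(column)

# Distinct count falls out of the dictionary with no scan of the data:
print(len(dictionary))            # 3

print(rle_compress(encoded))      # [[0, 3], [1, 2], [0, 1], [2, 2]]

# Sorting before encoding produces fewer, longer runs -- which is why
# segment sort order influences the compression ratio:
_, sorted_encoded = dictionary_encode(sorted(column))
print(rle_compress(sorted_encoded))  # [[0, 2], [1, 2], [2, 4]]
```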
- NUMA
There’s really no way to avoid addressing the issues associated with NUMA when dealing with a large dataset in an SSAS Tabular solution. The bottom line is that SSAS Tabular is not NUMA-aware, and there is a performance penalty as a result. Alberto does an outstanding job of explaining NUMA (what it is and how it works) and the implications of running non-NUMA-aware software such as SSAS Tabular on a NUMA architecture, and then goes on to describe a few options for addressing these issues, including some performance benchmarking.
- Partitioning & Multitenancy
With any large dataset, partitioning becomes extremely important from a basic data management perspective. However, as Alberto explains in the case study, partitioning is also a major contributor to processing performance (which is decidedly important for near real-time (NRT) systems such as the one he was building) and to query performance (as shown in the benchmarking of NUMA architecture options). To further address the scalability and performance of the system, the final solution took advantage of the multitenant nature of the dataset and broke everything up into multiple Tabular models. This had an amazing impact on processing performance. However, one very important point to keep in mind is the **instance-wide** lock required while processing a single Tabular database, which can block queries across all databases on the instance.
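The processing benefit of partitioning can be sketched in a few lines: when new rows arrive, only the partitions they touch need to be rebuilt rather than the whole table. The monthly partition scheme and names below are hypothetical, not the case study's actual layout:

```python
from datetime import date

# Hypothetical sketch of incremental processing with partitions; the
# partition scheme (by month) and names are illustrative only.

def partition_key(row_date):
    """Partition fact rows by (year, month), a common date-based scheme."""
    return (row_date.year, row_date.month)

def incremental_process(partitions, new_rows):
    """Append new rows and return only the partitions that must be reprocessed."""
    touched = set()
    for row_date, row in new_rows:
        key = partition_key(row_date)
        partitions.setdefault(key, []).append(row)
        touched.add(key)
    return touched

partitions = {}
touched = incremental_process(partitions, [
    (date(2012, 6, 30), "sale-1"),
    (date(2012, 7, 1), "sale-2"),
])
print(sorted(touched))  # [(2012, 6), (2012, 7)] -- two partitions, not the whole table
```

For a near real-time load, only the current month's partition would typically be touched, which is what keeps each processing cycle short.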
- Pushing Complexity to ETL
There is a small section in the case study dedicated to addressing some of the complexity in the queries and calculated measures. Alberto does a great job of showing an example of such a measure and the solution of pushing the complexity down into the ETL process, an option that is often overlooked by model designers who focus too narrowly on the model itself and fail to zoom out and see all of the levers at their disposal.
All in all, this is a very well-written case study and a true testament to the power of SSAS Tabular as a viable option for large-scale solutions. That's a breath of fresh air after quite a few discussions over the past two years with peers who have run into seemingly insurmountable walls while trying to leverage this technology in production solutions for their clients.
It outlines, in solid detail, the considerations and impact analysis of various design/configuration choices needed to successfully implement a large-scale near real-time analytics solution based on SSAS Tabular.
Thank you Alberto for this contribution to the community!