In a presentation I’ve been delivering lately, Bus Matrix – the Foundation of your Dimensional Data Model, I briefly cover the hierarchy of an enterprise and how it can be conceptually broken down and mapped into a dimensional model.
- A single enterprise typically contains many business areas (e.g. Sales, Manufacturing, Finance)
- A single business area (e.g. Sales) typically contains many business processes (e.g. Orders, Shipments, Returns)
- A single business process can be modeled by multiple fact tables
This forms the basis of the main argument in the presentation: that dimensional models (despite being much less complex than normalized models) can quickly grow and become challenging to manage – thus requiring something other than a non-business-user-friendly ER diagram to visualize the model…like a bus matrix:
One point that I feel I’ve failed to make is that the complexity of a dimensional model is primarily driven by the number of fact tables…even though, in smaller models, it’s likely the number of dimensions exceeds the number of fact tables.
Fact Tables Drive the Complexity of a Dimensional Data Model
Folks rarely have trouble grasping the idea of dimensions. These tables map directly to business “entities” (e.g. Products, Territories, Customers, etc) – a clean 1 to 1 relationship leaving little room for confusion.
Note: technically the relationship between dimension tables and business entities can be many-to-one when you take into account core-vs-custom dimension tables. This concept is thoroughly explained in the best book on dimensional modeling so I won’t regurgitate it here.
Fact tables are a completely different story. They don’t map directly to business “entities”. If anything, they map to business processes…but even that is a bit misleading. Another argument is that they map to a set of metrics…again, not entirely true. The best way I can describe fact tables is that they are grouped by business area (e.g. Sales, Manufacturing, Finance) and model business processes (e.g. Orders, Shipments, Returns) in a way to support a variety of business measurements.
Types of Fact Tables
The Kimball methodology includes 3 main types of fact tables:
Transaction – the most common type of fact table, used to model a specific business process (typically) at the most granular/atomic level.
Periodic Snapshot – used to model the status of a business process at a specific point in time on a regularly recurring interval. For example, a periodic snapshot fact table might be used to track account balances on a monthly basis. In this case, a “snapshot” of the account balance would be taken at the end of each month – which represents the net of all withdrawal and deposit transactions occurring during the month. Inventory is another common scenario that makes use of periodic snapshots for tracking quantity on hand (by item) at the end of each month. In both examples, the primary “fact” (account balance and quantity on hand) in the two tables are “semi-additive” – which simply means they can’t be aggregated over time.
Accumulating Snapshot – model events in progress for business processes (e.g. Claims Processing for an Insurance Company) that involve a predefined series of steps (e.g. claim submitted, claim reviewed, claim approved/rejected). These tables prove useful in measuring/analyzing the duration between steps in a complete process and discovering bottlenecks.
In addition to the primary fact table types (above), there are a few additional fact tables to consider:
Factless Fact Tables – used to model the events (e.g. student attendance) or conditions (e.g. product promotions). One common misconception is that these tables do not contain facts – which is only partially true. For example, if we are using a factless fact table to model “student attendance” (a textbook example of factless fact tables – pun intended!), our table may also include – in addition to the keys mapping to the student, class, and date dimensions – a single fact called “attendance_count” with the value of 1 for every row. This is not necessary, but it makes it clear what we are measuring and might even make life easier for downstream development of a cube and/or reports.
Note: a factless fact table used to model conditions is known as a coverage fact table in the Kimball methodology.
Bridge Tables – which look amazingly similar to factless fact tables, are used to associate facts with multiple members from the same dimension (multi-value dimensions) or to associate a single dimension attribute with multiple values (multi-value attribute), or even to provide a solution for modeling complex hierarchies (hierarchy-bridge). A classic example involves banking transactions (e.g. deposits, withdrawals) which are typically tracked by account. While most accounts are associated with a single customer, there is the case of “joint” accounts (or accounts associated with more than one customer). In this latter case, a bridge table is used to map the relationship between customers and accounts allowing the business to report bank transactions by customer.
Note: some will argue that a bridge table is not a fact table – debating the differences between the two will make you want to claw your eyeballs out with a dirty spoon.
Aggregate Fact Tables – essentially a transaction fact table with the difference of being that records in this table are pre-aggregated to a certain level (e.g. month vs date). The primary benefit is performance although one could also make an argument for saving disk space. These tables are rarely necessary for architectures that include a multidimensional semantic layer and/or make use of in-memory technology.
Temporal Snapshot Fact Table – this is a very elegant solution for avoiding the data-explosion that comes with periodic snapshot fact tables. There is also a high degree of complexity involved when implementing this type of solution. For more information, I recommend starting with a SQL Bits X presentation by Davide Mauri (b | t) called Temporal Snapshot Fact Table – that is where I first learned of this design. (Warning: the audio quality in this presentation is terrible so you will definitely need to follow along in the slides and demo scripts available.)
Many claim that designing a data warehouse is a blend of art and science. My opinion (that’s why you’re here right?) is that it is mostly science. Determining which type(s) of fact tables to include (driven primarily by the types of questions the business wants to be able to answer) is where those “blend-of-art-and-science” folks mistake art for experience. Then you have other folks like Davide Mauri and the implementation of the Temporal Snapshot Fact table reminding you that, yes, there is a bit of art to this profession.
For a good demonstration of how different types of fact tables can have a drastic affect on performance and/or complexity (with respect to reporting), see the following SQLBits XI presentation by Alex Whittles (b | t): Data Modeling for Analysis Services Cubes