This weekend, on the way home from a nice few days at the beach with family, I was listening to a SQLBitsX session (A Lap Around the SSIS Catalog) by Jamie Thomson (b | t) and it got me thinking about some work I did at my last client and how I would do that work differently in the future.
The work I’m referring to involved some SSRS reports created to provide insight into the execution history of the nightly ETL process. This client was using the new SSIS 2012 project deployment model and the DW loading pattern was fairly straightforward – each dimension and fact table contained in a separate package with everything being driven from a single master package. I immediately suggested taking a quick look at the SSIS Reporting Pack to see if that would meet our needs (why reinvent the wheel, right?). Unfortunately, this client had a different vision and asked for something a bit more simple and concise.
After a few iterations, we ended up with a drill-through report pack comprised of 4 levels:
- Project-Level: displays a chart with bars for each of the last 50 executions ordered by execution date/time and lines marking the Max, Min, and Avg execution duration.
- Package-Level: displays a chart with bars for each (child) package in the project ordered by start time and lines marking the Max, Min, and Avg package duration.
- Task-Level: same as the Package-Level, but for tasks in the control flow.
- Component-Level: same as the Task-Level, but for components in the data flow(s).
Each report is basically just a bar chart at various levels of execution. And in each chart, a drill-through action is defined on the bars which takes the viewer down to the next level of reporting. So if the viewer is looking at the Project-Level report showing the last 50 executions of the selected project, he/she can click on a bar (representing a single execution of the project/nightly-load) and drilldown into the next level showing the execution of the various child packages in the project. And from the Package-Level report, the viewer can click on one of the bars representing a package and drilldown into the Task-Level report to see the duration of each task in that particular package’s control flow. You get the idea…
Here’s a screenshot to help paint the picture of what it looked like in development:
There were a few other bells and whistles…
- bars were color coded based on the status of the object in scope (project/package/task/component)
- custom expression-based tooltip messages displaying detailed information when the viewer hovers the mouse over a particular bar
- dynamic connection strings via report parameters allowing the viewer to choose the environment (Prod, Test, Dev, etc) against which to report
Overall, I was satisfied with the end result…so was the client. But there was one item (actually 2 items ***) that bothered me a bit. See, in order for there to be any information available in the SSISDB database for the Component-Level report, the execution logging level needed to be set to verbose. The problem with that is that there is considerable overhead for logging all those events included in the verbose level which slows everything down…so you don’t want to run verbose logging in production by default (its more intended for troubleshooting). So the Component-Level report will hardly ever contain any information.
The client and I discussed this issue and, for the sake of time, we decided to simply display a detailed message to the user explaining why the Component-Level report is blank when there’s no data to display and move on.
What I’d Do Differently
I really think it’s important to always have certain stats available from the data flow of DW ETL loads…the text-book example is row count stats for fact and dimension table loads. And since I’m obviously not going to recommend running the ETL process using the verbose logging-level, the only option left is to extend the current capabilities of the SSIS 2012 catalog with some custom logging tasks and tables.
Note: this shouldn’t be too big of a task for anyone who developed a DW ETL solution in SSIS 2005/2008…back then, the built-in logging was total crap and so a custom framework was really the only way to go.
There are plenty of ways to extend the current logging capabilities, but in this case where we are simply interested in capturing the row counts of the data flow tasks, we can use the following design pattern:
- Create a new table to hold the results. The decision of whether to put this table in the SSISDB database is up to you…i personally think it’s cleaner (but it may cause issues in future upgrades). In the example below, I’ve created a separate schema called custom for custom tables.
IF OBJECT_ID(‘custom.execution_dataflow_stats’) IS NOT NULL
DROP TABLE custom.execution_dataflow_stats;
CREATE TABLE custom.execution_dataflow_stats (
id INT IDENTITY(1,1) NOT NULL,
execution_id BIGINT NOT NULL,
executable_guid NVARCHAR(38) NOT NULL,
rows_processed INT NOT NULL,
–rows_inserted INT NOT NULL,
–rows_updated INT NOT NULL,
–rows_deleted INT NOT NULL,
–start_time DATETIMEOFFSET(7) NOT NULL,
–end_time DATETIMEOFFSET(7) NOT NULL,CONSTRAINT PK_custom_execution_dataflow_stats PRIMARY KEY CLUSTERED ([id])
[/sql]The key fields in this table are execution_id and executable_guid which give you the ability to hook this data back into the SSISDB catalog views. Also note that you can customize this table (and the steps that follow) to accommodate additional information depending on your loading pattern…hence the commented out fields in the create-table-statement above.
- Update the package design to include a RowCount component in the dataflow to capture the number of rows processed:
Add an event handler to the data flow task for the PostExecute event to write the rows processed to the custom table we created in step 1. This is the method used in a lot of the custom logging frameworks developed for SSIS 2005/2008 and you can read more about event handlers here.
Once everything is in place, you can execute the package and check the custom log table to make sure everything is working correctly. One thing to keep in mind is that when you execute the package via SSDT, the ServerExecutionId will be zero because nothing is getting logged in the SSISDB since the project hasn’t been deployed yet.
Once the package is deployed, you’ll see a valid execution_id value which you can use (along with the executable_guid) to tie this info back into your SSIS Logging reports. In the report pack created for the client, I think it makes the most sense to bubble up the row-count information via the hover-over-tooltip for the bars in the Task-Level report (for data-flow tasks).
Bottom line, the built-in logging capabilities in SSIS 2012 are wayyyyyy better than previous versions…they save developers a TON of time but they aren’t quite perfect…yet. In fact, one question that came up at the end of Jamie Thomson’s SQLBitsX session (reference in the intro of this post: A Lap Around the SSIS Catalog) was whether or not it is possible to create custom logging levels. The answer is still unfortunately “No“, but I think that would be an outstanding feature/functionality to be add in a future release.
When I get some free time I’ll update the reports (discussed in this post) to include the custom log table and share it on this blog. Until then, I hope this has given you some food for thought and a path forward.
*** the other item I wasn’t particularly enamored with in this solution was the visualization used for the Package-Level, Task-Level, and Component-Level reports. Since these items can potentially run in parallel (especially true for the Package-Level report)…I think it makes more sense to display the information in a Gantt chart. The SSIS Reporting pack (referenced at the beginning of this post) does include a Gantt-Chart visualization created via data bars in one of the drill-down reports…but this particular client wanted to keep it simple.
Update 7/10/2013: You can download a copy of the SSIS Reporting solution here…including a script for creating the custom database objects. Here’s a quick screenshot showing the addition of the row-count information in the hover-over text of the task-level report: