Recently, while debugging the logic involving a derived column component in an SSIS data flow, I learned something new and thought I’d share 🙂
Before we dive into the details, let’s quickly review a few of the common methods for troubleshooting issues in an SSIS data flow:
- using error output paths that write to flat files
- using data viewers
- using data taps (new in SSIS 2012)
The approach typically differs depending on the stage of development as well as the issue leading to the need for troubleshooting.
For example, if the package is responsible for loading a fact table and has already been deployed out into production and the problem is that a larger (than normal) number of records are coming through and not finding a matching record in one of the dimension key lookup transform, then I’ll probably use a data tap (assuming SSIS 2012 project deployment) to get a few snapshots of data as it goes in and comes out of the lookup component.
On the other hand, and as was the case leading to this recent discovery, the package was still in the early stages of development (so I was executing it against DEV data via SSDT) and the issue was that one of the dimension key lookups was pumping everything out of the “No Match Found” path…so I decided to use a simple data viewer to capture the buffers flowing through the data flow in realtime.
After enabling the data viewer, it was immediately clear that there was a problem with the logic in one of the upstream derived column transformations. In this case, the lookup value in the source data (used to retrieve the dimension key) was being derived from the actual facts…basically a banding scenario being performed at the ETL layer instead of on the fly during reporting for performance reasons.
After identifying the issue, the next step was to correct the logic in the derived column transform. However, instead of updating the current logic for the existing derived column, I simply added an additional derived column with new logic so that I could see them side by side in the data viewer. I then executed the package again and as soon as the first data buffer hit the data viewer, the window popped up, but my new derived column was no where to be found.
What this told me was that, despite being tightly coupled with the data flow path object, the data viewer object does not automatically update the displayed columns list to include newly added columns in the data flow. Nope…to get the new columns included in the data flow, you have to go in and edit the path object (line between 2 components) and move the new column over to the displayed columns list (see below):
So now you know, and…