During the last half of 2012, I had the opportunity to demo a few of the Microsoft self-service BI tools to a number of audiences. Unfortunately, due to time constraints, we weren’t able to cover PowerPivot in as much depth as it deserves. But, because of it’s sheer awesomeness and importance in the Microsoft BI story moving forward, I made a point to allocate a few minutes to at least run through the PowerPivot highlight reel:
- Mash up data from many different sources
- Work with a lot of data (100’s of millions of records)
- Scheduled Data Refresh
- Management Dashboard
- Source for Power View reports
…and just like clock work, the questions regarding the scheduled data refresh feature come rolling in. This behavior is not entirely unexpected: scheduled data refresh kicks serious ass from a self-service BI perspective, members of the audience realize this, hence the interest and questions.
That said, I did notice a consistent bit of confusion surrounding the differences between this scheduled data refresh feature (of which I speak so highly) versus the refresh selected/all connections option they saw in one of the drop down menus when the PowerPivot file was rendered in the browser via Excel Services.
Perhaps this confusion speaks to my inability to explain things to others…or maybe it just suggests that the “refresh selected/all connections” drop down option available to users viewing PowerPivot files via Excel Services in Sharepoint 2010 is utterly useless and causing confusion (/rant). Either way, this post is my attempt to clear up any remaining confusion related to the various options for refreshing PowerPivot workbooks in a Sharepoint 2010 environment.
But first, a bit of architecture…
When a user creates a PowerPivot workbook, they typically start by creating a direct connection to a remote data source (such as a SQL Server data warehouse or Analysis Services cube) to pull some data in:
Once data is loaded into the PowerPivot model, the user goes to town adding pivot tables, charts, and all sorts of excel flare (sparklines, databars, etc) to create a useful and visually appealing report, scorecard, dashboard, etc. Afterwards, the user uploads the final product to a Sharepoint site to share with the rest of the team, department, or organization.
Note: Before PowerPivot workbooks can be published to Sharepoint, the PowerPivot for Sharepoint feature must be installed on the Sharepoint server. If your Sharepoint farm is scaled out across multiple severs then this feature must be installed on any application server that you intend to use for PowerPivot. See this link for instructions on installing the PowerPivot for Sharepoint feature.
Once a user accesses a PowerPivot workbook via Sharepoint, the PowerPivot model is extracted from the workbook and loaded into the Analysis Services instance that was installed with the PowerPivot for Sharepoint feature mentioned above. Below is a diagram of what that looks like:
Note: the blue line just shows that the connection to the remote source or sources, defined in PowerPivot when the workbook was being created, stays with the PowerPivot model.
So now the workbook and all the PivotTables and PivotCharts point to the PowerPivot model sitting in the Analysis Services for Sharepoint instance, and the PowerPivot model sitting in the Analysis Services for Sharepoint instance is pointing to the original source system. This is the the main takeaway for this section: we now have 2 pieces (workbook, model) and options for refreshing each…which we’re about to discuss.
Refreshing the PowerPivot workbook
When a PowerPivot workbook in Sharepoint is accessed by a user, Excel Services is used to display the workbook to the user. Through this interface, there are two refresh options under the data menu…
Did you read the subtext for the first option, Refresh Selected Connection?
It states that it will refresh the data connection for the currently selected PivotTable…(the second option implies it will do the same but for *all* connections for all PivotTables)…meaning when the user selects either of these options, Sharepoint will send queries to the Analysis Services for Sharepoint instance hosting the PowerPivot model and retrieve the latest from the model…from…the…model…and update the PivotTable or PivotCharts.
At no point in this process is the original source (in this case the SQL Server DW) queried for new data. People like you and I can totally wrap our heads around this, but for someone without a deeper understanding of the architecture…hah, they hardly stand a chance!
The following is an example of using the Refresh Selected Connection / Refresh All Connections options available when accessing the PowerPivot workbook via Sharepoint. The data source we are going to use is the tried and true AdventureWorksDW2012 database running on a separate server.
- Below is a query against the source which aggregates the Sales Amount by Calendar Year. A COUNT column has been included to show the number of fact records per year.
- Next, a basic PowerPivot workbook is created (based on the data source above) with a pivot table showing the sales amount by calendar year.
- Now, in order to simulate changes in the source I’ll update the sales amount of every record in the fact table to $1. Notice that the total sales amount now equals the record count.
- If we open the PowerPivot workbook via Sharepoint and click the Refresh All Connections option, then we see that the values remain unchanged.
- Below is a screenshot of a query against the PowerPivot model in the PowerPivot SSAS instance to confirm that the numbers in the PowerPivot model are still out of sync with the changes made to original source.
Refreshing the PowerPivot model
In order for changes in the original source to be reflected in the workbook (ex. new data from nightly ETL process) the data in the PowerPivot model must be updated. For our environment (SQL Server 2012, Sharepoint 2010), there are 2 options for pulling new data (or changes to existing data) into the PowerPivot model:
- Scheduled Data Refresh
The manual option is just a straight forward workflow where the user downloads a copy of the PowerPivot workbook, refreshes the workbook from their local workstation, and re-uploads the updated workbook up to Sharepoint.
The other option, scheduled Data Refresh, is an automated workflow where the user just needs to set up a schedule and Sharepoint handles the rest. See this link for detailed instructions on how to set this up.
What follows is a continuation of the previous example. But this time we will use the Scheduled Data Refresh feature to update the PowerPivot model with the changes we made earlier to the source data.
- Configure the Data Refresh schedule
- Wait for refresh to occur
- Confirm data refresh completed successfully
- Open the PowerPivot workbook
Note: Depending on the PowerPivot configuration settings in your Sharepoint farm, users may need to manually refresh the connections after opening the PowerPivot workbook in order to see the updated data.
In this post, we walked through the various options for refreshing data in PowerPivot workbooks and discussed the differences between each option. Below is a list of key take-aways:
- When a PowerPivot workbook is uploaded to Sharepoint, the model is separated from the rest of the workbook and stored in an Analysis Services instance specifically configured for a PowerPivot model workload.
- Options to refresh connection(s) while viewing the PowerPivot workbook via the browser will only update data based on what is currently stored in the PowerPivot model…it will *NOT* go back to the source system(s) and grab changes. Btw, if you know a good use-case for these options in the drop down, please let me know…
- Data in the PowerPivot model can only be updated with changes to the source(s) manually or via the Scheduled Data Refresh feature.
Now that we’ve finally cleared that up…Sharepoint 2013 is about to start rolling out and includes a new refresh option called “Interactive Refresh”. In my opinion this new refresh option will provide a more intuitive experience/functionality for the “refresh selected/all connections” drop down options. You can read more about that here.