In my last post (Go Work for a Consulting Firm), I discussed a difficult situation faced earlier in my career…back when I lacked the knowledge and expertise to solve the problems facing the team and ultimately steer the car back on the road. Towards the end of that post I mentioned how, after working for an awesome consulting firm (RDA Corporation) and continuing to expand my knowledge base, I was now in a position both knowledge-wise and experience-wise to solve those problems from my past. So in this post, I’m going to go into a bit more detail on the problems we faced and then layout a proposed path to get the group back on track.
- Duplication of data/effort. Despite our best efforts to reuse ETL packages and database tables, there was still a ton of duplication in our development. On the technical side this simply lead to wasted storage and processing resources on the database. On the business side, the ramifications were much worse…wasted developer hours and multiple versions of the truth – the latter being one of those issues that festers unnoticed until the it finally presents itself during a important conference call or presentation with senior management and results in a lot of confusion, doubt, and wasted developer hours as each version of the truth is tracked back to its source and resolved in a way that satisfies all parties involved.
- Lack of experience. No offense should be taken – I’m calling myself out as well – but as a team our technical skill-level left a lot to be desired. Below are a few of the common and most egregious mistakes in our work:
- improper use of data types -> wasted space, bad query performance
- lack of FK-constraints -> bugs/data quality issues (orphan records)
- poor data modeling -> bugs/data quality issues/wasted space
- Lack of source control. Again, like the multiple versions of the truth issue above, this one doesn’t really become an issue until it becomes a major issue…for example when a developer’s hard drive crashes and a few weeks of work is lost in an instant. It also made team development next to impossible…which is especially bad when you are all developing against the same *production* database…bringing me to the next issue.
- No development environment. In the world of database development, not having a separate development environment (and standard deployment process) is like driving drunk, with a blind fold on, doing donuts in the police station parking lot and thinking you’re not going to get caught. This led to a lot of bugs…tons of bugs…and had we not been developing database objects specific to each reporting solution (definitely not a good strategy – but did stem the bleeding on occasion) it would have been worse.
- No data archival plan/No database monitoring. I still remember the first time our database filled up…turned out no one had been monitoring the hard drive space. However, the part that really stuck out was the incredibly difficult time we had freeing up space and getting everything moving again. Not minutes, not hours, but DAYS! We archived some of the larger tables onto an external hard drive and then proceeded to delete older data from these tables until we reached a point were processing could resume. Despite that amount of pain (and we’re talking serious heat as month-end approaches and the execs want their reports/dashboards/scorecards to see how the numbers are shaking out) the database continued to fill up a few more times before I left…no monitoring.
I shudder just thinking about those days…it was a mine field. But sometimes learning from failure is the best way to learn.
A Path Forward
It is, without a doubt, much more difficult to turn a failing development environment around due to the amount of technical debt and poor developer habits that have built up over time. Changing bad habits is hard and takes time. And the re-factoring (and/or rewrites) needed to pay down technical debt from the past are not only difficult from a technical-standpoint, but they’re also a hard sell to the suits who sign the paychecks because there’s no immediate value. That’s why the first step is to start out on the right foot from the very beginning.
- Start out with a plan. At this point, it doesn’t have to include the nitty-gritty details of source system extracts or database tables, columns and data types…but you need to have an idea of what it is you are trying to accomplish, why you are trying to accomplish it, and the major steps/milestones in the process. If you lack the skillset and expertise to come up with this on your own (especially the major steps/milestones piece), one good place to start are the following presentations by James Serra (b | t) titled: Building an Effective Data Warehouse Architecture and Best Practices to Deliver BI Solutions.
- Hire an experienced DW/BI architect. If you already have one in house, great…use them (as long as they’re capable but recognize that sometimes outside perspective is a very good thing). On the other hand, as is typically the case with smaller-medium sized companies, you’ll have to hire one. The sticker price can be eye-popping, but a good architect is worth his/her weight in gold – so don’t skimp here. Plus you will only need this person for short bursts of time: getting the program started, periodic check ups etc.
- Agile Development Methodology. Everyone, and I mean everyone on the team (including the managers) needs to attend agile training. The benefits of this development methodology, especially for DW/BI development – cannot be overstated enough. It will help your development team deliver value quickly and frequently to the business…which in turn becomes the best argument when trying to convince the business for further investment…its a positive feedback loop. It also reduces risk: instead of gathering requirements from the business and then disappearing for the next 6-12 months only to return with a finished product that is probably not what the product owners had envisioned….they (the business) will get a taste/demo of the product in small chunks early-on and regularly throughout the project – providing everyone involved with many opportunities for course-correcting if something doesn’t look right.
Note: If necessary, consider hiring an experienced project manager fully versed in the agile methodology…keeping developers on track is like herding cats.
- Hire 1-2 senior developers. Yeah, you already have a development team, but how experienced are they and have they kept up with best-practices? If your team is like the one I was on at the time – where none of us were true experts in any of the technologies we used (ex. DBEngine, SSIS, SSAS, etc) – then you need to strongly consider hiring a couple of senior-level developers and/or sending your current developer team to training (preferably both if you have the budget). These outside developers will work along side your existing team to make sure the technical standards and strategy (formulated by the DW/BI Architect) are being followed. They’ll also help to make sure best-practices are being followed during code reviews (ex. FK-constraints, correct use of data types) and can provide supplementary training along the way (ex. lunch and learns).
- Work off of a Bus Matrix. Regardless of whether you are developing a normalized warehouse (Inmon) or a denormalized warehouse (Kimball) you must have a cohesive development strategy – which is where the Bus Matrix comes in. This architectural document should be your North Star. It will not only help prevent a few of the issues our team faced at the time (ex. duplication of data, multiple versions of the truth, etc) but it will guide your team development efforts – allowing for prioritization of effort and parallel’ization (yep – you just learned a new word) of developer resources. Don’t forget to update the Bus Matrix as your DW/BI program progresses and expands – which it definitely will when done right and the business starts reaping fruits of this labor.
Note: don’t confuse “cohesive development strategy” with “tightly coupled code”…you still want your code (ex. ETL packages) to be loosely coupled…if you don’t know the difference, check with the senior developer(s) you hired back in step 4.
- Have a Development Environment. Before you put in the request for your production database server, go ahead and double that order. Having a separate development environment will save you a tremendous amount of heartache and pain (and grey-hairs). It provides a place for developers to test new code and changes to existing code as they iterate, refactor, and ultimately work their way through the Bus Matrix. If wallets are tight – you can technically make due with a less powerful development server – but be prepared to deal with the nuances that accompany this scenario (ex. performance and load testing/tuning). On the other hand, if your company is flush with cash and you want to do it right, you can go ahead and add several additional servers for QA and Integration Testing – each of which should be refreshed by restoring a from a production backup after each release.
Note: As DW’s are eventually moved to the cloud, and resources (cpu, memory, drive space) become more elastic, the costs of maintaining a multiple environments in addition to production (ex. DEV, QA, TEST) will be greatly reduce because most of the cloud cost-structures are consumption-based allowing dev teams to “flex up” during load/scalability testing.
- Monitor the Database Server. This is really a no-brainer – and I’m still not clear how we managed to acquire a > $50k server (including licensing) at a fortune 500 company that wasn’t being monitored for things like drive space. There are many ways to monitor the server…including some really robust third party applications (ex. Performance Advisor from SQL Sentry). So don’t be caught with your pants around your ankles. If the budget is tight, you must at the very least setup some custom monitoring scripts that check the basics like CPU/Memory/HD usage – schedule them to run nightly and check the output each morning.
- Establish a Plan for Archiving Data. Again this is something the DW/BI architect will help with – estimating the size of the data warehouse prior to acquiring the hardware. However, it’s worth a separate bullet point because this plan needs to be in place before it’s required. This can be as simple as retaining backups and periodically deleting data past a certain windows (ex. older than 5 yrs)…or it can be as complicated as implementing a partitioning strategy for large tables and routinely swapping out historical partitions to cheaper storage where the data remains accessible but at a lower cost.
- Use Source Control Effectively. There is no question that you need to be using source control to mitigate the risks of developer hardware failure – but you also need to use source control effectively…especially in team environments. There are plenty of source control systems to choose from and each have their pros and cons, so be sure and ask that expensive DW/BI Architect for a recommendation tailored to your environment. If you’re a Microsoft shop developing, I highly recommend using TFS.
- Testing and Continuous Integration. While database development isn’t a perfect fit for Test-Driven Development (TDD) – your developers should still be creating unit test for all functionality (ex. stored procs, SSIS packages, etc). Additionally, and especially in team development scenarios, you should consider leveraging continuous integration to prevent developers from “breaking the build“.
For someone new to *successful* DW\BI development, this list may look like a pretty steep investment…hiring outside Architects and Senior Developers, team training, multiple servers, testing code – and we haven’t even gotten to the front end (cubes, reporting layer, etc) though most of the list still applies. Truth be told doing everything at once is a steep investment…but that is the “perfect world” scenario.
Luckily you had the foresight to hire that experienced DW/BI Architect who helped you formulate an incremental plan, poised for growth, that also fits within your current/limited budget. Now, when the business starts to realize the value of your efforts (which shouldn’t take long – see: agile methodology) and finally starts to open up those purse strings, you won’t find yourself…(in the crude words of someone I respect greatly)… up-shits-creek-in-a-barbed-wire-canoe-f%#ked…which is where a team I was on once found ourselves a few years ago.