Q&A: Agile Data Warehousing
We explore the difference “agile” makes in your data warehouse.
How does the concept of agile development apply to data warehousing, and what benefits can we expect? Where will the strongest ROI come from? To learn more about this emerging approach to data warehousing, we turned to Ralph Hughes, chief systems architect at Ceregenics, Inc. and a familiar leader at TDWI World Conference seminars.
BI This Week: Why is there so much buzz about agile data warehousing? What does it potentially have to offer?
Ralph Hughes: Agile data warehousing (ADW) uses scrum as an alternative to waterfall project planning. It provides a streamlined framework for building DWBI applications that regularly delivers modules faster with one-fourth of the developer hours, cuts project costs in half, and drives project defect rates toward zero.
It leverages the 80/20 rule in many ways. For example, it allows you to start a project after completing the 20 percent of the requirements and design that provides 80 percent of the project definition, filling in the remaining details once development is underway and everyone can see what the challenge really entails.
ADW also folds 80 percent of the planning into the overall process, getting projects off the ground faster and letting the customer see results sooner. This yields better requirements, so the accuracy of DWBI efforts increases dramatically.
What are the misconceptions about ADW? What do enterprises unrealistically expect?
At the top of the list is the belief that agile data warehousing is a revolution in how we build DWBI apps rather than an evolution.
Some IT managers take it to mean that you can simply throw developers at a group of end users and magically get a working data mart out of the collision. I prefer to introduce ADW as a developers' methodology, with a good dose of pre-project requirements management and architectural thinking still in place to guide it. You still need high- and mid-level requirements defined, data and ETL architectures sketched, and industry and departmental standards to draw upon.
What is the difference between data warehousing and agile data warehousing?
Scrum arose from Japanese business school experts rethinking software delivery to emphasize effectiveness rather than the software engineer's objective of technical excellence. Operations research experts also applied Goldratt's theory of constraints to tune the development process.
ADW is scrum applied to warehousing. We collaborate rather than contract with the business, break projects into business-tangible increments of value, and rely on mostly self-organized teams of developers to deliver potentially shippable code in equal time-boxes of three to four weeks.
The emphasis on business-tangible value and project effectiveness often yields some surprising innovations, such as putting data online as soon as it’s staged and then backfilling the middle architectural layers such as integration and dimensionalization as the project proceeds.
We also end up with a particular tool set that includes many more “implementation by configuration” products and utilities that can project data into compelling prototypes based upon requirements or analysis work without coding.
How do you counter the notion that ADW can't handle deep-think warehousing tasks such as architecture?
I see comments like that all the time in blogs, and I believe they arise either when people speak from theory before they've tried the method or when projects go bad because teams overapply one of the scrum concepts.
For example, a team may have made the entire systems engineering process incremental, putting no time into a coherent high-level design, rather than confining the iterative process to the project room once the developers are engaged.
Every quarter I see dozens of people in my TDWI classes who have used scrum on warehousing projects. Four out of five give it glowing reviews, and the rest discover something during the class about the way they ran their first project that they definitely won't try again.
When I teach scrum for warehousing, I emphasize putting a good engineering "wrapper" process around it so that the project starts on the right foot. In fact, we have a TDWI class on requirements management that covers how to quickly build a full-project view of the work ahead, which developers can then work from more effectively.
Good architecture should be borrowed from external and internal standards and built into the project before the developers race into writing ETL code.
I also emphasize scrum’s basic recommendation that teams reserve 20 percent of their bandwidth for architectural work and fix-its that the end user might never be able to appreciate, so that the resources needed for these deep-think issues will be available.
Where can ADW provide greatest ROI? What would be the first areas an enterprise should look at?
In our TDWI requirements management class, we present a five-layered hierarchy of requirements that suggests one way to plan early agile warehousing programs.
The hierarchy starts with simple data access (because sometimes that is 50 percent of what end users need out of a DWBI project), and then proceeds upwards to reporting and KPIs, researching numbers of interest on a report, true analysis, and finally predictive analytics.
Finding projects that require a progression up this hierarchy is a great way to start ADW with high ROI. Try it first on a project that’s mostly data access. Then repeat for a project that is mostly reporting. Then move ahead to a project that requires a boatload of complex business rules and allocations.
In our practice, we’ve seen that there’s a two-way learning process that must occur for ADW to flourish: IT must learn to work in a simplified, collaborative manner with business customers, and the business members on the teams need to learn to speak a little data warehousing so that they can better express their needs to the developers they’ve started working so closely with.
Each step up the requirements hierarchy we've suggested allows time for this learning and (not coincidentally) provides new and better program-level requirements that would have been overlooked if the entire effort had been planned out before development began, as would happen under a waterfall approach.
Might agile affect BI specialties other than development in ways that benefit the enterprise?
There are dozens of such secondary impacts that often make switching to agile warehousing a watershed moment for the DWBI departments and their companies. Let me cite just a single, two-fold chain of events that we often see.
First, ADW revolutionizes estimating by basing it on what just happened in the last three-week iteration and focusing it mostly on what will happen during the next one.
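To make iteration-based estimating concrete, here is a minimal sketch of one common way teams do it: forecast the next iteration's capacity from the work actually completed in recent iterations, then commit to top-priority backlog items until that capacity is used. It assumes work is sized in story points; the function names, the three-iteration averaging window, the greedy priority-order selection, and the backlog items are all illustrative assumptions, not Hughes's prescribed method. The optional reserve parameter echoes the 20 percent of bandwidth for architectural work and fix-its that he recommends holding back.

```python
from statistics import mean


def forecast_capacity(completed_points, window=3, reserve=0.20):
    """Estimate how many story points the team can commit to next iteration.

    completed_points: points actually finished in each past iteration, oldest first.
    window: number of most recent iterations to average (assumed default).
    reserve: fraction of capacity held back for architecture and fix-it work.
    """
    if not completed_points:
        raise ValueError("need at least one completed iteration to estimate from")
    recent = completed_points[-window:]  # base the estimate on what just happened
    velocity = mean(recent)
    return velocity * (1 - reserve)


def plan_next_iteration(backlog, capacity):
    """Take (name, points) items in priority order until the next one would not fit."""
    committed, used = [], 0
    for name, points in backlog:
        if used + points > capacity:
            break  # stop at the first item that exceeds remaining capacity
        committed.append(name)
        used += points
    return committed


if __name__ == "__main__":
    # Hypothetical history and backlog, highest business priority first.
    capacity = forecast_capacity([20, 25, 30])
    backlog = [("staging load", 8), ("sales report", 8), ("KPI dashboard", 5)]
    print(capacity, plan_next_iteration(backlog, capacity))
```

Because each estimate is recomputed from the latest iterations, a mistaken forecast costs at most one short time-box before the numbers self-correct, which is the property the answer above relies on.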
Second, ADW eliminates the waterfall approach’s sense of contracting with the business, switching to collaboration instead.
Projects start by delivering the most important business requests first, allowing the sponsors to stop the project at the end of any iteration and put what has been developed so far into production.
Combine accurate estimation with step-by-step collaboration, and you’ll see many more projects get funded. Analytical services will come online more quickly because of greater development speed. In addition, incremental delivery allows something to be deployed even before the whole project is complete. With agile, our business partners no longer wait around, wishing they had insight into the business. Their data warehouse now delivers that capability in a steady stream of improvements.
With more projects funded and valuable increments online sooner, the enterprise can execute upon far more business opportunities than before, increasing revenues and decreasing costs.