Where Does Agile DW/BI Get Its Speed?

We explore three aspects of agile that form a synergy that doubles a team's development velocity.

By Ralph Hughes, MA, PMP, CSM

[Editor's note: Ralph Hughes is making the keynote address, "Scaling Agile Data Warehousing for the Enterprise," at TDWI's World Conference in San Diego, August 7-12, 2011.]

My company recently benchmarked our agile method against a comparable waterfall project at a major U.S. telecommunications company to document the speed and quality advantages of incremental and iterative delivery for data integration applications. The agile project had more than twice the scope yet spent nearly one-quarter the labor in pre-development, and the ETL required 60 percent fewer developer hours per module. Regarding quality, the agile project had only ten defects found in system testing and zero in acceptance testing, and it operated in production without a single fix-it ticket for nine months. In contrast, the waterfall project had dozens of defects going into system testing that persisted through acceptance testing into production, where they so undermined the customer's confidence in the data that the application was never used.

Upon reviewing these results, our business partners in finance asked us a simple question: "Where does agile data warehousing get all its speed?" Having utilized the method for over ten years now, we have dozens of answers. If I had to boil them down to a single quality, I would have to say agile excels in speed to market through a strategy of "fail fast and fix quickly," even with large data integration efforts. To understand how this is accomplished, however, we need to first establish a shared notion of what agile data warehousing is.

Agile DW/BI in a Nutshell

Agile DW/BI is based upon the classic agile approach called scrum, which revolutionizes the way software professionals collaborate within the project room. We put business and coders in the same room, repeatedly giving them two or three weeks to deliver the next increment of shippable code. We provide them only a high-level design to work from, letting them work in the fastest way they can devise, figuring out the remaining details as they go. We ask them to accumulate and utilize a reference model to guide their detailed designs, and have them work to a tough definition of "done" that's applied daily, using only two lightweight, paper-based tracking tools that all parties can use to monitor progress within and between iterations.

We have to adapt generic scrum in two important ways for data integration projects. First, the solution architect must work with the project's business partner to draft a short vision document, partition the development work into "bite-size" chunks, and sequence the list of desired features for business priority and technical dependencies. Second, we utilize a second agile technique called Kanban to organize the work of the data modeler and systems analyst so that their specs for each module are 80 percent complete when the developers start building it. At that point, however, the developers proceed with largely generic scrum methods, gaining tremendous speed through a combination of self-organized teams, information-rich programming, and quality-driven development -- all of which allow them to fail fast and fix quickly. Let's consider these aspects to see where "fail fast and fix quickly" begins.

Self-Organized Teams

Whereas waterfall projects prepare a detailed work breakdown structure before coding starts, agile leaves most of that detail for teams to figure out for themselves. Activity within the project room during the first few iterations may look chaotic for a new team, but scrum requires teams to end each cycle with a discussion of how to improve their work habits. Agile places a tough burden upon the team: demo new, shippable code to your business partner every few weeks. Teams quickly devise for themselves exhaustive standards for module development, including traditional quality techniques such as peer reviews and thorough "as-built" documentation for operations. At the end of the cycle, the embedded business partner either accepts or rejects the new modules based upon their functional capabilities.

By placing such tough objectives upon the team and repeating the process every few weeks, agile forces the developers to get good at delivering increments of new value and keeps them good at it. They quickly eliminate every inefficiency in their process, evolving their work habits to be many times more effective than the highly managed approach offered by waterfall methods.

Information-Rich Programming

Although there are ways to span geographic separation between teammates, generic agile co-locates developers as much as possible because doing so improves both speed and quality. This co-location includes the embedded business partner plus the roles of data modeler and systems analyst. By keeping everyone highly present in the physical or virtual project room, developers can get questions on requirements, design, and testing answered in real time, without having to deal with the delays of scheduling meetings or the miscommunication from e-mail chains.

An important result of close collaboration is its impact upon the embedded business partner. Whereas waterfall methods keep the business out of the project room and uninformed about DW/BI in general, the agile business partner spends time with the team, regularly reviewing deliverables. This exposure teaches the business partner about data warehousing. As a result, requirements rapidly become more focused and accurate with each iteration, which lets agile teams accelerate delivery. In fact, agile business partners often learn to spot portions of project design that can be re-used or eliminated, allowing the team to remove many features, some of which may be time-consuming to build.

Quality-Driven Development

Requiring regular demos of working, shippable code is a demanding quality standard that effectively places quality at the forefront of everything the developers do throughout their iterations. Agile teams meet this challenge through "test-led development." Before they code a component, developers write the tests it must pass. Then they code until the tests can be completed without exception. In this way, agile guarantees that modules function as expected and that no untested functionality has been added.
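As a minimal sketch of the test-led sequence described above, consider a small cleansing step in an ETL module. The function name `standardize_phone`, the sample phone formats, and the rule that only ten-digit numbers are valid are all assumptions made for illustration; they are not taken from the benchmarked project.

```python
import re


# Step 1: write the tests first. They spell out what "done" means
# for this (hypothetical) cleansing rule before any code exists.
def test_standardize_phone():
    assert standardize_phone("(619) 555-0123") == "6195550123"
    assert standardize_phone("619.555.0123") == "6195550123"
    assert standardize_phone("1-619-555-0123") == "6195550123"
    # Values that cannot be repaired are rejected, not guessed at.
    assert standardize_phone("555-0123") is None
    assert standardize_phone(None) is None


# Step 2: code only until the tests above pass, and no further.
def standardize_phone(raw):
    """Return a ten-digit phone string, or None if the input is unusable."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # strip everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None


if __name__ == "__main__":
    test_standardize_phone()
    print("all tests passed")
```

Because the tests are written before the function, any behavior not covered by a test is a visible gap in the test list rather than a surprise in system testing.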

Most agile warehousing teams also implement automated test engines and continuous integration so that each day's work is validated against multiple error scenarios with high-volume data. Waterfall teams tend to let quality defects accumulate until just before final integration, so individual bugs and their effects become hard to discern. For agile teams armed with automated test engines, however, coding errors are caught and corrected daily, driving defects to zero long before a release candidate arrives at acceptance testing. Keeping their code clean lets agile teams sprint across the project's finish line while their waterfall brethren are still teasing apart many entangled layers of coding flaws and functional defects.
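One way to picture an automated test engine is as a table of named error scenarios that is replayed against the current build every day. The sketch below is only illustrative: the column names (`customer_id`, `amount`), the defect codes, and the functions `validate_rows` and `run_scenarios` are hypothetical, and the tiny in-memory batches stand in for the high-volume data a real team would use.

```python
from collections import Counter


def validate_rows(rows):
    """Return the set of defect codes found in a batch (hypothetical rules)."""
    defects = set()
    keys = [row.get("customer_id") for row in rows]
    if any(key is None for key in keys):
        defects.add("NULL_KEY")
    counts = Counter(key for key in keys if key is not None)
    if any(n > 1 for n in counts.values()):
        defects.add("DUPLICATE_KEY")
    if any(row.get("amount") is not None and row["amount"] < 0 for row in rows):
        defects.add("NEGATIVE_AMOUNT")
    return defects


# Each scenario: (name, input batch, defect codes the module should report).
SCENARIOS = [
    ("clean batch",
     [{"customer_id": 1, "amount": 10.0}, {"customer_id": 2, "amount": 5.0}],
     set()),
    ("null key",
     [{"customer_id": None, "amount": 10.0}],
     {"NULL_KEY"}),
    ("duplicate key",
     [{"customer_id": 1, "amount": 1.0}, {"customer_id": 1, "amount": 2.0}],
     {"DUPLICATE_KEY"}),
    ("negative amount",
     [{"customer_id": 3, "amount": -4.0}],
     {"NEGATIVE_AMOUNT"}),
]


def run_scenarios(scenarios):
    """Replay every scenario; return (name, expected, actual) for each mismatch."""
    failures = []
    for name, rows, expected in scenarios:
        actual = validate_rows(rows)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


if __name__ == "__main__":
    failed = run_scenarios(SCENARIOS)
    print("failures:", failed)
```

Run from a scheduled or commit-triggered job, a non-empty failure list points to the day's change that introduced the defect, which is what keeps bugs from piling up unseen until final integration.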

Fail Fast and Fix Quickly

All three aspects we've described save labor, but more important, they form a synergy that further doubles a team's development velocity. Software professionals make mistakes during projects -- mistakes in requirements, design, and coding. By generating a big, detailed design before coding starts, waterfall methods ossify these mistakes, so they cost a tremendous amount of time to resolve once the application reaches system test.

Through regular demos of shippable code and continuous integration testing, agile forces a team's mistakes to the surface quickly, allowing business and IT to have informed, adult conversations about what can be done with the resources available. The early discovery of major issues means the developers still have time to correct misconceptions in requirements and design. They can keep the project on track, avoid large-scale rework, and consolidate designs so less must be built. Building less, avoiding major defects, preventing rework -- this is where agile DW/BI gets its tremendous speed.

Ralph Hughes serves as chief systems architect for Ceregenics, a provider of project leadership for DW/BI programs among the Global 2000 since 1988. Inventor of the Agile Data Warehousing(TM) method and author of the book by that name, he is a faculty member of the Data Warehousing Institute and has taught or coached over 1,000 BI professionals worldwide in the discipline of incremental and iterative delivery of large data management systems. Contact him at ralph.hughes@ceregenics.com