In-Depth
Agile Data Modeling: Evolving Toward Excellence
Agile data modeling calls for a new set of practices that enable the safe evolution of models, even those in production.
By Ken Collier, Agile Analytics Consultant and Author, KWC Technologies, Inc.
[Editor's note: Ken Collier is making the keynote address, "Agile Pitfalls, Anti-patterns, and Gotchas," at TDWI's World Conference in San Diego, August 7-12, 2011.]
The number one source of resistance I encounter when helping organizations adopt agile BI comes from technical architects and data modelers, who say something like, "Agile makes sense as long as the data models are developed and settled." Modelers often incorrectly believe that data models must be designed, developed, and locked down before building the applications that use the data. The idea of evolving a data model incrementally can strike fear in the hearts of modelers and architects. Sometimes it is a fear of rework; data modelers would rather get it right once. Other times it is a fear of unintended side effects, or the risk of creating a spaghetti mess.
What is agile data modeling? Agile modeling calls for a minimally sufficient up-front design that establishes a reference model to guide the delivery team's incremental development activities. Aspects of the logical and physical models are completed just in time to support the BI features under development. Agile modelers avoid detailing aspects of the model that aren't immediately needed. Combined with good data modeling discipline, this style produces the right data model for its intended purpose, and the model evolves to support future requirements as those become reality. Scott Ambler's book Agile Modeling (see the recommended reading at the end of this article) covers agile modeling principles and practices in depth.
Why is agile data modeling a good idea? To paraphrase Ron Jeffries, one of Extreme Programming's founders, the best way to implement a DW/BI system is to implement less of it. The best way to have fewer defects in your DW/BI system is to have a smaller, simpler one. The problem with comprehensive up-front modeling is that you must design for all contingent requirements, both known and speculative. This inevitably results in an overdesigned model that costs more to implement, is costlier to maintain, is more likely to contain defects, and is more difficult to understand.
Agile data models are as simple as possible while being sufficiently detailed, accurate, and consistent. They also fulfill a well-understood purpose and provide positive value (i.e., their benefit outweighs the cost of keeping them updated).
How to Safely Evolve Data Models
These concerns about evolutionary modeling resulting in unnecessary rework, unintended side effects, and design degradation are legitimate. Additionally, the prospect of making a data model change to a high-volume data warehouse in production can be scary. Agile data modeling calls for a new set of practices that enable the safe evolution of models, even those in production. I'll summarize those practices here. Consider this list a brief introduction; each practice deserves deeper study to gain proficiency.
Data Model Patterns: Data models evolve toward excellence when we take advantage of tried and proven designs. Design patterns enable us to benefit from mature solutions that have previously been developed. Effective application of patterns relies on familiarity with pattern catalogs and the ability to use them appropriately and sparingly.
Michael Blaha's Patterns of Data Modeling is the most recent catalog of patterns, but David Hay first started cataloging data model patterns in 1996 with his Data Model Patterns: Conventions of Thought, and followed in 2006 with Data Model Patterns: A Metadata Map. An extension of data modeling patterns is the adaptive data model (ADM), a generalized data model designed to accommodate multiple domains. The ADM has been successfully used in data warehouse design and I write about it in detail in the Cutter executive report, The Message Driven Warehouse.
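To make the ADM idea concrete, here is a minimal sketch of one common ADM shape: a generalized entity/entity-type/attribute trio in which new business concepts become rows rather than new tables. The table and column names are illustrative, not drawn from the Cutter report.

```python
import sqlite3

# Generalized "entity / entity type / attribute" trio: new domain concepts
# are added as data, not as schema changes. All names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entity_type (
        entity_type_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL UNIQUE      -- e.g., 'Customer', 'Store'
    );
    CREATE TABLE entity (
        entity_id      INTEGER PRIMARY KEY,
        entity_type_id INTEGER NOT NULL REFERENCES entity_type,
        label          TEXT NOT NULL
    );
    CREATE TABLE entity_attribute (
        entity_id      INTEGER NOT NULL REFERENCES entity,
        name           TEXT NOT NULL,
        value          TEXT,
        PRIMARY KEY (entity_id, name)
    );
""")

# Adding a new domain concept requires inserts only, no new tables.
db.execute("INSERT INTO entity_type VALUES (1, 'Customer')")
db.execute("INSERT INTO entity VALUES (100, 1, 'Acme Corp')")
db.execute("INSERT INTO entity_attribute VALUES (100, 'region', 'West')")
```

The trade-off is deliberate: this generality buys flexibility at the cost of weaker type enforcement and more complex queries, which is one more reason to apply patterns sparingly and with intent.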
Technical Debt Management: Data models evolve toward excellence when changes are easy to make due to low technical debt. Technical debt is common in data warehousing. It is the entropy that accrues in any system over time due to development shortcuts, suboptimal design choices, maintenance activities, and so on. Like financial debt, a little technical debt is acceptable as long as we monitor it and pay it back quickly. When technical debt accrues unabated, the cost of change becomes unacceptably high. Agile data modelers continuously identify, prioritize, and monitor technical debt in the data model, seeking to eliminate it so they can respond quickly to new requirements.
Database Test Automation: Data models evolve toward excellence when we have continuous confidence that our ideas are working. Agile BI practitioners work in short iterations, delivering business value every few weeks. We need confirmation that what we build in later iterations doesn't break what we built in earlier ones. The only practical way to accomplish this is with an automated test suite. Automated database tests validate data structures, data content and quality, schema constraints and integrity, data derivations, and so on. We can run automated tests quickly and simply at any time to confirm that everything still works. Tests are added as data model changes are made, so the test suite grows alongside the model. I devote an entire chapter to this topic in my book, Agile Analytics.
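As an illustration, here is a minimal sketch of what such tests can look like, using Python's unittest against an in-memory SQLite database. The sales_fact and date_dim tables are hypothetical stand-ins; a real suite would run against a dedicated test instance of the warehouse.

```python
import sqlite3
import unittest

# Hypothetical star-schema fragment used only for this sketch.
SCHEMA = """
CREATE TABLE date_dim (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL
);
CREATE TABLE sales_fact (
    date_key        INTEGER NOT NULL,
    quantity        INTEGER NOT NULL,
    unit_price      REAL NOT NULL,
    extended_amount REAL NOT NULL
);
"""

class WarehouseTests(unittest.TestCase):
    def setUp(self):
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)
        self.db.execute("INSERT INTO date_dim VALUES (20110807, '2011-08-07')")
        self.db.execute("INSERT INTO sales_fact VALUES (20110807, 3, 9.99, 29.97)")

    def test_fact_table_structure(self):
        # Structural test: the physical model matches the reference model.
        cols = {row[1] for row in self.db.execute("PRAGMA table_info(sales_fact)")}
        self.assertTrue({"date_key", "quantity", "unit_price"} <= cols)

    def test_no_orphaned_date_keys(self):
        # Integrity test: every fact row joins to a dimension row.
        orphans = self.db.execute("""
            SELECT COUNT(*) FROM sales_fact f
            LEFT JOIN date_dim d ON d.date_key = f.date_key
            WHERE d.date_key IS NULL""").fetchone()[0]
        self.assertEqual(orphans, 0)

    def test_extended_amount_derivation(self):
        # Derivation test: stored amount equals quantity * unit_price.
        bad = self.db.execute("""
            SELECT COUNT(*) FROM sales_fact
            WHERE ABS(extended_amount - quantity * unit_price) > 0.005
            """).fetchone()[0]
        self.assertEqual(bad, 0)

if __name__ == "__main__":
    unittest.main()
```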
Database Refactoring: Data models evolve toward excellence when we can safely make changes to the design, even in production. Database refactoring is a technical discipline that enables the safe evolution of data models without breaking previously working features and components. In their book Refactoring Databases, Scott Ambler and Pramod Sadalage define a database refactoring as "... a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics." Refactoring combines automated regression testing with a change transition period: a window of time during which the revised model lives alongside the former version so that nothing dependent on the old design breaks. Refactoring and test automation are the central disciplines for effectively evolving data models.
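Here is a minimal sketch of one refactoring from Ambler and Sadalage's catalog, Rename Column, including a trigger-based transition period. The customer table is hypothetical, and SQLite syntax stands in for whatever platform your warehouse runs on.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, fname TEXT);
    INSERT INTO customer VALUES (1, 'Ada');

    -- Step 1: add the better-named column and backfill it.
    ALTER TABLE customer ADD COLUMN first_name TEXT;
    UPDATE customer SET first_name = fname;

    -- Step 2: during the transition period, triggers keep the old and new
    -- columns synchronized, so code written against either keeps working.
    -- (Insert/delete synchronization is elided for brevity.)
    CREATE TRIGGER sync_old_to_new AFTER UPDATE OF fname ON customer
    WHEN NEW.first_name IS NOT NEW.fname
    BEGIN
        UPDATE customer SET first_name = NEW.fname
        WHERE customer_id = NEW.customer_id;
    END;
    CREATE TRIGGER sync_new_to_old AFTER UPDATE OF first_name ON customer
    WHEN NEW.fname IS NOT NEW.first_name
    BEGIN
        UPDATE customer SET fname = NEW.first_name
        WHERE customer_id = NEW.customer_id;
    END;
""")

# Legacy code still writes through the old column ...
db.execute("UPDATE customer SET fname = 'Grace' WHERE customer_id = 1")
# ... and new code sees the change through the new one.
print(db.execute("SELECT first_name FROM customer").fetchone())  # ('Grace',)

# Step 3, after the transition period ends: drop the triggers and the old
# column once automated tests show no remaining consumers of fname.
```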
Recognizing Data Model Smells: Data models evolve toward excellence when we recognize where they need improvement. Experienced data modelers develop a nose for elegant designs ... and stinky ones. Discovering smells in the model is an essential precursor to improving it. Smells may include smart keys, multipurpose columns and tables, data redundancy, very large tables, and so on. By learning to recognize these smells, you can focus your attention on likely problem areas and flag them as candidates for technical debt management.
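As a small, hypothetical example of the first smell on that list:

```python
# A smart key packs several facts into one opaque string; every consumer
# must know and re-implement the encoding. This format is hypothetical.
smart_key = "CA-RET-00123"            # region + channel + sequence number
region, channel, seq = smart_key.split("-")

# The refactored model promotes each hidden fact to an explicit column,
# e.g. (region TEXT, channel TEXT, order_num INTEGER), so the meaning
# lives in the schema instead of in string-parsing conventions.
```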
Change Deployment: Data models evolve toward excellence when we can quickly deploy changes at any time without fear. Agile BI practitioners develop data model changes using techniques that safeguard production deployment. All data model changes are scripted, and those scripts are kept under version control. Database schemas are versioned, and scripts are developed to roll forward to the next version and roll back to the previous version in case things don't work as expected. Data migration scripts ensure no loss or corruption of production data. Everything is automated and tested carefully in preproduction. Automated deployment, automated testing, and database refactoring disciplines support frequent, fast, and fearless deployments.
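Here is a minimal sketch of that roll-forward/roll-back discipline, assuming a simple schema_version bookkeeping table; the migration contents are hypothetical.

```python
import sqlite3

# Each schema version pairs a scripted roll-forward step with a scripted
# roll-back step; a schema_version table records where a database stands.
MIGRATIONS = {
    1: ("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    2: ("CREATE INDEX idx_customer_name ON customer (name)",
        "DROP INDEX idx_customer_name"),
}

def current_version(db):
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    return db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()[0]

def roll_forward(db, target):
    for v in range(current_version(db) + 1, target + 1):
        db.execute(MIGRATIONS[v][0])
        db.execute("INSERT INTO schema_version VALUES (?)", (v,))

def roll_back(db, target):
    for v in range(current_version(db), target, -1):
        db.execute(MIGRATIONS[v][1])
        db.execute("DELETE FROM schema_version WHERE version = ?", (v,))

db = sqlite3.connect(":memory:")
roll_forward(db, 2)              # deploy: empty schema -> version 2
assert current_version(db) == 2
roll_back(db, 1)                 # revert the last change if it misbehaves
assert current_version(db) == 1
```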
Take Small Steps: Data models evolve toward excellence as a series of small, easily understood changes. It's easier to undo a sequence of little changes than one big, complicated one. A side benefit of agile BI is that short iterations force us to plan in small steps. Agile data modelers quickly learn to change only what is needed to support the BI features currently in development.
The Final Word
All of these techniques combined form a strong safety net for evolutionary data modeling. Moreover, these techniques aren't exclusive to agile BI; they are modern data warehousing practices that belong among the skills of every data modeler and data warehouse practitioner -- agile or otherwise.
Ken Collier is the president of KWC Technologies. He leads the agile BI practice for Cutter Consortium and provides agile training, coaching, and mentoring in both product development and data warehousing/BI. He is the author of Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing. You can contact the author by visiting his Web site, http://www.theagilist.com.
Recommended Reading
Ambler, S. W. (2002). Agile Modeling: Effective Practices for eXtreme Programming and the Unified Process. New York: John Wiley & Sons, Inc.
Ambler, S. W., & Sadalage, P. J. (2006). Refactoring Databases: Evolutionary Database Design. Boston: Addison Wesley.
Blaha, M. (2010). Patterns of Data Modeling. Boca Raton: CRC Press.
Collier, K. (2011). Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing. Boston: Addison-Wesley Professional.
Collier, K., & O'Leary, D. (2009). The Message Driven Warehouse. Cambridge: Cutter Consortium, Inc.
Hay, D. C. (1996). Data Model Patterns: Conventions of Thought. New York: Dorset House Publishing.
Hay, D. C. (2006). Data Model Patterns: A Metadata Map. San Francisco: Morgan Kaufmann.
Longman, C. (2005, December 7). Building the Adaptive Data Warehouse [presentation]. Retrieved November 16, 2008, from DAMA UK - Data Management Association: http://www.damauk.org/Building%20the%20adaptive%20data%20warehouse%20-%20Cliff%20Longman.pdf