Q&A: Data Modeling's Role in Agile Development

Why isn't data modeling part of agile development?

Agile development methodologies ignore the value of data modeling. Len Silverston, a well-known expert in data modeling and best-selling author of The Data Model Resource Book series, argues that doing so will seriously impact the quality of your software. Silverston is teaching Mastering BI with Best-Practice Architecture and Data Models: From Hub and Spoke to Agile Development along with Claudia Imhoff at the August and November TDWI world conferences. We asked him to elaborate on his views about the place data modeling has in agile development.

[Editor's note: "Creating an Agile BI Environment -- Delivering Data at the Speed of Thought" is the theme of the 2010 TDWI World Conference in San Diego.]

BI This Week: Should data modeling be a required part of an agile development effort?

Len Silverston: Agile development methods do not seem to require or focus on the need for modeling. In the Agile Manifesto (see http://agilemanifesto.org), which is a statement of the principles that underpin agile software development, the first principle is "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software." The third principle is "Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale."

It seems that the focus is on the quick delivery of software, whether modeling is done or not. It seems that modeling is only valuable if it produces better code in a shorter time period.

In the book Agile Software Development with Scrum by Ken Schwaber and Mike Beedle (2002, Prentice Hall), there was only one small section (on page 59) that I could find that even addressed modeling in general terms, and there was no mention of data modeling. The authors state, "Does the team become more productive if models are optional? If yes, then the team only uses models to guide their thinking. Does the code quality suffer if models are only used to guide thinking? If yes, the modeling is required where more rigor is needed."

I do not think we should skip data modeling in any agile development effort; it should be a required part of the process. The same principle applies today that has always applied: without understanding the data requirements or modeling data to explore and refine stable, thought-out designs, the quality of the system suffers, maintenance costs rise exponentially, and systems become a patchwork of data silos.

How can we perform agile data modeling? Given that agile teams live or die on the idea that they must produce something of business value, such as working software, in a two- to four-week period, is there time to model?

I agree that we need to deliver quickly and with high quality. The question is whether modeling slows down the effort and becomes an impediment.

The agile principle of delivering business value quickly makes a lot of sense to me, and so does using data models to help with this. However, we must find ways to develop data models more quickly while increasing quality.

One way that we can do this is by reusing common data models and patterns. Many reusable or "universal data model" constructs are available that can help organizations model much more quickly and with greater quality. [Editor's note: see Silverston's The Data Model Resource Books, Volumes 1 and 2 (2001, Wiley Computer Publishing).]

These models serve as reusable components and are similar to employing and customizing reusable application code. Thus, modelers do not have to start from scratch when a common modeling construct is needed, such as customer demographics or accounting. Although these templates do not eliminate the need for modeling, they can help data modelers by providing a more expedient way to model data via reuse, and they can point out potential issues.

Additionally, we have found that the same data modeling themes or "patterns" occur in over 50 percent of most data models, such as the need to model roles, statuses, classifications, hierarchies, contact information, and other common patterns. Many of these patterns are published in The Data Model Resource Book, Volume 3, which I wrote with Paul Agnew (2009, Wiley Computer Publishing). These reusable models and patterns can be used on agile efforts to help deliver models more quickly. [Editor's note: The author, along with Claudia Imhoff, will cover how to do this in their TDWI course, Mastering BI with Best-Practice Architecture and Data Models: From Hub and Spoke to Agile Development at the August and November TDWI world conferences.] Even if just a day or two is allocated at the beginning of a sprint for modeling, one can employ reusable models and patterns to quickly produce a model that can help lead to a much higher-quality effort.
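[Editor's note: As a rough illustration of what a reusable "role" pattern might look like, here is a minimal sketch in Python. It is not taken from The Data Model Resource Book; the names Party, PartyRole, and RoleType, and the from_date/thru_date fields, are assumptions chosen for this example. The idea shown is that role types are held as data, so a new kind of role does not require a new structure.]

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RoleType:
    # Reference data: a new kind of role is a new value, not a new structure.
    name: str


@dataclass
class PartyRole:
    # Hypothetical role assignment with an effective date range.
    role_type: RoleType
    from_date: date
    thru_date: Optional[date] = None  # None means the role is still active

    def active_on(self, day: date) -> bool:
        return self.from_date <= day and (
            self.thru_date is None or day <= self.thru_date
        )


@dataclass
class Party:
    # A person or organization that can play any number of roles over time.
    name: str
    roles: List[PartyRole] = field(default_factory=list)

    def role_names_on(self, day: date) -> List[str]:
        return [r.role_type.name for r in self.roles if r.active_on(day)]


customer = RoleType("customer")
supplier = RoleType("supplier")

acme = Party("Acme Ltd")
acme.roles.append(PartyRole(customer, date(2009, 1, 1)))
acme.roles.append(PartyRole(supplier, date(2010, 3, 1), date(2010, 6, 30)))
```

[In this sketch, the same party is both a customer and a supplier for part of 2010, and the model absorbs that without any structural change.]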

The term "agile" also means flexible and adaptable to change. In this context, there are ways to create "agile" data models. Many of these reusable models and patterns offer alternatives and ways to create flexible data models that can more easily accommodate change.

On an agile development effort, should there be processes in place to ensure that the data structures are integrated with the rest of the enterprise and that a common, standardized, and/or enterprisewide data model is used so that the effort does not result in a data silo?

In scrum, there is a role called scrum master; this person serves as the manager whose principal job is to remove impediments because the goal is to deliver quality code as quickly as possible. In my experience, one of the largest impediments that can slow down a project is any type of standards group (such as a data architecture group) that wants to integrate the effort.

In the scrum method described in Schwaber and Beedle's book, part of the process allows for "standards, conventions, and guidelines" to be incorporated into a sprint. This step is a place where an enterprise architecture group can and should help integrate the effort using tools such as an enterprise data model to conform and standardize data structures so that the effort does not result in a data silo that has its own data semantics.

However, in my experience, it usually takes more time to integrate the effort. For example, it is most likely quicker to develop an independent data mart than go through an integrated data warehouse which then feeds a data mart. I think the key question then becomes, "Because agile development focuses on 'early and continuous delivery of valuable software,' what is the value of having integrated solutions with much higher data quality versus delivering a more expedient non-integrated solution?"

If data integration is truly recognized as a significant business value, then processes -- such as ensuring that a sprint's data structures integrate with a standardized or enterprise data model -- must be in place, even if it means that the sprint takes a little longer.
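[Editor's note: One lightweight form such a process could take is an automated conformance check run during the sprint. The sketch below is an assumption-laden illustration, not a method described by Silverston: the ENTERPRISE_MODEL dictionary, the column names, and the conformance_issues function are all hypothetical.]

```python
from typing import Dict, List

# Hypothetical enterprise data model: standard column name -> standard data type.
ENTERPRISE_MODEL: Dict[str, str] = {
    "customer_id": "integer",
    "customer_name": "varchar",
    "order_date": "date",
}


def conformance_issues(sprint_table: Dict[str, str]) -> List[str]:
    """List the columns in a sprint's table that deviate from the enterprise model."""
    issues = []
    for column, data_type in sorted(sprint_table.items()):
        standard = ENTERPRISE_MODEL.get(column)
        if standard is None:
            # The column name is unknown to the enterprise model.
            issues.append(f"{column}: not in enterprise model")
        elif standard != data_type:
            # The name matches but the type disagrees with the standard.
            issues.append(
                f"{column}: type {data_type}, enterprise model says {standard}"
            )
    return issues


# A sprint's proposed table, with one nonstandard name and one type mismatch.
sprint_orders = {
    "customer_id": "varchar",
    "order_date": "date",
    "cust_nm": "varchar",
}
issues = conformance_issues(sprint_orders)
```

[A check like this costs the sprint a few minutes, which makes the tradeoff between speed and integration easier to accept than a manual review by a standards group.]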
