Q&A: Data Marts Regaining Respect -- With Caveats
Reining in data marts is challenging but necessary.
- By Linda L. Briggs
Data marts, once scorned by the data warehousing community, are gaining a new respect. Organizations now depend on data marts for uses as diverse as departmental BI, advanced analytics, and big data. Still, reining in rogue data marts continues to be a challenge -- and a necessity.
In this interview, we speak with Nancy Kopp about the issues surrounding data marts. Kopp is a program director in IBM's information management development organization, focusing on database and appliance competitive strategy. She has been with IBM since 1999 and has over 17 years of experience in managing, implementing, and delivering information management projects, both as a customer and a consultant. Kopp is an active blogger on data warehouse topics and works closely with many key industry leaders.
Kopp's bottom line on data marts: "Your goal should continue to be consolidation, but marts can and should happen with good reason and strong management."
Kopp spoke recently along with TDWI Research Director Philip Russom at a TDWI Webinar on "Data Marts: If You Must, Then Do So Responsibly!"
BI This Week: Are data marts still a timely idea in today's consolidated world?
Nancy Kopp: For years, we've been playing a game of Whack-a-Mole with data marts. We aim to consolidate and they continue to pop up. You have to stop and ask, "Why does this continue to happen?"
In my opinion, a couple of things are happening. First, there is still a need for consolidation for obvious reasons -- governance, compliance, and to control redundancy. That said, BI has become mission-critical, so we can't ignore being responsive to the needs of the business when they ask. A marketing executive with short-term goals to launch an opportunistic campaign is not always going to be patient while all the data is modeled out according to the requirements of the enterprise. As we continue to push down the path of operational BI, we increase the dependency on data, and we decrease the patience of our clientele. That's just the way it goes.
The other thing I see happening is that the business case and the politics around building that single source of truth continue to be our biggest challenge. I have yet to find really well-documented return on investment (ROI) business cases for an enterprise data warehouse (EDW), but there is no shortage of analytic business cases with insanely good ROI.
That doesn't mean we didn't get significant value from building an EDW. It just means we had no easy means to measure and articulate it in a manner that our friends on the business side find acceptable or even digestible. When the economy started to decline, we began to take the easy way out -- we built more data marts.
In addition, we have seen both cost and resource pressure. Many smaller niche players have built their businesses on offering alternatives to offload the EDW, using all sorts of newer technologies. Many customers found it difficult to ignore the shiny new toy that ran fast and appeared to be less expensive. Consolidation projects started to get shelved around 2009, when the economy took a downward turn. The business remained dependent on BI to be smarter, but there was less tolerance for the time it takes to deliver and to be more strategic.
Now that we have forged the path of consolidation, I think we have learned a few things. One clear lesson is that it may not be optimal to consolidate absolutely everything. Therefore, in cases where we are still managing decentralization, offloading some work from the EDW is not a bad idea. For example, large volumes of unstructured data are not something you want to manage inside the EDW, so complementing the EDW with a Hadoop-type solution can help deliver more function.
The idea here is that you need to start to look at building an ecosystem around the EDW so that you can take full advantage of all the different types of computing power and technology possible.
Bottom line: Your goal should continue to be consolidation, but marts can and should happen with good reason and strong management.
How can the business side be convinced of the value of better management of data marts in an overall EDW environment?
That is the holy grail, and we've been looking for it for years. For financial information, it's a pretty easy argument -- if the CFO doesn't have the facts straight, jail time can result.
Not all aspects of the enterprise have that much at stake regarding data governance and compliance. The easiest argument is probably cost. Replicating data, and then managing the replicated data, costs more money. Getting everyone to be conscious of those costs is the real challenge. I have seen customers try to add this to personal business commitments and measurements, but it's never easy to overcome those political hurdles. Rewards are usually tied to results, and maintaining enterprise data takes more effort.
As an evangelist for consolidation over the years, I've done my fair share of research, even poking at my consultant counterparts to find the ultimate example of a firm that has documented ROI with consolidation. You can find capital savings and labor savings, but articulating the true value of enterprise data is more challenging.
Here is what we have to realize: The EDW itself will not necessarily yield an ROI past the cost of not replicating and managing data, but it does enable applications and capabilities that would not be possible without the enterprise data. Those applications -- for example, understanding customer behavior, monitoring for fraud, and several data mining applications -- can yield fantastic ROI and get better with more data. Focusing on the capabilities that just aren't possible with disparate marts tells a very good story for the EDW to the business. The act of consolidation is really a means to the end of creating more insight for the business, so that's the story to tell.
How much of a problem are rogue data marts, and in what sense?
If the data in the rogue mart is of no value to the organization, the threat is relatively low. These marts may hold highly disposable data for short-term analysis. If the application on the rogue mart is of high value, you want to make sure you manage the data so that decisions and actions are based on correct information and the decisions are sound.
My advice is that in IT, we have to learn to pick our battles better. Focus more on things that keep us from being able to deliver real business value. If a rogue mart is out there -- OK. If there are many, pick your fight carefully.
How can challenges with decentralized data marts be mitigated and addressed?
The best way to mitigate issues with decentralized marts is to help the organization better understand the challenges they pose, and that these challenges keep them from moving ahead. First, go after the costs of redundant data. What does it cost to maintain multiple instances, not just in capital but in staffing? Which version of the truth do you trust? How long does it take you to bring forward new capabilities? Do you spend more time assembling data than acting on new insights? Once the organization understands the barriers to maturing its BI capability, you can take action to consolidate.
I always used the strategy of showing my business partners a prototype of an application I knew would be of high value. When I asked the question, "Is this something you want?" I hardly ever got a no. However, I didn't sell them on everything I had to do to get there; instead, I sold them on the capabilities that could be achieved.
Don't focus solely on cost -- sell capabilities that can be enabled once consolidated.
What kind of time-to-value might a mid-size company see with a data mart consolidation project?
I have two thoughts here. First, the time to deliver analysis is reduced when less time is spent assembling the data from many sources or marts. Value is not measured by the ability to ask the question of the data; it's measured by the ability to take action. If a firm spends all its time in data prep because of disorganization, its time to value can only improve with consolidation when new applications are requested.
Second, the time to value would be measured in the firm's ability to deploy and use capabilities that it was unable to use before. That might include capabilities such as upselling from one side of the business to another, or understanding the value of a customer across the business. Once the consolidation is complete, the time to value for new BI capabilities will improve significantly.
How do you distinguish a rogue data mart from an independent -- and highly useful -- data mart?
Speaking from some experience as an application development manager, many of us would argue that those rogue marts were highly valuable! Yes, I confess that I was one of those who had an IBM SP2 under my desk to do some cool spatial analysis that was "out of scope" of our warehouse at the time.
My rogue mart evolved into an application, site analysis for our stores, that became a significant argument for furthering BI in our organization, but it didn't start that way. I was dealing data and my clients loved me, but after about a year, was I ever ready to hand that over to the glass house! Ownership is indeed highly overrated in such cases.
In my opinion, many of these rogue marts are simply the business cutting its teeth on BI. These are applications that we should follow and monitor but not necessarily discourage. If the analysis or application looks to have legs and will grow, then we pull it in. I think the best way to deal with this is to offer a sandbox -- that way you supply the ability but you manage its progression.
What sorts of challenges do companies face in managing a consolidated architecture that still features some level of managed decentralization?
There are several areas that can be a challenge when you are managing decentralization.
First of all, consider ETL. Where is the data coming from and how does it get transformed and cleansed? Will the data match other sources? How do I land the data in the right place at the right time?
Next, think about workload management and monitoring. Do I apply the same level of management to the systems outside the EDW? How do I monitor activity and usage? How do I manage aging data?
Then there's maintaining compliance and data governance. This is especially an issue for low-latency applications that tend to be mission-critical and highly operational in nature.
Finally, there's monitoring for changes in your workload that may signal a change in patterns. For example, take an application that was once very high latency and is now moving into much lower latency and toward operational BI.
How are metadata and master data handled when there are many data marts in place? Is that an argument for consolidation?
That's easy. When data mart sprawl isn't controlled, metadata is highly neglected. I would love to believe that metadata is a good argument for consolidation -- and it is -- but not as an argument standing on its own. The promise of good solid metadata, although it brings a smile to my face, just isn't a sexy enough argument for the business.
The ability to understand information about the data tends to be more of an expectation than a stated requirement from the business. It's certainly not something the business wants to think about or, even worse, actually pay for. Master data should be just that -- the master no matter what, with marts depending on the master data management (MDM) system as the source for that data. Once that link is broken across too many marts, consolidation becomes an absolute must.
The fact is, consolidation gives us a good opportunity to get things right, but metadata and MDM are not exactly consolidation's best-selling features, because the business finds them boring. Both serve as important justifications for consolidation, but the promise of new business value is essential.
Do data marts lend themselves to agile BI?
If you look up the word "agile," you find words such as quickness, lightness, ease of movement, and nimbleness. Something tells me that a vendor looking to sell more data marts and fewer EDWs coined this phrase. There's no doubt that in the minds of our business users, the EDW is not always the agile solution they want it to be.
As we have continued to build out these large EDWs, we have become less agile. Let's just admit it and move forward. Data marts are not completely synonymous with agility, but they have certainly arisen because the business required a faster response time. On the other hand, if you have a lot of independent marts, you also lack agility, but in a different way -- more in overall BI capability and control of your data.
Organizations should look to upgrade their agility, if you will, by optimizing their EDW environment. Where can you add some functions and offload others to both complement the EDW and offer a faster turnaround on solutions? In some cases, a data mart may help. New technology on the market offers high performance, ease of use, and lower cost, and can't be ignored. It should absolutely be a part of the ecosystem surrounding the EDW.
What does IBM offer that addresses the issues we've talked about here?
IBM believes in practicing "smart consolidation," and offers capabilities across our portfolio that allow customers to take full advantage of the technologies available to move ahead in their BI and analytic requirements. For starters, we're still very focused on consolidation and a single source of truth; however, we believe that with all the new technology and types of data we need to deal with, a monolithic architecture is not the only answer. Rather, we envision an ecosystem around the EDW that complements -- and offloads where appropriate -- some of the work of the EDW because we've learned some things about managing absolutely everything in one structure.
That knowledge has helped us evolve to the next wave in architecture. This new wave, while maintaining strong governance, also takes into consideration optimal price performance and responsiveness to the needs of the business.