How Data Federation Can Co-exist with an EDW

Why data federation has earned itself a seat at the data management round table

When Bob Eve first took the marketing reins at data federation specialist Composite Software Inc., people in the data management (DM) space treated him like a heretic. This was back in the data federation Dark Ages -- circa 2006 -- when federation vendors such as Composite were still viewed with deep suspicion by DM purists.

Fast forward to today, and the passions that were once aroused by the old data federation debate have cooled, in part because data federation has earned itself some respect. Instead of being perceived as a zero-sum proposition -- i.e., one can either federate data or put everything into an enterprise data warehouse (EDW) -- data federation is now viewed in (mostly) complementary terms. Eve explains that one can both federate and have an EDW.

"I think we've kind of crossed a bit of a chasm, and we've also found that where we used to have this 'us' versus 'them' kind of stuff, we now kind of have this 'and' story: you use it with your data warehouse," says Eve. "We just can't be stupid and say, 'It's a virtual data warehouse; you can use it to replace [your enterprise data warehouse].'"

The DM conventional wisdom has certainly changed. Eight years ago, data federation was more popularly known as enterprise information integration (EII). When EII first appeared, its technology model suggested a number of compelling use cases, some of which -- from the perspective of DM purists, at least -- were far short of ideal.

It was as a data warehouse (DW) replacement technology -- or as a "virtual" data warehouse -- that EII most offended the sensibilities of many DM practitioners. Such positioning, Eve concedes, was "stupid." By the same token, he suggests, so, too, is the notion of the EDW -- that is, One DW to Rule Them All.

"We've [i.e., the industry has] worked on data warehouses for a long time. For a long time now, we've had this idea [that] we'd get one big über enterprise data warehouse. One place where we could put everything," he explains.

Eve contends that the idea of the EDW is as misguided as that of the virtual data warehouse via EII. On paper, the EDW is a fine idea, he concedes. There are likely some environments or scenarios in which it's an achievable and even laudable vision. For many organizations, however, an EDW is a misguided goal.

What's more, Eve continues, an EDW-at-all-costs philosophy can handicap organizations and frustrate business users.

"Organizations are dealing with data of extreme complexity and with data volumes that continue to explode. They're also trying to quickly incorporate new technologies -- especially predictive analytics -- and service the increasingly sophisticated needs of users," he observes. "More important, while the [enterprise data warehouse] is nice, it isn't necessary. I don't know that you'll find anyone seriously arguing that the EDW is essential anymore."

Eve's a pragmatist. Just as there are use cases in which an EDW is feasible (and thus desirable), a virtual data warehouse likewise makes sense in a select few use cases -- for example, as a tool for DW prototyping and as a quick-and-dirty Band-Aid solution for post-M&A scenarios, data mart (or possibly data warehouse) consolidation efforts, and other inescapably temporary projects.

These are the same kinds of use cases -- along with frequently refreshed reporting -- that data federation pragmatists first touted half a decade ago, he stresses. The difference this time around is that passions (on both sides of the divide) have cooled, even as the divide itself has basically ceased to exist.

"As an alternative to the EDW, what if we could have a common place to go?" asks Eve rhetorically. "It can be a virtual place; it doesn't have to be a physical place. I can federate my sources; I can abstract them; the location doesn't matter, the physical part -- none of that matters."

This isn't the same as a virtual data warehouse, he contends; instead, it's "the concept of the virtualized data layer, which says [that] instead of bringing everything into one place [as with an EDW], you can just leave it where it is but still make it available to your users."

Instead of connecting dozens (or even hundreds) of different data sources via federation, Eve explains, you're connecting disparate data marts, data warehouses, and other (often hitherto siloed) data sources.
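To make the idea of a virtualized data layer concrete, here is a minimal sketch in Python using SQLite. It is only an illustration, not a description of Composite's product: the table names, columns, and the `all_sales` view are hypothetical, and two in-memory SQLite databases stand in for a data warehouse and a separately built data mart. The view plays the role of the abstraction layer. It copies no rows; it simply presents both sources to users as one place to query.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    # Autocommit mode keeps ATTACH from running inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)

    # A second in-memory database stands in for a data mart (hypothetical).
    conn.execute("ATTACH DATABASE ':memory:' AS mart")

    # "main" stands in for the data warehouse (hypothetical schema).
    conn.execute("CREATE TABLE main.sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0)],
    )

    # The mart uses different table and column names for similar data.
    conn.execute("CREATE TABLE mart.orders (territory TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO mart.orders VALUES (?, ?)",
        [("east", 40.0), ("north", 75.0)],
    )

    # The abstraction layer: a view that maps both sources onto one shape.
    # A TEMP view is used because it may reference tables in any attached
    # database. The data stays where it is.
    conn.execute(
        """
        CREATE TEMP VIEW all_sales AS
        SELECT region, amount, 'warehouse' AS source FROM main.sales
        UNION ALL
        SELECT territory AS region, total AS amount, 'mart' AS source
        FROM mart.orders
        """
    )
    return conn


def sales_by_region(conn: sqlite3.Connection) -> dict:
    # Users query the view without knowing where each row physically lives.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM all_sales "
        "GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    layer = build_virtual_layer()
    print(sales_by_region(layer))
```

Because the view is evaluated at query time, a row added to the mart later shows up in the next query with no reload step, which is the flexibility argument in miniature. A real federation product would also have to handle remote sources, query optimization, and security, none of which this sketch attempts.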

A virtual abstraction layer likewise gives you the kind of flexibility that an EDW does not, and cannot, offer, he maintains.

"If you try hard enough, you can get almost everything you want in your warehouse, but what are you going to do when [your company goes] out and buy[s] somebody? They have a data warehouse; we have a data warehouse." This doesn't even take into account the existence -- or the inevitability -- of data marts, either.

From the perspective of many DM pros, of course, "half the data marts out there should never have been built," Eve concedes. The fact remains that data marts -- like commodity servers -- tend to breed and multiply. "Most of them are just some guy murking around with the data warehouse [and] managed to get his own copy of the data. Now all of the hard work that you did to get everything into the EDW is slowly degrading over time."

A virtual data abstraction layer helps accommodate errant data marts and other less-than-ideal data sources in the short term -- which keeps business users happy -- even as DM teams work to address the issues that produced them in the first place. Data marts tend to spring up when business users feel that they're ill-served by existing resources or connectivity, after all.

"I can glue together my data warehouse even though I have all of these [data mart] pieces," he explains. "Everything is sharing a single abstraction layer."