Why Data Virtualization Trumps Data Federation Alone
Data virtualization confers advantages that simply aren’t achievable by means of federation alone.
Even as data federation has earned itself a full-fledged seat at the always-contentious data management (DM) table, federation itself is morphing into a very different beast.
Data virtualization is similar to, but wholly distinct from, data federation.
Some BI professionals believe that virtualization is an offshoot of federation. That’s one reason why some federation specialists -- such as Composite Software Inc. and Informatica Corp. -- have started to make data virtualization a key component of their marketing efforts.
According to Philip Russom, research manager with TDWI Research, the research arm of The Data Warehousing Institute, data federation and data virtualization are similar enough to invite comparison and cause confusion.
Rather than seeing virtualization as a subset of federation, Russom argues that the relationship runs the other way. “Data virtualization and data federation are closely related, which may be why so many people confuse the two or think they are identical,” writes Russom in a new Checklist Report from TDWI Research, Data Integration for Real-Time Data Warehousing and Data Virtualization.
To understand virtualization as a subset of federation is to put the cart before the horse, Russom explains. Even though federation (as we know it) established itself before virtualization (as we know it) went mainstream, data federation partakes of some of the same concepts or methods that underpin virtualization.
“[D]ata virtualization must abstract the underlying complexity and provide a business-friendly view of trusted data on demand. To avoid confusion, it’s best to think of data federation as a subset or component of data virtualization. In that context, you can see that a traditional approach to federation is somewhat basic or simple compared to the greater functionality of data virtualization,” he writes.
In any case, data virtualization is a concept -- with cachet -- that providers such as Composite have eagerly appropriated. For one thing, it makes use of the same data federation technology they’ve been marketing for years.
"As an alternative to the EDW, what if we could have a common place to go?" asked Bob Eve, executive vice president with Composite, in an August interview at TDWI’s Summer World Conference in San Diego. “It can be a virtual place; it doesn't have to be a physical place. I can federate my sources; I can abstract them; the location doesn't matter, the physical part -- none of that matters."
The virtues of data virtualization speak for themselves, according to Eve: “[T]he concept of the virtualized data layer … says [that] instead of bringing everything into one place [as with an EDW], you can just leave it where it is but still make it available to your users.” A virtual abstraction layer likewise offers a kind of flexibility that an inherently physical EDW cannot, he maintained.
In his new Checklist Report, Russom expands on this theme, noting that data federation is a key component of data virtualization.
A key component, yes -- but one that’s best used along with other integration tools, as well as a single metadata repository.
“Data virtualization provides … [an] abstraction layer … [that] insulates DI targets from changes in DI sources as new data sources are added or retired,” he writes, adding that changes to DI sources are all but inevitable, thanks to activities such as mergers, acquisitions, or data mart consolidation efforts.
“The layer also enables data provisioning -- even in real time. Real-time provisioning is key, because the kind of data you want to move in real time for [business intelligence or data warehousing] changes so rapidly that only the most recent update is useful for a business process.”
Nor is that all. Because virtualization provides an abstract view of data sources -- in effect, it decouples data from the physicality of the hardware or location where it’s stored -- it confers other advantages as well.
“[W]hen this layer exposes data objects for business entities -- such as customer, order, and so on -- it provides a business-friendly view that abstracts yet handles the underlying complexity,” he explains.
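The idea of a layer that exposes business entities while insulating consumers from source changes can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the `VirtualLayer` class, the source functions, and all field names are hypothetical and do not describe any vendor’s product or anything in Russom’s report.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]  # a source is anything that yields rows on demand


class VirtualLayer:
    """Hypothetical abstraction layer: consumers ask for logical names only."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, logical_name: str, source: Source) -> None:
        # Re-registering a name swaps the physical source without touching consumers.
        self._sources[logical_name] = source

    def rows(self, logical_name: str) -> List[Row]:
        # Data is fetched at request time; nothing is copied into a central store.
        return list(self._sources[logical_name]())

    def customer_view(self) -> List[Row]:
        """Business-friendly 'customer' entity federated from two sources."""
        balances = {r["cust_id"]: r["balance"] for r in self.rows("billing")}
        return [
            {
                "id": r["cust_id"],
                "name": r["name"],
                "balance": balances.get(r["cust_id"], 0.0),
            }
            for r in self.rows("crm")
        ]


def crm_source() -> Iterable[Row]:
    return [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]


def old_billing() -> Iterable[Row]:
    return [{"cust_id": 1, "balance": 120.0}]


def new_billing() -> Iterable[Row]:
    # Assumed scenario: a system inherited through an acquisition, with its own
    # native schema that is mapped to the logical one inside the layer.
    raw = [{"customer": 1, "amount_due": 120.0}, {"customer": 2, "amount_due": 75.5}]
    return [{"cust_id": r["customer"], "balance": r["amount_due"]} for r in raw]


layer = VirtualLayer()
layer.register("crm", crm_source)
layer.register("billing", old_billing)
before = layer.customer_view()

# The billing source is replaced; the consumer still calls customer_view().
layer.register("billing", new_billing)
after = layer.customer_view()
```

In this sketch the consumer’s call never changes when the billing system is swapped; only the registration inside the layer does.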
Russom notes that data virtualization can promote collaboration and ensure trusted data. “It can ensure that inconsistencies and inaccuracies across heterogeneous enterprise data stores are identified upfront without the need for staging and pre-processing,” he explains. “In addition, [data virtualization] enables business and IT to collaborate in defining and enforcing data quality rules on-the-fly or in real time as data is federated.”
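One way to picture quality rules being enforced “as data is federated,” with no staging step, is the following Python sketch. The rule names, the `federate` and `check_on_the_fly` helpers, and the sample records are all assumptions made for illustration, not a description of how any particular tool works.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]  # (rule name, predicate that must hold)


def federate(*sources: Iterable[Row]) -> Iterator[Row]:
    """Stream rows from several sources without copying them to a staging area."""
    for source in sources:
        yield from source


def check_on_the_fly(
    rows: Iterable[Row], rules: List[Rule]
) -> Iterator[Tuple[Row, List[str]]]:
    """Evaluate every rule against each row as it passes through."""
    for row in rows:
        failures = [name for name, holds in rules if not holds(row)]
        yield row, failures


# Hypothetical rules that business and IT might agree on together.
rules: List[Rule] = [
    ("has_email", lambda r: "@" in str(r.get("email", ""))),
    ("two_letter_country", lambda r: len(str(r.get("country", ""))) == 2),
]

crm = [{"id": 1, "email": "a@example.com", "country": "US"}]
erp = [{"id": 2, "email": "missing", "country": "USA"}]

results = list(check_on_the_fly(federate(crm, erp), rules))
flagged = {row["id"]: failures for row, failures in results if failures}
```

Because both helpers are generators, each record is checked as it is read from its source, which is the sense in which inconsistencies surface upfront rather than after a batch load.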
Most importantly, virtualization makes it easier to reuse objects and services, Russom concludes. “It allows you to design data object models and services once, federate access across fragmented or siloed data sources, and provision the data however it is needed by consuming channels and applications”: for example, via SQL, Web services, and batch ETL.