Informatica and the Power of Data Virtualization
From Informatica's viewpoint, being realistic about data virtualization shouldn't mean adjusting -- or curbing -- your expectations of what the technology can do.
By Stephen Swoyer
Ash Parikh, director of product marketing for Informatica Corp., says he's as enthusiastic about data virtualization (DV) as any of his competitors, but he argues that Informatica is also more realistic about the technology than they are.
There's a catch, however. From Parikh's perspective, being realistic shouldn't mean adjusting -- or curbing, for that matter -- your expectations. That's at the core of Informatica Data Services (IDS), the data virtualization services platform Informatica launched last year.
"What if in data virtualization you could actually do profiling in mid-stream, on federated data? You're saving time, right? What if you could actually enforce the data quality, the cleansing, the masking, and all of the rich rules in real-time [for] federated data?" he asks.
"You have to do all of this in flight, and [some of our competitors] will tell you, 'No one can do that!' However, if data virtualization is going to be successful, it needs to be optimized as an end-to-end process. This is why you need a data integration vendor who truly understands the end-to-end flow."
Of course, DV seems to imply (or assume) some degree of data federation (DF), whether it's federation in a pragmatic sense (in which DF is used to create a virtual view of heterogeneous, siloed, or otherwise-unmanaged data sources) or federation in a high-level sense -- i.e., a virtual view of multiple managed data sources, be they data marts, data warehouses, operational repositories, and so on.
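To make the "pragmatic" sense of federation concrete, here is a minimal Python sketch of a virtual view over two heterogeneous sources. It is an illustration only and does not show Informatica's product or API: the in-memory SQLite database, the `crm_records` list, and the `customer_orders_view` function are all hypothetical stand-ins. The point is that the join happens when the view is queried, and nothing is copied into a persistent store.

```python
import sqlite3

# Hypothetical source 1: an operational database (in-memory SQLite here).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 75.5), (1, 30.0)],
)

# Hypothetical source 2: a siloed, unmanaged source (e.g., rows from a spreadsheet).
crm_records = [
    {"customer_id": 1, "name": "Acme Corp."},
    {"customer_id": 2, "name": "Globex"},
]


def customer_orders_view():
    """Virtual view: joins both sources at query time, without persisting the result."""
    names = {r["customer_id"]: r["name"] for r in crm_records}
    cursor = orders_db.execute(
        "SELECT customer_id, amount FROM orders ORDER BY rowid"
    )
    for customer_id, amount in cursor:
        yield {"customer": names.get(customer_id), "amount": amount}


rows = list(customer_orders_view())
```

A consumer sees one combined result set (`rows`) and does not need to know that the data lives in two different places.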
Parikh doesn't dispute this. In fact, he concedes, DV itself addresses one or both of these use cases. What seems to irk him most is the claim that data federation is somehow a practice that sits outside or above data integration (DI). Parikh dismisses this as misguided.
"From the beginning, some of the traditional data federation vendors treated federation [as] outside the purview of core data integration. What we [at Informatica] said is that data federation is a subset of all the things you're going to do in the data integration space. To an architect, that translates into a single environment for both data federation and traditional data integration," he explains. DF players who've argued the contrary tended to do so to offset their own technological limitations or their limited perspective, Parikh argues.
This is doubly the case in the Brave New World of DV, he continues.
"If I were a customer, the first thing I'd ask is 'When I create these virtual views, am I going to need a separate tool to virtualize or move that data into a persistent store, like a DW?'" Parikh explains. "But why would you need a separate tool? Why would you want to pay for the training or [the] skills ... to understand a whole new tool? In data virtualization, there are use cases for data federation or traditional data integration. It makes more sense to offer both in a single environment."
Informatica's two-tools-in-a-toolbox argument could be seen as likewise self-serving, but Parikh doesn't think so. "It isn't a question of our not being able to do [data federation]. What we're saying is, we can provide you with a virtual view, and we can provide you with all of the ETL-like data transformations that you like. We're now providing those [transformations] in real-time, to a virtual view as [data] is in-flight. We do that even with DQ transformations and with data masking. Imagine how complex that is!" he argues.
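What "in-flight" transformation means in practice can be sketched in a few lines of Python. This is not Informatica's implementation; the rule functions (`cleanse_email`, `mask_ssn`), the `apply_in_flight` helper, and the sample rows are assumptions made up for illustration. The sketch shows a data quality rule and a masking rule applied to each row as it streams out of a federated source, with no intermediate staging table.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
Rule = Callable[[Row], Row]


def cleanse_email(row: Row) -> Row:
    """Data quality rule (hypothetical): trim whitespace and lowercase the email."""
    return {**row, "email": row["email"].strip().lower()}


def mask_ssn(row: Row) -> Row:
    """Masking rule (hypothetical): expose only the last four digits."""
    return {**row, "ssn": "***-**-" + row["ssn"][-4:]}


def apply_in_flight(rows: Iterable[Row], rules: Iterable[Rule]) -> Iterator[Row]:
    """Apply every rule to each row as it streams past; nothing is staged or stored."""
    rule_list = list(rules)
    for row in rows:
        for rule in rule_list:
            row = rule(row)
        yield row


# Stand-in for rows arriving from a federated view.
federated_rows = [
    {"email": "  Pat@Example.COM ", "ssn": "123-45-6789"},
    {"email": "lee@example.com", "ssn": "987-65-4321"},
]

masked = list(apply_in_flight(federated_rows, [cleanse_email, mask_ssn]))
```

Because each rule returns a new row rather than modifying its input, the source records are left untouched; only the consumer's view of them is cleansed and masked.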
The beauty of data virtualization, as Parikh sees it, is that it enables faster, richer collaboration between DI architects and the line-of-business. That's been a big problem -- it's arguably been the Big Problem -- from the beginning.
"The problem was that the business got involved too late. By the time [IT] delivered [connectivity to the requested data sources], it could be weeks or months, and what if they missed [the mark]? What if [users] can't access specific data, or they can't access it in the best or the most highly optimized way?" he points out.
"The business user needs to be the person who owns the data, because the business user tends to be the person who knows the data best. So ... let them own the creation of a data model or these business rules, these masking rules, [of] all of the different policies they need to create. IT is not losing control; they're bringing the business user much closer to what they truly need to do."
This is what makes DV so compelling, Parikh argues: in some cases, it can radically accelerate the data integration timetable, shifting the period from "days or weeks or months down to hours or even minutes."
Second, and most important, it emphasizes the involvement of the business at a very early stage -- e.g., the creation of logical objects -- and likewise promises to give the business an ownership stake in the project.
"Our approach is to have IT work with the business to create logical objects. You might take one of these [logical objects] and point to it and say, 'This is going to support applications that are talking SQL,' but in the very next minute you might say, 'I want this [logical object] to support Web services.' With just a click, you can have it point to SQL, to Web services, or even to batch," he says.