Q&A: How Data Virtualization Can Rein in Disruptive Technologies
Major trends in business intelligence such as big data, mobile devices, and cloud computing are impacting information management. In this interview, Composite Software's Robert Eve discusses how enterprises can respond effectively to these disruptive influences, and how data virtualization is especially well suited to address them.
Eve, a long-time industry observer who is the author of the 2011 book, Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, is executive VP of marketing for Composite Software, Inc., an independent provider of data virtualization software.
Linda Briggs: What do you see as the major trends disrupting information management?
Robert Eve: From the business side, users today are far more demanding as the economy, competition, and cost-cutting all combine to force business innovation. Users seek new information technology that helps them delight their customers and differentiate their offerings. This means that simply understanding what happened yesterday and today is no longer good enough. Business needs to predict what is going to happen and find a way to impact that future in a favorable way.
Furthermore, information-savvy business staffs are far more analytical and more technically skilled. This makes them less patient with the "IT backlog" and far more willing to try "do-it-yourself" approaches.
From the IT side, traditional IT architectures with their focus on transaction processing and data warehouse-based reporting are under assault. Innovations include machine-generated data (from weblogs, sensors, location reports, and more), new data types such as social and video, and the need to bridge this new data with traditional systems.
This means the IT environment keeps getting more complex. New applications being added to IT's traditional infrastructure include analytics and mobile, big data sources such as Hadoop, and new styles of deployment such as cloud computing.
How are these trends disrupting data integration?
The trends disrupting information management have significant consequences for data integration.
The first impact relates to time. Businesses are hampered by the traditional seven to eight weeks that TDWI surveys say it takes to add a dimension to an existing warehouse or to create a new complex report. Because data integration often takes half the elapsed time in any IT project, this lack of agility is no longer tolerated.
The second impact relates to breadth. Although building a data warehouse was never easy, it was certainly easier when most of the data sources were other databases, nearly all of which shared the same language, SQL. Source data types and stores have exploded in the past three to four years. Terms such as big data and NoSQL actually represent dozens of forms and formats, including graph databases, key-value stores, and more, not to mention the cloud.
The third impact relates to integration methods and skills. Moving faster and integrating more diverse sources will drive the need not only for more agile data integration methods such as data virtualization but also for new skill sets such as MapReduce development and canonical-style data modeling.
How are enterprises responding to the challenge of integrating big data?
It's important to remember that analytics drives big data, not the other way around. In pursuit of new business innovation, analysts experiment on their data in all kinds of new and interesting ways.
From a data integration point of view, the big data challenge is gathering the data the analyst requires. This can be done by analyzing the source data in place with direct queries, or using something like data virtualization as an intermediary. As an alternative, the source data can be replicated and moved to a specialized analytical store of some kind using ETL or data replication. Some combination of these two approaches can also be used.
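To make the two approaches concrete, here is a minimal Python sketch (the tables and names are invented for illustration) contrasting a query against the source in place, as a data virtualization layer would issue on the analyst's behalf, with an ETL-style copy into a separate analytical store:

```python
import sqlite3

# Hypothetical "source" system: an in-memory SQLite database standing in
# for an operational store.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "EMEA", 120.0), (2, "APAC", 75.5), (3, "EMEA", 42.0)])

# Approach 1: analyze the source data in place -- nothing is copied.
def query_in_place(region):
    cur = source.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,))
    return cur.fetchone()[0]

# Approach 2: ETL-style replication -- extract the rows, load them into a
# separate analytical store, then query the copy.
analytical = sqlite3.connect(":memory:")
analytical.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
rows = source.execute("SELECT * FROM orders").fetchall()             # extract
analytical.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)  # load

def query_replica(region):
    cur = analytical.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,))
    return cur.fetchone()[0]

print(query_in_place("EMEA"))  # 162.0, straight from the source
print(query_replica("EMEA"))   # same answer, but from a second copy
```

Both paths return the same answer; the difference is that the second one creates and must maintain another copy of the data, which is exactly the silo proliferation discussed below.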
Although it's still early, today we mostly see data being replicated and moved to specialized analytical stores. This gives analysts considerable control over their own analytical silos. However, it also means considerable silo proliferation.
Down the road, however, what will happen when analysts want to integrate data across silos? Here is where leading analysts such as Mark Madsen, Shawn Rogers, Mike Ferguson, and others believe that data virtualization will play a more significant big data integration role.
How are enterprises responding to the cloud data integration challenge?
As more of the data used by the enterprise is distributed across the cloud, integrating it certainly becomes more challenging.
So far, when it comes to cloud data integration, enterprises seem to be applying the 80/20 rule. On the "80" side is integration of internal data with the largest cloud application providers, such as salesforce.com and Workday. A number of data integration vendors provide out-of-the-box, physical or virtual integration solutions, or both, for this requirement. Also on the "80" side are popular cloud-based data services such as credit scores and identity verifications. Providers generally offer several types of APIs that help enterprises simplify integration with these offerings.
The "20" side presents a more interesting data integration challenge, however. Here the data integration problem switches from connectivity to latency. Put simply, without a "booster" such as distributed caching that keeps data sets closer to the consumer, the Internet is too slow to handle large data sets. We have already seen Netflix apply this approach in its movie distribution business. Cloud data integration is next.
How are enterprises responding to mobility's data integration challenge?
I believe mobility is a bigger problem for applications than for data integration. The applications need to be easier to use, with new form factors and deployment architectures. Usage volumes will rise, meaning more data is required, and thousands of new applications will come into being, further accelerating data demand.
The hard problem for data integration is that the location of the mobile user is unpredictable; much of the data they might use spans the enterprise, big data, and the cloud. With so many moving parts, I anticipate lots of experimentation and innovation by both the enterprises and the data integration vendors, so stay tuned.
In addressing these challenges, what is the benefit of data virtualization versus other approaches?
With all of these trends, I have seen a significant shift in how enterprises think about data virtualization.
Traditionally, data virtualization was considered a point tool to add to the data integration mix. As a point tool, faster time-to-solution was considered the primary advantage. Think about it: a business view, which forms the core of a data virtualization solution, is built and maintained as a single object. Create the view and it's ready to go. Contrast that with traditional data consolidation, where you must model the schema, map the source to the schema, write the ETL, and move the data before you are ready to go. Which option takes fewer steps?
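The "single object" point can be seen in miniature with an ordinary SQL view, shown here via Python's sqlite3 module (the tables are invented for illustration): once the view is defined, it is immediately queryable, with no separate schema mapping, ETL, or data movement step before first use.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
db.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Acme", "US"), (2, "Globex", "DE")])
db.executemany("INSERT INTO invoices VALUES (?, ?)",
               [(1, 100.0), (1, 50.0), (2, 80.0)])

# The business view is defined once, as a single object.
db.execute("""
CREATE VIEW customer_revenue AS
SELECT c.name, c.country, SUM(i.total) AS revenue
FROM customers c JOIN invoices i ON i.customer_id = c.id
GROUP BY c.id
""")

# Immediately queryable -- no data was moved to build it.
rows = db.execute(
    "SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 80.0)]
```

A commercial data virtualization view does far more (federation across heterogeneous sources, optimization, caching), but the step count is the point: one object to create and maintain versus a model-map-ETL-move pipeline.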
With the proliferation of new sources such as big data analytic silos and the cloud, enterprises no longer believe they will ever get all the relevant data into the enterprise data warehouse. Instead, enterprises desire a new data integration architecture that presents a unified business-view layer across all their diverse sources.
For this use case, data virtualization's primary advantages are twofold. The first is data abstraction, which transforms complex IT taxonomies into business ontologies that are simpler to understand and use, encapsulated in business views. The second is data federation, which enables high-performance queries across multiple and diverse sources.
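As a rough sketch of what federation means in practice, the hypothetical Python below presents one query interface over two unlike sources, a relational table and a non-relational feed standing in for a cloud API, and pushes the filter down to each source instead of copying everything first:

```python
import sqlite3

# Source 1: a relational store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crm_accounts (name TEXT, region TEXT)")
db.executemany("INSERT INTO crm_accounts VALUES (?, ?)",
               [("Acme", "EMEA"), ("Globex", "APAC")])

# Source 2: a non-relational feed (stand-in for a cloud API or NoSQL store).
cloud_accounts = [{"name": "Initech", "region": "EMEA"}]

# A federated "business view": one interface over both sources. The filter
# is pushed down to each source rather than materializing a combined copy.
def accounts_in_region(region):
    relational = db.execute(
        "SELECT name FROM crm_accounts WHERE region = ?", (region,)).fetchall()
    nonrelational = [a["name"] for a in cloud_accounts if a["region"] == region]
    return sorted(n for (n,) in relational) + sorted(nonrelational)

print(accounts_in_region("EMEA"))  # ['Acme', 'Initech']
```

Real federation engines add cost-based optimization, join pushdown, and caching on top of this idea; the sketch only shows the consumer-facing shape, one query over many sources.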
Is there a role for a hybrid mix of data virtualization and traditional data consolidation?
When I look at what our most advanced customers are doing, and when I read what Gartner is saying about the logical data warehouse, or what EMA is saying about the hybrid data ecosystem, it is absolutely clear that a hybrid mix of data virtualization and traditional data consolidation will be the "best practice" data integration architecture going forward.
How will data virtualization solutions evolve to address future needs in these areas?
As they have over the past 10 years, data virtualization solutions will continue to evolve in step with business and IT trends.
For example, to date most data virtualization users have been skilled developers and administrators. This is reflected in today's data virtualization offerings. As data virtualization deployments expand and it becomes the preferred approach for enterprise information access, the range of business users is becoming larger and more diverse. Enabling these new communities to learn data virtualization quickly, and then use it productively, is critical to data virtualization's successful evolution.
Data volumes will continue to rise, as will the diversity of consumers and data sources. To stay relevant, data virtualization solutions will continue to advance their query optimization and caching techniques, as well as extend their already long list of source and consumer integration options.
Finally, as user counts rise and deployments broaden, data virtualization solutions will also require even larger-scale operations and more complete governance. This will ensure the consistency, compliance, and control that enterprises demand. These natural extensions follow the proven path already established by databases, data warehouses, business intelligence, and more.
In sum, data virtualization becomes easier, especially for the business, as well as faster, more broadly deployed, and more scalable. I hope I've laid out here what I see as an exciting road map that will yield significant business and IT value for data virtualization users.