Q&A: Enterprise Data Sharing Builds on Data Warehousing

How enterprise data sharing builds on earlier models such as data warehousing and XML standards to enable data sharing at an enterprise level.

The concept of enterprise data sharing builds on earlier models for data integration, such as enterprise data warehousing, but uses a virtualized approach. The shared data isn't physically consolidated, as it is with a warehouse; instead, it is delivered from a data virtualization server through XML-standards-based Web services. At run time, users can access the data through applications, reports, or mashups that call these Web-based data services.

In this interview, Robert Eve, executive vice president of marketing for Composite Software, Inc., an independent provider of data virtualization software, describes how enterprise data sharing builds on earlier models such as data warehousing and XML standards to enable data sharing at an enterprise level.

BI This Week: What is meant by the term "enterprise data sharing"?

Robert Eve: Enterprise data sharing is a new data integration pattern that large enterprises and government agencies are using to share data more broadly and to solve wider business problems in a more consistent way.

Here are some examples:

  • An energy company with more than a dozen refineries worldwide is using enterprise data sharing to provide disparate refinery data to diverse technical and business user communities, allowing it to increase refinery yields, proactively maintain equipment, and comply with myriad regulations.
  • A pharmaceutical company is using enterprise data sharing to provide a wide range of research and clinical trials data across its R&D, marketing, and manufacturing teams so those teams can get new drugs to market faster.
  • Enterprise data sharing is in place at federal intelligence agencies for sharing critical information across the agencies. For example, when a ship arrives at a U.S. port, the U.S. Coast Guard shares passenger, crew, and manifest data with other agencies such as the Drug Enforcement Administration and the Department of Homeland Security.

Three key components combine to enable enterprise data sharing: XML industry standards, Web data services, and data virtualization middleware. Data virtualization builds on earlier enterprise-scale data integration efforts, such as enterprise data warehousing, by leveraging data from consolidated enterprise data warehouses as well as other original and external sources. However, unlike enterprise data warehousing, the data itself is not physically consolidated in a central location. Instead, enterprise data sharing uses a virtualized approach.

Does shared data often extend beyond a single enterprise?

Yes, and that's why industry-wide XML data standards are important. These standards establish a common, agreed-upon format for how all users consume the data. This common ground simplifies data understanding for everyone who needs to share the data, as well as for the IT teams who provide it. These standards-based data abstractions are typically delivered as Web data services.

In the oil refinery example, a process manufacturing standard called MIMOSA ensures that pump data is formatted the same way across all of the refineries. A process engineer studying pump failures to optimize preventive maintenance can therefore access and use data from any pump in any refinery far more easily, significantly improving the quality of the analysis while reducing the time required to get the data.
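To make the payoff concrete, here is a minimal sketch of what a shared format buys you: once every site publishes pump data in the same XML structure, one parser works for all of them. The element and attribute names below are illustrative placeholders, not the actual MIMOSA schema.

```python
# Minimal sketch: when every refinery emits pump data in the same
# standards-based XML format, cross-site analysis reduces to one parser.
# Element and attribute names are illustrative, NOT the real MIMOSA schema.
import xml.etree.ElementTree as ET

PUMP_FEED = """
<pumpReadings site="refinery-a">
  <pump id="P-101"><vibrationMmS>4.2</vibrationMmS><hoursRun>11200</hoursRun></pump>
  <pump id="P-102"><vibrationMmS>7.9</vibrationMmS><hoursRun>20350</hoursRun></pump>
</pumpReadings>
"""

def parse_pumps(xml_text):
    """Yield (site, pump_id, vibration_mm_s, hours_run) from one conforming feed."""
    root = ET.fromstring(xml_text)
    site = root.get("site")
    for pump in root.findall("pump"):
        yield (site,
               pump.get("id"),
               float(pump.findtext("vibrationMmS")),
               int(pump.findtext("hoursRun")))

# Because all refineries conform to the same format, the same parser
# handles every site's feed; the engineer simply concatenates the results.
for record in parse_pumps(PUMP_FEED):
    print(record)
```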

What role does data virtualization play in enterprise data sharing?

Data virtualization provides the technical foundation for enterprise data sharing. At its most basic level, data virtualization integrates data from multiple, disparate sources -- anywhere across the extended enterprise -- in a unified, logically virtualized manner for consumption by nearly any front-end business solution. That can include reports, portals, mashups, applications, search, and more.

By accessing data from original sources or from already-consolidated data warehouses, data virtualization avoids additional physical consolidation and replicated storage of source data, making solutions faster to build and less costly to operate than data-warehouse-only integration approaches.

As a middleware technology, data virtualization has advanced beyond high-performance queries or enterprise information integration (EII). Data virtualization is typically deployed in two ways: on a project basis and enterprise-wide. On an individual project basis, it complements other data integration approaches by providing the data required in support of a specific application or use case, such as a virtual data mart federating marketing campaign data from two merged companies. On an enterprise level, data virtualization may be implemented as a virtualization layer or data services layer in support of multiple solutions and use cases.
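The virtual data mart use case can be illustrated with a small sketch: the two merged companies' campaign data stays in its source tables, and a single view federates it on demand, with no rows copied into a new store. The table and column names here are hypothetical, and a real deployment would federate across separate servers rather than one local database.

```python
# Minimal sketch of a virtual data mart: campaign data from two merged
# companies stays in its source tables, and a view presents one conformed,
# federated result set on demand -- nothing is replicated. All table and
# column names are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Source system of company A (schema as-is).
    CREATE TABLE a_campaigns (campaign TEXT, spend_usd REAL);
    INSERT INTO a_campaigns VALUES ('Spring Promo', 120000.0);

    -- Source system of company B (different column names).
    CREATE TABLE b_marketing (title TEXT, budget REAL);
    INSERT INTO b_marketing VALUES ('Summer Launch', 95000.0);

    -- The 'virtual data mart': one conformed view over both sources.
    CREATE VIEW all_campaigns AS
        SELECT campaign AS name, spend_usd AS spend, 'company-a' AS source
          FROM a_campaigns
        UNION ALL
        SELECT title, budget, 'company-b' FROM b_marketing;
""")

for row in db.execute("SELECT * FROM all_campaigns ORDER BY spend DESC"):
    print(row)  # resolved at query time; no rows were copied
```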

During the design phase, developers use data virtualization design tools to build semantic abstractions -- in the form of WSDL-described Web services or relational views -- that conform diverse source data to industry-standard formats.

At run time, user-level applications, reports, or mashups call these Web data services on demand. A high-performance data virtualization server then queries the sources, federates and abstracts the results, and delivers the requested data to information consumers.
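A minimal sketch of that run-time call is below: a report fetches standards-conformant XML from a data service and parses it. The endpoint URL, query parameter, and element names are hypothetical placeholders, not an actual product API.

```python
# Minimal sketch of run-time consumption: a report calls a Web data service
# and parses the standards-conformant XML it returns. The endpoint URL,
# query parameter, and element names are hypothetical placeholders.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def fetch_pump_readings(base_url, site):
    """Call a (hypothetical) data service and return parsed pump readings."""
    query = urllib.parse.urlencode({"site": site})
    with urllib.request.urlopen(f"{base_url}/pumpReadings?{query}") as resp:
        root = ET.fromstring(resp.read())
    return [(p.get("id"), float(p.findtext("vibrationMmS")))
            for p in root.findall("pump")]

# A report or mashup would invoke this on demand; the data virtualization
# server behind the URL does the querying, federating, and abstracting.
# readings = fetch_pump_readings("https://dv.example.com/services", "refinery-a")
```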

What does IT need to know about implementing data virtualization successfully? What does the business side need to know?

Several years ago, the biggest issue was understanding what data virtualization was and when to use it. With today's broad adoption at an individual project level, the more common challenge is how to implement data virtualization at an enterprise scale.

Leveraging already-developed data virtualization assets involves several steps:

First, take an inventory of existing data virtualization deployments. Seek abstracted relational views and Web data services that can be reused more broadly across the enterprise.

Next, address the people and process issues: consider how best to rationalize data virtualization design, development, and operation activities within an existing Integration Competency Center or shared-services organization, as well as within application development teams.

Third, consider how proven enterprise-scale data virtualization patterns, such as the enterprise data sharing pattern I've described, can be deployed to simplify and accelerate wider data sharing.

Finally, it's imperative to build a rock-solid business case to fund the required data virtualization investments, using hard business metrics such as increased sales, improved productivity, and reduced risk, as well as IT metrics such as lower costs, faster time to solution, and technology savings.

Given that list, what are some barriers to enterprise data sharing?

The biggest barrier is letting go of the idea that an enterprise data warehouse is the only way to share data on a large scale. Enterprise data warehouses have huge value and help solve many problems, but even after 15 years of deployment, only a fraction of enterprise users leverage them, and only a fraction of enterprise data is consolidated within them. Instead, think of an enterprise data warehouse as a key source of sharable data, along with many other sources, for consumption by a broader range of business and technical users.

Given this broad use, the second biggest barrier is often the lack of standards for the data that enterprises and government agencies want to share. XML-based industry standards, such as the MIMOSA standard for process manufacturers such as oil refineries, PIDX for upstream exploration and production, and MIEM for maritime information, provide common starting points that eliminate this barrier.

Industry standards are clearly important in order to share data. What other sorts of accelerators are important to enable wider enterprise data sharing?

As I mentioned, using industry standards to share data is a significant accelerator. In effect, these standards cut the abstraction problem in half by pre-setting how the data will be consumed. This allows design and development resources to focus entirely on accessing and transforming the data to meet the already-agreed-upon standards.

A high-productivity data virtualization development environment for building all the required Web services is also an accelerator. Best-in-class tools let application developers build Web data services top-down, using the XML industry standards as the starting point. They also let data-oriented developers build lower-level data access and federation services bottom-up from original and consolidated sources, using more traditional SQL modeling techniques.
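The bottom-up half of that workflow can be sketched simply: rows fetched with ordinary SQL are reshaped into the standards-based XML that the top-down service contract expects. The schema, table, and element names here are hypothetical, not drawn from any actual industry standard.

```python
# Minimal sketch of bottom-up service development: rows fetched with plain
# SQL are mapped onto the standards-based XML the top-down service contract
# expects. Table, column, and element names are hypothetical.
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pump_log (pump_id TEXT, vib REAL, hrs INTEGER)")
db.execute("INSERT INTO pump_log VALUES ('P-101', 4.2, 11200)")

def rows_to_standard_xml(rows, site):
    """Map raw source rows onto the agreed-upon XML structure."""
    root = ET.Element("pumpReadings", site=site)
    for pump_id, vib, hrs in rows:
        pump = ET.SubElement(root, "pump", id=pump_id)
        ET.SubElement(pump, "vibrationMmS").text = str(vib)
        ET.SubElement(pump, "hoursRun").text = str(hrs)
    return ET.tostring(root, encoding="unicode")

# Bottom-up: ordinary SQL access, then conformance to the shared format.
rows = db.execute("SELECT pump_id, vib, hrs FROM pump_log").fetchall()
print(rows_to_standard_xml(rows, "refinery-a"))
```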

Don't forget that you need a strong business case, especially in today's economic environment. Enterprise data is only an asset if it can be leveraged to improve business performance. Enterprises that build a solid business case complete with ROI numbers typically see the value of enterprise data sharing fairly quickly.

What role does Composite Software play in data virtualization?

Composite Software, Inc. is an independent provider of data virtualization software. Composite's data virtualization middleware platform, Composite Information Server, scales from individual business applications to enterprise-wide Information-as-a-Service (IaaS) architectures and enterprise data sharing environments, automating the entire data virtualization life cycle while complementing traditional data warehousing investments. Global organizations -- including 10 of the top 20 banks, 5 of the top 10 pharmaceutical companies, leading energy, media, and technology companies, and U.S. defense and intelligence agencies -- use Composite's technology to integrate disparate data, regardless of location or source format, and to fulfill critical information needs faster and at lower cost.
