Leveraging Legacy Data

For years, industry skeptics have prophesied the downfall of the mainframe, but it remains the most common platform on which enterprise data resides. The reality is that legacy systems are not going away, and corporations will not scrap years of investment in legacy applications to replace them with an entirely new infrastructure.

The logical question is this: How can organizations leverage the legacy data that they have, while still taking advantage of open systems and packaged applications? There are a variety of solutions to the problem of leveraging legacy data for business value. Whether you build or buy, these solutions generally fall into one of three categories:

• A data mart or data warehouse.

• A system for migrating legacy data to a client/server application.

• Access to legacy data wherever it resides, using data access middleware.

Data Warehouses and Data Marts

Data warehouses and data marts are growing in popularity for a number of reasons. There is a need for comprehensive information to reduce costs and build competitive strengths. Data warehouses compensate for the lack of an enterprisewide data architecture.

Successful data warehouses provide a return on investment. However, data warehouse projects are not without pitfalls. Data warehouses can be expensive to build and take a considerable amount of time. Unfortunately, not all data warehouses deliver as promised, especially those in the 100 gigabyte to terabyte range. While they may serve their original purpose of storing large amounts of data well, they have difficulty keeping pace with changing business needs. Characteristics that offset these shortcomings, discussed later in this article, include access to new and existing data sources, a more malleable infrastructure, faster support for new requirements and the ability to interact with a wide variety of front-end tools.

Data warehousing architecture is an umbrella term for the concepts, tools and system services used as components for high-level design of a data warehouse-based decision support environment. Among them are:

• Data models that represent a view of the information assets in an enterprise.

• Software tools for capturing data from source systems; cleansing extracted data; mapping it to warehouse databases; loading it into those databases; and so on (a brief sketch of this capture-cleanse-load flow follows this list).

• Relational database management systems to store and manipulate data.

• Metadata directory services for tracking and reporting on metadata, including data definitions, system usage rules, resource definitions and descriptions of available data.

• Interoperability middleware to provide connectivity between systems and databases.

• Data access tools for query, reporting and analysis.

• Warehouse management tools for managing data collection operations; archiving and backing up data; and securing and authorizing access.
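
To make the capture-cleanse-map-load flow in the tool list above concrete, here is a minimal sketch in Python. It is illustrative only: the field names, the sample rows and the use of SQLite as the warehouse target are assumptions for the sketch, not features of any particular product.

```python
import sqlite3

# Illustrative rows as they might be extracted from a legacy source
# (field names, values and the SQLite target are assumptions).
source_rows = [
    {"cust_id": "0001", "name": "  ACME CORP ", "revenue": "12,500.00"},
    {"cust_id": "0002", "name": "Baxter Ltd",   "revenue": "8,000.00"},
]

def cleanse(row):
    """Trim padding and normalize numeric fields pulled from flat files."""
    return {
        "cust_id": row["cust_id"].strip(),
        "name": row["name"].strip().title(),
        "revenue": float(row["revenue"].replace(",", "")),
    }

def load(rows, db_path="warehouse.db"):
    """Map cleansed rows onto a warehouse table and load them."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS customer_dim "
        "(cust_id TEXT PRIMARY KEY, name TEXT, revenue REAL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO customer_dim VALUES (:cust_id, :name, :revenue)",
        rows,
    )
    con.commit()
    con.close()

load([cleanse(r) for r in source_rows])
```

In a real warehouse the capture, cleansing and loading steps would be handled by dedicated tools and tracked in the metadata directory, but the division of labor is the same.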

Migrating to a Client/Server Application

Often the legacy data in existing systems is inconsistent, incorrect and redundant, rendering it useless for true decision-making purposes and creating "islands of data." To help solve this problem, many enterprises are replacing outdated legacy applications with client/server management applications from companies like SAP, Baan or PeopleSoft. These software packages promise to link operations within the enterprise, enhancing interoperability, information sharing and collaboration.

Buying the latest client/server application package is only the beginning of the solution for enterprises looking to leverage legacy data. These packages are not simple, shrink-wrapped solutions. Careful, well-thought-out implementation plans are required to make them work. Most importantly, migration of data to new platforms and applications must occur before the data can be accessed easily by users and new applications.

Enterprises that attempt major data migration projects face the following primary challenges:

• Incorporating existing legacy systems in the final infrastructure plan.

• Transforming data to fit into the new system.

• Migrating the data in a way that allows the IT infrastructure to evolve as the enterprise’s business environment changes.

• Migrating in a fast and cost-effective manner.

Today’s enterprises can take one of four approaches to data migration: hand coding, data conversion engines, transformation engines or virtual database technology.

Hand coding. The traditional way of tackling data migration problems is to hire an army of software programmers. But this approach has several pitfalls, including significant expense and a tremendous project management burden. Most importantly, the result is often an inflexible solution.

If business rules change, a hand-coded solution must be retooled continually. Enterprises conducting one-time conversions that are not highly complex may find an internal solution less expensive over the long run than a packaged tool. However, for companies that need to integrate five or more platforms, or that want to set up a data mart or data warehouse, attempting to build a solution in-house can prove unwise.

Data conversion engines. First-generation conversion tools transform data by mimicking the code-generation process normally undertaken by programmers, essentially automating the job. This often involves generating, compiling and linking source code files from one platform to another. This method does provide advantages over hand coding: it reduces the number of programmers needed and can be good at documenting metadata. However, challenges remain.

First, programmers must often spend several weeks learning how to use these tools, which cuts into the time saved by automating the process. Second, many of the earlier code-generation tools support only a limited set of transformations, forcing programmers to drop into other routines to perform custom transformations. Finally, programmers cannot quickly test or modify the code that the generators write, so they must repeat a dozen or more steps each time a modification is required.

Transformation engines. These tools transform data in real time within a specialized engine, applying transformation functions "on the fly" within the engine’s memory, and outputting properly formatted data for loading into a target database. In this case, there is much less code to generate and fewer files to manage, allowing developers to spend more time on creating source-to-target mappings and transformation rules.
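
As a rough illustration of this "transform in memory, on the fly" idea, the following Python sketch streams records through a pipeline of small rule functions, with no generated source files to compile or manage. The rule names and record fields are assumptions for the sketch, not the design of any specific engine.

```python
def uppercase_state(record):
    """One transformation rule: normalize the state code."""
    record["state"] = record["state"].upper()
    return record

def derive_full_name(record):
    """Another rule: derive a field required by the target schema."""
    record["full_name"] = f'{record["first"]} {record["last"]}'
    return record

TRANSFORMATION_RULES = [uppercase_state, derive_full_name]

def transform_stream(records, rules=TRANSFORMATION_RULES):
    """Apply every rule to each record in memory and yield target-ready rows."""
    for record in records:
        for rule in rules:
            record = rule(record)
        yield record

legacy_records = [{"first": "Ada", "last": "Lovelace", "state": "va"}]
for row in transform_stream(legacy_records):
    print(row)  # rows are now formatted for loading into the target database
```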

Virtual database technology. The idea of a virtual database is to avoid moving and loading gigabytes of data merely in anticipation that some of it might be needed for analysis in the future. In addition, virtual databases avoid the significant cost of redesigning and repopulating data warehouses when business needs change or new systems are added. Virtual databases can also prove to be the most effective solution for rapidly migrating data to client/server applications.

Using Data Access Middleware

A third way to leverage legacy data is to use data access middleware. While data warehousing may involve physically moving data from old data sources to new ones, data access middleware acts as an independent liaison between old and new systems. Ideally, a successful middleware infrastructure creates a single, distributed enterprise that is more scalable, reliable and functional than its individual parts. Data access middleware is one of the fastest-growing approaches for leveraging data from legacy systems. Middleware is no longer a bandage; it has become a strategic part of healthcare IT environments.

The use of middleware to access legacy data is popular for the following reasons:

• Data access middleware is an effective solution to the "islands of data" problem. The islands arose from years of decentralized decision making, which allowed each department or division to pick the best technology for its specific need but hampered the ability to get global views for decision making.

• Data access middleware avoids having to replace old systems outright. Healthcare IT groups need to leverage the sizable investments in their existing infrastructure, which often includes mainframes that continue to process large batch jobs.

• New intranet technologies are beginning to unify and extend the distributed enterprise beyond the virtual walls of the business, resulting in the standardization of a network platform (TCP/IP) and a platform for retrieval and analysis (the Web). The growth of Web browsers as the primary window into corporate information systems is only one driver of enterprise system use. Systems must scale across networks, a formidable task that middleware can make easier.

• Middleware can provide a single infrastructure that allows system managers to look at the entire enterprise as a single, virtual machine. Management is increasingly important as TCP/IP eases the connection of machines throughout a corporation. Middleware also allows healthcare IT to retain control in an increasingly decentralized world.

• Middleware reduces the complexity of underlying legacy systems, giving users a common interface through which data is accessed and applications are developed (a simple sketch of such an interface follows this list).
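
The following Python sketch shows one way such a common interface might look: each legacy source is wrapped in an adapter that exposes the same query() call, so client applications need not know whether the data lives in a relational database or in a mainframe flat file. The class and method names are illustrative assumptions, not any vendor's API.

```python
from abc import ABC, abstractmethod

class DataSourceAdapter(ABC):
    """Common interface every wrapped data source must expose."""

    @abstractmethod
    def query(self, criteria: dict) -> list:
        ...

class RelationalAdapter(DataSourceAdapter):
    def __init__(self, connection):
        self.connection = connection  # e.g., an Oracle or Sybase connection

    def query(self, criteria):
        # Translate the criteria into SQL for the underlying RDBMS (omitted).
        return []

class FlatFileAdapter(DataSourceAdapter):
    def __init__(self, path):
        self.path = path  # e.g., a mainframe extract landed as a flat file

    def query(self, criteria):
        # Parse fixed-width records and filter on the criteria (omitted).
        return []

def patient_lookup(adapters, patient_id):
    """Client code sees one interface regardless of where the data resides."""
    results = []
    for adapter in adapters:
        results.extend(adapter.query({"patient_id": patient_id}))
    return results
```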

However, middleware is not a panacea. Increasingly, operating systems, database management systems, application development tool run times and client/server application packages are being designed with architectures that embed their own middleware-like capabilities, which can be referred to as "embedded connectivity technology." This is a tempting choice because it is one way to avoid additional investments in middleware. However, organizations can become too dependent on the proprietary technology of the system in which the connectivity is embedded, which means decreased flexibility in future technology purchases and, therefore, increased costs.

Independent middleware solutions make more sense, since they give users more flexibility. By the same token, enterprises investing in middleware must be careful not to create a new "islands" problem as a result of working with multiple vendors. There is no guarantee that multiple products will be standardized or interoperable.

Additional concerns revolve around enterprisewide scalability, costs of middleware packages and application-to-application connectivity in a multi-tiered architecture.

In the rush to bring a legacy data management system online, many enterprises acknowledge, but fail to act on, the issue of data cleansing. One reason cleansing is so important to healthcare organizations is that data is shared across multiple systems, including patient medical records, insurance coverage information and even pharmacy records. In this case, the impact of data that has not been cleansed can move from business threatening to life threatening.
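
As a small, hypothetical example of what cleansing involves before records from different systems can be matched, the sketch below normalizes names and date-of-birth formats and removes duplicates. The field names and date formats are assumptions for illustration.

```python
from datetime import datetime

def cleanse_patient(record):
    """Normalize fields so records from different systems can be matched."""
    cleaned = dict(record)
    cleaned["last_name"] = record["last_name"].strip().upper()
    cleaned["first_name"] = record["first_name"].strip().upper()
    # Normalize several legacy date formats to a single ISO form.
    for fmt in ("%m/%d/%Y", "%Y%m%d", "%d-%b-%Y"):
        try:
            cleaned["dob"] = datetime.strptime(record["dob"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    return cleaned

def deduplicate(records):
    """Keep one record per (last name, first name, date of birth) key."""
    seen, unique = set(), []
    for rec in map(cleanse_patient, records):
        key = (rec["last_name"], rec["first_name"], rec["dob"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```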

The Virtual Database

This article has discussed data warehousing, data migration to client/server applications, and data access middleware as three approaches to leveraging legacy data in today’s enterprises. As a practical matter, these three approaches can be implemented in various combinations. Regardless of which solution holds the best promise for your particular enterprise, effectively delivering true business information requires that these operations be performed:

• Connecting

• Accessing

• Transporting

• Building

• Transforming

• Cleansing

• Integrating

• Modeling

• Storing

• Displaying

The first step in building a virtual database is to determine the sources of data that will produce the business information needed. The virtual database should then be able to map existing systems, collecting information on the tables, fields and permissions that exist in databases, such as Oracle, Sybase or Informix, or in flat files on a variety of hosts. The information in these disparate sources is then mapped to a "meta catalog," which is simply a road map of what data exists and how to access it. This enables the virtual database system to create on-demand views of information combined from multiple sources.
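
To give a feel for what such a meta catalog might contain, here is a minimal Python sketch. The source names, tables and fields are hypothetical, and the fetch function stands in for whatever connector actually reads each source.

```python
# A "meta catalog": a road map of what data exists and how to reach it.
META_CATALOG = {
    "patient": {
        "source": "oracle://clinical",          # system where the data lives
        "table": "PATIENT_MASTER",
        "fields": ["patient_id", "last_name", "dob"],
    },
    "coverage": {
        "source": "flatfile://mainframe/claims.dat",
        "table": None,                          # fixed-width flat file
        "fields": ["patient_id", "plan_code", "effective_date"],
    },
}

def on_demand_view(catalog, entities, fetch):
    """Combine rows from several cataloged sources into one virtual view.

    fetch(entry) is the connector that actually reads a source; the virtual
    database only consults the catalog to locate and join the data.
    """
    view = {}
    for name in entities:
        entry = catalog[name]
        for row in fetch(entry):
            view.setdefault(row["patient_id"], {}).update(row)
    return list(view.values())
```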

Integrated with the virtual database is a "data transformation" engine that reduces the complexity of data extraction and integration while off-loading such work from production systems. Transformation engines provide a scalable, multitasking architecture that works asynchronously and includes GUIs for mapping data sources and applying business transformation rules. Transformation engines improve data throughput and reduce the time and cost of populating data marts and warehouses or performing large-scale data migrations. A single, integrated management system can access multiple, disparate data sources, transform data into business information and deliver it to multiple destinations, including Web-based front ends.
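
The multitasking, asynchronous style described above can be sketched in Python with a simple worker pool: batches of extracted rows are transformed concurrently so the work is kept off the production systems doing the extraction. The batch contents and transformation are placeholders for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def transform_batch(batch):
    """Placeholder transformation applied to one batch of extracted rows."""
    return [{**row, "status": "transformed"} for row in batch]

def run_engine(batches, workers=4):
    """Transform batches concurrently and yield target-ready rows."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transform_batch, b) for b in batches]
        for future in futures:
            for row in future.result():
                yield row  # hand rows to the loader as they become ready

batches = [[{"id": 1}], [{"id": 2}], [{"id": 3}]]
print(list(run_engine(batches)))
```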

A virtual database with a data transformation engine is a powerful combination, regardless of which approach you take to leverage legacy data – warehousing, migrating to applications, or real-time access.

About the Author: Robert Lewis is President of Enterworks, (Ashburn, Va.). He can be reached at [email protected]
