A New Frontier for Anytime Data: Data Integration on the Mainframe

New tools previously unavailable on mainframes can now enable native, simplified, cost-effective data integration with increased reliability, scalability and performance.

After years of being marginalized, the mainframe is more popular than ever before. IBM zSeries sales are growing by double digits. Declining price/MIPS and new CPU and software technologies are making mainframes more attractive for more applications. Many mission-critical applications that never left the mainframe continue to grow in size. Despite this resurgence, the mainframe market remains seriously underserved by open-systems technologies.

Organizations that use the mainframe as their primary operating platform, for transactions as well as data integration (DI) and data warehousing (DW), need open tools to achieve the economies of scale and flexibilities enjoyed by the open-computing world. Java and XML on the mainframe represent significant progress, but further major steps are needed. The ability to perform DI natively on the mainframe, in support of on-demand data access and using technology previously found only on Unix-, Windows-, and Linux-based systems, is one such step.

Key to this picture is that sites that previously had to move data off the mainframe for processing and then load it back no longer have to endure that cycle. For example, populating a mainframe data hub for a single view of customer data becomes more efficient because processing the data on the mainframe, close to its source, improves performance and scalability (less network overhead) and minimizes character set conversions (less data translation between platforms).
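To make the character set point concrete, here is a minimal Python sketch of the translation that moving data between platforms implies. It assumes the mainframe data uses EBCDIC code page 037 (available in Python as the "cp037" codec); actual code pages vary by site, and the helper names are hypothetical.

```python
# Assumption: host data is encoded in EBCDIC code page 037 ("cp037").
# Real installations may use a different code page.
EBCDIC = "cp037"


def to_ebcdic(text: str) -> bytes:
    """Encode text the way it would be stored on the host."""
    return text.encode(EBCDIC)


def from_ebcdic(raw: bytes) -> str:
    """Decode host bytes back into text on an open-systems platform."""
    return raw.decode(EBCDIC)


record = "HELLO"
host_bytes = to_ebcdic(record)          # bytes as held on the mainframe
ascii_bytes = record.encode("ascii")    # bytes as held on an ASCII platform
round_trip = from_ebcdic(host_bytes)    # every off-host hop needs this step

# The two encodings also collate differently, so sorted output can change
# when data moves between platforms.
values = ["a", "A", "1"]
ebcdic_order = sorted(values, key=to_ebcdic)
ascii_order = sorted(values)
```

The same five characters are different bytes on each platform, and the sort order of letters and digits differs as well. Each transfer off the mainframe therefore adds a conversion step that native processing can avoid.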

Integrating mainframe data has traditionally been handled through hand coding. In many instances, that's a risky, costly, and change-resistant proposition. Typically, custom hand-coded extract routines generate large flat files that are then transferred from the mainframe via FTP for DI processing. This process is error-prone, non-repeatable, and non-reusable. With up to 70 percent of all corporate data currently on mainframes, a better alternative is to perform DI processing closer to where all this data resides, where organizations can also take advantage of mainframe reliability, manageability, and performance. The benefits can be categorized into four groups:

  • Reduced risk
  • Substantially lower costs
  • Increased flexibility
  • More efficient (and more mainstream) use of mainframe resources

Reduced Risk

DI on the mainframe improves the reliability and availability of critical applications, as well as the integrity and security of their data, because data doesn’t have to be shipped to another platform for format conversion and DI processing. It stays in its native environment, under close supervision and control. Further, manageability of DI processing can be ensured with proven mainframe tools for scheduling and monitoring processes and tasks, thus ensuring that host operational systems are not negatively impacted.

The integrated data will be used by the enterprise at large and will ultimately leave the mainframe environment. DI on the mainframe enables better control over this process and enables data owners to keep data where it resides, saving time and reducing operational costs.

For example, mainframe-ready DI technologies include “push,” not just “pull,” architectures. If the data owner requires tight control, the data can be pushed. If open access is encouraged, then the pull mode or a mixture of push and pull can be employed. In addition, mainframe-class safeguards to data integrity and security can be employed, including facilities for two-phase commits, rollback and recovery; encryption; and mainframe security solutions.

Data from other systems can also be brought into the mainframe environment for integration with native data. DI processing of that data inherits all the mainframe fundamentals just discussed.

Reduced Costs

With network costs showing the same upward momentum as gasoline prices, DI on the mainframe can also help companies eliminate needless bulk transfers and thus avoid unnecessary costs. This is partially a function of keeping data on the mainframe for processing. It is also a function of being able to “slim down” the volume of data that is ultimately required by end-user applications and databases.

Changed data capture (CDC) techniques enable you to leverage changed-only data, minimizing the need for bulk processing. Performing CDC natively on the mainframe, where and when changes occur, delivers far-reaching benefits. The key is to access changed data via a native API as the changes happen. This avoids modifications to mainframe programs and the host-system performance degradations that have traditionally characterized hand-coded solutions. Accessing changed data natively removes the need to tie up application logic to identify and store changes. Hence, there is no performance impact on the source application. Instead, changed data can be captured and moved out of the system immediately as a “change stream.”
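The idea of moving changed-only data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's CDC API: the `Change` record, its three operation names, and `apply_changes` are all hypothetical, and a real change stream would be read from the source system's log or native interface rather than built by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One captured change (hypothetical shape)."""
    op: str                      # "insert", "update", or "delete"
    key: str                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes


def apply_changes(target: dict, changes: list) -> dict:
    """Apply a change stream to a keyed replica, in order."""
    for change in changes:
        if change.op in ("insert", "update"):
            if change.row is None:
                raise ValueError(f"{change.op} for {change.key} has no row image")
            target[change.key] = dict(change.row)
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return target


# The replica starts as a copy of the source, loaded once.
replica = {
    "C001": {"name": "Ada", "city": "Austin"},
    "C002": {"name": "Bo", "city": "Boston"},
}

# Afterwards only the changes travel, not the whole table.
changes = [
    Change("update", "C001", {"name": "Ada", "city": "Denver"}),
    Change("insert", "C003", {"name": "Cy", "city": "Chicago"}),
    Change("delete", "C002"),
]
apply_changes(replica, changes)
```

Three small change records bring the replica up to date. A bulk approach would re-extract and re-transfer every row, including the ones that did not change.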

Part of the appeal of this approach lies in its data delivery flexibility. For example, a change stream can be directed towards a single requestor, or the data can be persisted and made available on demand (weekly, daily, or in real time) to multiple requestors requiring the same mainframe data. These can include applications, databases, data integration engines, and real-time message queues, all with different latency requirements. You need to create the stream only once, and any number of requestors can drink from it at their own rate—a far cry from building costly hand-coded extract routines for each new request.
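A persisted change stream that several requestors read at their own pace can be sketched as an append-only log with one read position per requestor. The `ChangeStream` class and the requestor names below are hypothetical, and the log is held in memory purely for illustration; a production stream would be durable.

```python
class ChangeStream:
    """Append-only log with an independent read offset per requestor."""

    def __init__(self):
        self._log = []       # changes, in capture order
        self._offsets = {}   # requestor name -> index of next unread change

    def append(self, change):
        """Record a captured change once, for all requestors."""
        self._log.append(change)

    def read(self, requestor, max_items=None):
        """Return the changes this requestor has not yet seen."""
        start = self._offsets.get(requestor, 0)
        end = len(self._log)
        if max_items is not None:
            end = min(end, start + max_items)
        batch = self._log[start:end]
        self._offsets[requestor] = end
        return batch


stream = ChangeStream()

stream.append("chg-1")
realtime_first = stream.read("message_queue")    # low-latency requestor

stream.append("chg-2")
stream.append("chg-3")
realtime_second = stream.read("message_queue")

nightly = stream.read("warehouse_load")          # batch requestor catches up
nothing_new = stream.read("warehouse_load")      # no changes since last read
```

The changes are captured once. The low-latency requestor picks them up as they arrive, while the batch requestor receives the same changes in a single read, with no additional extract routine.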

Significantly for mainframe owners, change streams can also be used to keep applications synchronized during migrations, such as a move from IMS to DB2. CDC on the mainframe can also help drive data consistency across systems by making certain that changes are propagated to all relevant applications when and where they’re needed.

Designed Once, Deployed Everywhere

Companies that can derive maximum value from their DI investments are the ones taking an architected approach to integration. A modern, metadata-driven, object-oriented DI platform running natively on the mainframe can deliver all the benefits of such an approach, including reduced costs and complexity, increased developer productivity, and rapid time-to-implementation of DI projects.

While a custom hand-coded mainframe-data extract routine can get you the data you need, you often pay a major penalty in developer time, testing, processing, and network utilization. Multiply that by the number of required custom routines (not just today’s but the ones required tomorrow) and you end up with a development and maintenance nightmare. Even when the custom routines get the right data, they hardly ever provide visibility into metadata, which is essential to today’s analytic applications, compliance initiatives, single views of “x,” and other strategic undertakings.

In contrast, an architected approach to DI on the mainframe delivers a unified platform, a standard tool set, and a standard set of processes to reduce complexity and enable true “design once, deploy everywhere” functionality. For example, today’s open DI platforms provide graphical, point-and-click DI development environments that enable mainframe owners to speed development and avoid dependence on expensive programming resources. They are designed to promote reuse of transformation logic, data mappings, skills, and best practices across projects and development teams to further reduce costs and ensure consistency. Being metadata-driven, they enable metadata to be captured, managed, and shared as part of the overall DI process.
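What "metadata-driven" means in practice can be shown with a small Python sketch in which the mapping is data rather than code. The field names, the `MAPPING` rules, and both helper functions are hypothetical and do not represent any particular DI product.

```python
# A mapping defined once, as metadata (hypothetical field names).
MAPPING = [
    {"source": "CUST-NAME", "target": "customer_name", "transform": str.strip},
    # Assumption: the balance arrives as a zero-padded string of cents.
    {"source": "CUST-BAL", "target": "balance", "transform": lambda v: int(v) / 100},
]


def apply_mapping(record: dict, mapping: list) -> dict:
    """Transform one source record according to the mapping rules."""
    return {
        rule["target"]: rule["transform"](record[rule["source"]])
        for rule in mapping
    }


def lineage(mapping: list) -> dict:
    """Report which source field feeds each target field."""
    return {rule["target"]: rule["source"] for rule in mapping}


source_record = {"CUST-NAME": "  ADA LOVELACE ", "CUST-BAL": "0012550"}
target_record = apply_mapping(source_record, MAPPING)
field_lineage = lineage(MAPPING)
```

Because the rules are data, the same mapping can be reused across projects, and the lineage report comes from the mapping itself. A hand-coded extract routine would need separate documentation to provide that visibility.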

Extended Mainframe Investments

DI on the mainframe delivers benefits in two directions. It underpins integration projects with mainframe reliability, scalability, and performance. It also helps mainframe-centric IT shops lower operational costs and speed project delivery, benefits already enjoyed by the many companies deploying DI on Unix-, Windows-, or Linux-based systems. In both cases, DI on the mainframe helps companies extend their mainframe investments.

Declining price/MIPS, powerful multi-core CPUs, storage and software virtualization—DI on the mainframe places all this at the service of data on demand. A properly designed DI platform, meanwhile, can provide such capabilities as real-time integration, parallel and always-on session execution, and distributed workflow processing to make full use of the performance and scalability intrinsic to mainframe environments. Leveraging existing mainframe MIPS for DI processing lets you take advantage of this synergy to extend the value of those MIPS.

At the same time, you are modernizing and standardizing your mainframe environment and bringing it further into the enterprise information mainstream. You gain an architected approach to integrating mainframe data that can interoperate seamlessly with DI on other hardware/software platforms. You gain new economies of scale and flexibilities while breaking dependence on expensive (and increasingly hard to find) mainframe programming resources. Furthermore, you equip your organization with portable knowledge and skills: mainstream skills that are applicable both on and off the mainframe.