IBM's zIIP a Boon to Big Iron Data Management

Big Iron data warehousing might be a pipe dream, but on balance IBM's zIIP engine has provided a bonus for data management in mainframe environments.

When IBM Corp. introduced its zSeries Integrated Information Processor (zIIP) almost four years ago, it argued that zIIP, like its predecessors, would give mainframe shops more bang for their buck, as well as induce some Big Iron backers to shift more workloads onto -- or in some cases, to transfer workloads back to -- their zSeries mainframes.

A second goal, one that Big Blue worked hard to flesh out over the last few years, was to reposition the mainframe -- with its quiver of application- or workload-specific specialty processor engines -- as a data hub.

A third goal -- one that, in a sense, goes beyond IBM's vision of the mainframe-as-enterprise-data-hub -- was to generate excitement about the mainframe as a host platform for processing-intensive database workloads such as data warehousing (DW).

IBM has seen some success with respect to the second goal. Several ISVs are exploiting zIIP in a clever, not-exactly-textbook fashion: chiefly as a means to offload the data processing activity associated with non-database workloads. BMC Software Inc. and CA Inc., for example, have used this approach to incorporate zIIP support into some of their mainframe management tools. CA also supports zIIP on both of its non-relational databases (Datacom and IDMS), whereas Progress DataDirect -- which acquired the former Neon Systems shortly before IBM introduced zIIP -- recently introduced a new revision of its Shadow product that supports relational-to-non-relational data processing via zIIP. Meanwhile, IBM itself supports zIIP on its Big Iron data stores (VSAM, IMS, DB2) and Oracle supports zIIP on the Big Iron flavor of its database.

Although Big Blue initially pushed zIIP primarily for use with DB2 running on z/OS -- in other words, as an accelerator for relational data -- it's in conjunction with non-relational data that zIIP's benefits are perhaps most keenly felt. For sources such as IMS, VSAM, IDMS, or Datacom, zIIP is a godsend, in that it gives Big Iron shops a cost-effective means to extract data from hierarchical or non-relational databases and transform it (into relational data) so that it can be consumed by an RDBMS. In the absence of zIIP, such processing would either have to be done in a costly Central Processor (CP) context or offloaded to a non-mainframe platform, typically via an extraction, transformation and loading (ETL) tool.
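To make the hierarchical-to-relational step concrete, here is a minimal Python sketch. It is an illustration only: the record layout, the field names (customer_id, orders, and so on), and the function flatten_hierarchical are hypothetical, and nothing in it is zIIP-specific or drawn from any vendor's product. It simply shows how a parent record with nested child segments might be split into flat rows linked by a key, which is the shape an RDBMS expects.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical hierarchical records: each parent (customer) carries
# its child segments (orders) nested inside it, as a hierarchical
# database might present them.
SAMPLE: List[Row] = [
    {
        "customer_id": "C001",
        "name": "Acme Corp",
        "orders": [
            {"order_id": "O100", "amount": 250.0},
            {"order_id": "O101", "amount": 75.5},
        ],
    },
    {"customer_id": "C002", "name": "Globex", "orders": []},
]


def flatten_hierarchical(records: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split nested parent/child records into two flat row sets.

    Returns (customers, orders). Each order row carries the parent's
    customer_id so the relationship survives as a foreign key.
    """
    customers: List[Row] = []
    orders: List[Row] = []
    for rec in records:
        customers.append(
            {"customer_id": rec["customer_id"], "name": rec["name"]}
        )
        for child in rec.get("orders", []):
            orders.append(
                {
                    "customer_id": rec["customer_id"],  # link back to parent
                    "order_id": child["order_id"],
                    "amount": child["amount"],
                }
            )
    return customers, orders


if __name__ == "__main__":
    customer_rows, order_rows = flatten_hierarchical(SAMPLE)
    print(customer_rows)
    print(order_rows)
```

In practice the transformation would be done by a vendor tool or a mainframe program rather than a script like this; the point is only that the work is row-by-row data processing, the kind of activity the article describes shops wanting to keep off the general-purpose CP.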

ETL tools, particularly ETL technologies that support mainframe connectivity, tend to be expensive, but many shops opted for a lesser-of-two-evils approach: in this case, the cost of a third-party ETL tool was typically less than that of doing ETL on System z itself.

zIIP may be seen as a boon to data management on System z. For one thing, it encourages shops to host (or expand their hosting of) RDBMS workloads on Big Iron. In addition, zIIP gives enterprises a cost-effective means to host intensive data processing workloads (e.g., ETL) on their mainframe systems. The latter scenario isn't just a cost-effective solution; it's also a sound data management practice.

IBM's progress with respect to its third goal -- that of recasting Big Iron as a platform (if not a platform par excellence) for data warehousing -- has been less clear cut. On paper, Big Iron data warehousing doesn't seem like a completely far-fetched idea. For one thing, the mainframe plays host to a lot of the data -- relational, non-relational, or semi-structured; transactional, operational, or otherwise -- that could be consumed by an enterprise data warehouse. Thanks to the availability of low-cost options like zIIP (and support for next-gen workloads such as Linux and Java, via a pair of other processor engines), the mainframe, with its legendary reliability and transaction-processing brawn, could notionally function as a platform for an enterprise data warehouse.

The reality, of course, is that comparatively few shops have standardized on the mainframe as a platform for data warehousing. So says Philip Russom, director of research with The Data Warehousing Institute (TDWI).

"Very few mainframes on a percentage basis host a true data warehouse. The DWs on mainframes mostly migrated to open systems as part of Y2K preparations in the 1990s. Two of my biggest consulting projects this decade involved migrating mainframes that eluded attention during Y2K," says Russom, who assesses mainframe-based data management in his most recent TDWI publication, Checklist Report on Mainframe Modernization.

One issue is that mainframe systems are rarely, if ever, deployed as single-purpose platforms: one typically doesn't see a DW-only mainframe. Yet as DW is practiced in most environments, it is a single-platform proposition; systems that host data warehousing workloads are rarely tasked with doing anything else. (On the other hand, Oracle Corp.'s pitch with its new Exadata Version 2 appliances is that shops can and should host both OLTP and DW workloads on the same systems.)

"I've never encountered a mainframe with the single purpose -- or even primary purpose -- of data warehousing," says Russom. "The mainframe continues to serve the primary purpose of supporting operational and transactional applications, in the context of very stringent requirements for scaling to massive data and/or transaction volumes with the five nines of high availability," he continues, adding that -- in its bread-and-butter context -- "the mainframe is hard to beat, which is why it will be with us for many years to come."

Veteran data warehouse architect Mark Madsen, co-author of Clickstream Data Warehousing and a principal with BI and DW consultancy Third Nature, concurs with Russom's assessment.

"I don't see a lot of people doing the work [i.e., the data processing] on the mainframe, but there are a few," he says, stressing that the overriding concern for most shops is how to get data off the mainframe as quickly, cheaply, and efficiently as possible.

"Most people get data off the mainframe using programs and flat files that are transferred. Some are using replication or transaction capture to non-intrusively obtain data. Some are using direct connections and SQL [via] DB2, mainly," he explains.

Although the availability of zIIP hasn't led to an explosion in Big Iron-based data warehousing activity, it -- along with its specialty processor cousins -- has arguably made data management in mainframe shops both easier and cheaper, Russom continues.

"[C]ompanies using IFL, zIIP, and zAAP … [are] happy with these, plus their engine-based tools from third parties. I've encountered users who have a data warehouse on an open systems platform -- running Unix, Linux, or Windows -- and the specialty engines (usually zIIP, sometimes IFL) process data coming from a mainframe data source on the mainframe so that the dataset is smaller, cleaner, and more standardized before it hits open system data integration tools and eventually the data warehouse," he points out.

This isn't just a more elegant or efficient approach, Russom stresses: it's a sound data management practice. Prior to the availability of something like zIIP -- or absent the ability to do mainframe-based ETL at less cost (e.g., using Java or Linux) -- doing as much on the mainframe just wasn't a cost-effective option.

"Ideally, you should process mainframe data natively on the mainframe -- to cleanse it, reduce its volume, or otherwise improve it -- before passing it to other platforms," Russom writes in his TDWI report. "After all, data sets drawn from legacy platforms usually have significant quality issues, and data volumes extracted from a mainframe tend to be large."

He envisions a number of other potential use cases for the zIIP engine in particular -- including as an affordable data quality accelerator for mainframe databases. "Some users cleanse or otherwise improve mainframe-based data with a tool running on an IFL or zIIP, then reload the improved data back into mainframe databases," he points out. "You might also set up a similar solution -- using a specialty engine -- that integrates data among mainframe applications and databases."