In-Depth
Coming to Terms with Mainframe Data Access
Sooner or later, data warehousing pros will have to come to terms with the remarkable staying power of the mainframe.
It's no secret: data warehousing (DW) professionals seem plumb perplexed by the staying power of the venerable mainframe. Many, it seems, would rather that mainframe systems just went away.
A new report from TDWI, however, notes that both mainframe systems and the applications they're hosting have a secure future. Big shops, at least, have come to terms with Big Iron. In some cases, recent market research suggests, shops actually plan to expand their use of Big Iron as a data processing platform. This jibes with mainframe ISVs' reports. It continues to confound data warehousing technologists, however.
Given their druthers, many DW pros would prefer to bid Big Iron adieu.
"My personal view is that people still running mainframes are doing it because the cost [e.g., software, labor, operations] to migrate is not worth it, or because they're Luddites who don't want to learn anything new," says one DW professional. "I find it hard to justify expanding a platform for which talent is increasingly rare, [the cost of] which is so high … , and which is really inflexible and hard to integrate into today's technology landscape."
This DW technologist isn't a knee-jerk anti-legacy professional. "I'm not including the AS400/iSeries/whatever it's called today," this person stresses, noting that IBM Corp. pitches System i to a "very different audience needing [the same] low-cost/just-works/low-staff platform that they've had for 30 years. [There are] still a lot of them out there."
More to the point, this person says, Big Blue explicitly markets System i as a jack-of-all-trades platform. The upshot, this person observes, is that one can credibly speak of System i-based data warehouses -- running either IBM's own special sauce (e.g., the version of DB2 that's bundled with OS/400 and System i) or -- more recently -- powered by a mix of offerings from both Big Blue and third-party ISVs. (Some boosters even like to pitch System i -- née AS/400 -- as the original data warehouse appliance.)
Data warehousing probably isn't the first thing that comes to mind when one thinks of Big Iron, however. After all, for the many data warehousing pros who grew up with relational database management systems (RDBMSes) running on some flavor of RISC-Unix or x86-powered system -- a pairing that has received Dramatic Renewals of Purpose from both DW platform specialists and specialty analytic database providers -- the mainframe is the odd platform out.
To the extent that data management (DM) or DW practices recognize the mainframe, it's typically as an encumbrance: Big Iron as the notorious host of non-relational data sources (such as VSAM or IMS) that must be accommodated. Indeed, many IT shops still use largely "old" methods to get IMS or VSAM data off their mainframes, industry watchers note.
"Most people get data off the mainframe using programs and flat files that are transferred [often via FTP]. Some are using replication or transaction capture to non-intrusively obtain data. Some are using direct connections and SQL [via] DB2, mainly," explains veteran data warehouse architect Mark Madsen, a principal with BI and DW consultancy Third Nature.
This is in spite of the existence of inexpensive, high-performance, and elegant alternatives, notes research analyst Philip Russom, author of TDWI's Checklist Report on Mainframe Modernization.
For starters, Russom points out, IBM has led a sustained effort to help drive down mainframe TCO, chiefly by developing and marketing low-cost "processor engines" that support Java, data processing, and Linux workloads. Second, and just as important, mainframe ISVs have responded in kind, introducing products that run partially or entirely in the context of Big Blue's zSeries Integrated Information Processor (zIIP), zSeries Application Assist Processor (zAAP), or Integrated Facility for Linux (IFL) engines.
"Specialty engines are exactly like a mainframe's general Purpose Processor (GPP), except that specialty engine workloads do not count against mainframe MIPS or MSUs, and the speed at which they run is not restricted," writes Russom. "Mainframe integration middleware -- when specifically designed to exploit specialty engines in accordance with IBM's authorized use -- can shift loads associated with data integration and application integration to the zIIP and zaaP engines, respectively."
Thanks to the availability of IBM's zIIP engine in particular, it doesn't cost shops as much to run data-processing-intensive workloads on their mainframe systems.
"These specialty engines provide new possibilities for native data and integration processing on the mainframe -- all at a relatively low cost point," he writes. In fact, Russom says, the cost of hosting ETL or data quality workloads on a mainframe is (or can be) comparable to hosting similar workloads on "open" systems.
"If you need to process data natively on a mainframe and you have the zIIP engine plus tools that will run well on zIIP, then you can do native mainframe data processing -- especially data integration and data quality tasks -- at prices comparable to open-system-based solutions," he explains, in a follow-up interview with BI This Week. "I have found companies using IFL, zIIP, and zAAP; they're happy with these, plus their engine-based tools from third parties."
The most common approach, Russom asserts, is for an organization to extract data from a mainframe data source and offload it to a data warehouse running on a non-mainframe platform, transforming it -- if necessary -- along the way.
"I've encountered users who have [data on the mainframe and] a data warehouse on an open systems platform," Russom continues, explaining that "the specialty engines -- usually zIIP, sometimes IFL -- process data coming from a mainframe data source, on the mainframe, so that the dataset is smaller, cleaner, and more standardized before it hits [the] open-systems, data-integration tools and eventually the data warehouse."
That's precisely the approach espoused by Jeff Overton, senior product manager with mainframe ISV Progress DataDirect. Back in 2006, DataDirect acquired Neon Systems, developer of a mainframe application and data integration product called Shadow. Just last month, DataDirect unveiled a new version of Shadow that supports relational-to-non-relational data processing via zIIP.
By data processing, DataDirect doesn't just mean shuffling bits from one platform to another. Instead, Overton says, Shadow's new SQL offload capability can also perform transformations -- the "T" in ETL -- on the inexpensive zIIP engine. That being said, DataDirect isn't a disinterested party: to the extent that it can convince shops to keep data processing workloads on their mainframes (or in some cases to shift prodigal data processing workloads back to the mainframe) it stands to benefit.
That's also true when it comes to touting Shadow as a zIIP-based ETL tool -- or, alternately, as a zIIP-based data quality engine. Nevertheless, Overton argues, zIIP-powered SQL offload is the kind of capability that sells itself.
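The offload idea is easy to see in miniature. In the naive approach, the client pulls raw rows and transforms them off-host; with SQL offload, the same transformation travels with the query, so middleware such as Shadow can execute it on the mainframe (and, per the vendor, route the work to zIIP). The table and columns below are hypothetical.

# Naive: ship every row, transform off-host after the transfer.
RAW_SQL = "SELECT cust_nm, bal_amt, curr_cd FROM accounts"

def transform_client_side(rows):
    # rows are (cust_nm, bal_amt, curr_cd) tuples; all of the work
    # happens only after the full dataset has crossed the wire.
    return [(nm.strip().upper(), round(bal, 2))
            for nm, bal, cur in rows if cur == "USD"]

# Offloaded: the filter and the cleanup travel with the query, so only
# the finished result leaves the mainframe.
OFFLOADED_SQL = """
    SELECT UPPER(TRIM(cust_nm)) AS cust_nm,
           ROUND(bal_amt, 2)    AS bal_amt
    FROM accounts
    WHERE curr_cd = 'USD'
"""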
"A lot of these guys [viz., mainframe data management groups] are very protective of their data. If you come to them and you … want to access data on the mainframe using [programmatic] SQL or some other technology [e.g., third-party ETL connectivity], they're liable to tell you they don't want your SQL [database] or your ETL interfacing with their highly volatile transaction data on the mainframe," he observes.
"If you can leave the data in-place and access the data in-place, everybody's happy. That helps you reduce your latency issues, reduce your infrastructure costs, and allows you to reduce your development time because if you're able to go directly against that data on the mainframe and treat an IMS database or CICS data as [you would data from] Oracle or Microsoft SQL Server, then that allows you to expand the pool of developers that are available for mainframe development projects. It really is kind of an 'Aha!' moment, if you will."
If nothing else, Overton concludes, DW pros need to come to terms with the mainframe. He cites a number of recent trends -- including encouraging mainframe revenue and MIPS growth (relative to non-mainframe platforms) -- that he says augur well for the mainframe.
"We have talked to a number of analysts and what we've seen in our customer base and in the market in general is that while there are certain applications that customers are choosing to migrate off the mainframe, those are … applications that have few dependencies on other business processes or mainframe services.
"What we're seeing much more frequently is really the [type of] modernization effort where they're saying, 'My data is on the mainframe or these core business processes are on the mainframe, and it's the right platform for me for performance, scalability, reliability, security, and -- believe it or not -- for total cost of ownership," he argues.
"For those organizations, they're still seeing the mainframe as the platform for a significant amount of their data processing. When they talk to us, one of their rationales is … they're looking to get as long an operational life out of an application as they can because they've made significant investments in building that application. If they don't have to rewrite the application, and they can keep their [software] maintenance costs very low, then they can get a very nice total cost of ownership on that."
More to the point, Overton continues, many shops are looking to do much more with their mainframe data, particularly when it comes to involving Big Iron-bound operational data in ongoing DW and BI initiatives.
"Those [mainframe-based] business processes are still generating data, and there are a lot of cases where [because of] latency issues or because of infrastructure costs or things like that, it can be beneficial if I can get to that data [on the mainframe] directly. It's no longer good enough to just transfer that [data] in bulk [i.e., as part of a periodic batch process]. Maybe I need to get to it [where it is] for my time-sensitive business intelligence initiatives, or for my packaged applications that are running off-host," he maintains.
"We had an insurance provider [customer] who undertook an initiative to modernize their mainframe. Their challenge was, 'I have [to take] my claims processing … to the point where I can know what my liability is at the time that a service is being rendered. Before, there was a significant delay. They wanted to take that to real-time. But for them, moving that data off [the mainframe] wasn't an option. They said, 'We want to do this on the mainframe. The operational data is on the mainframe. That's where we want to keep it.' So they built a new [claims processing] application that interfaces with the [operational] data on the mainframe."