IBM Enables Native Apache Spark Data Processing on z Systems Mainframes

After going all in on Apache Spark with a massive investment in the Big Data processing technology, IBM today announced it has enabled native Spark analytics on mainframes running the z/OS operating system.

"IBM z/OS Platform for Apache Spark enables Spark, an open-source analytics framework, to run natively on the z/OS mainframe operating system," the company said in a statement today. "The new offering, available now, enables data scientists to analyze data in place on the system of origin, without the need to extract, transform and load (ETL), by breaking the tie between the analytics library and underlying file system."

IBM last June announced a massive research and development investment in the open source analytics framework, calling it "potentially the most significant open source project of the next decade." Since then, it has revamped many of its data products to incorporate Spark and released new Spark-based offerings, of which IBM z/OS Platform for Apache Spark is the most recent.

"As businesses of all sizes transform into real-time digital organizations, they must be able to get a clear picture of all their enterprise data without the excessive time and risk of ETL," said IBM executive Rod Smith. "With Apache Spark enabled natively on IBM platforms -- now including z Systems -- customers can perform analytics alongside the transactional systems that house key data, while drawing contextual insights from other data sources, enabling them to engage with customers and generate revenue in real time."

While mainframes are often overlooked in the new cloud-first, enterprise mobility and Big Data era, IBM said its z Systems still handle critical data transactions for many of the world's banking, insurance, retail and transport companies. The z Systems mainframes feature "the industry's fastest commercial microprocessor and the ability to perform in-transaction analytics, scoring predictive models within a transaction in 2 milliseconds or less," IBM said. "Organizations can now leverage these capabilities, applying advanced in-memory analytics through Spark without moving data off the mainframe, saving time and money and limiting risk."

The company said its new platform helps enterprises glean data insights via the following:

  • Streamlined development: Developers and data scientists can use their existing expertise with programming languages such as Scala, Python, R and SQL to reduce time to value for actionable insights.
  • Simplified data access: Optimized data abstraction services remove complexity, providing seamless access to enterprise data in traditional formats such as IMS, VSAM, DB2 z/OS, PDSE or SMF with familiar tools via Apache Spark APIs.
  • In-place data analytics: Apache Spark uses an in-memory approach for processing data to deliver results quickly. The platform includes data abstraction and integration services that enable z/OS analytics applications to leverage standard Spark APIs. This allows the organization to analyze data in-place, avoiding costly processing and security considerations associated with ETL.
  • Open source capabilities: The platform offers an Apache Spark distribution of the open source, in-memory processing engine that is designed for Big Data.
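
To make the "data abstraction" idea in the list above more concrete, here is a minimal Python sketch of analytics code decoupled from the storage format it reads. It is an illustration only and is not IBM's API or the Spark API: the names RecordSource, FixedWidthSource, CsvSource and total_by_key, along with the sample records, are hypothetical. The fixed-width source merely stands in for a record-oriented mainframe data set.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol

Record = Dict[str, str]


class RecordSource(Protocol):
    """Hypothetical abstraction: anything that can yield records as dicts."""

    def records(self) -> Iterator[Record]: ...


@dataclass
class FixedWidthSource:
    """Stand-in for a record-oriented data set with fixed-width fields."""

    lines: List[str]
    layout: Dict[str, slice]  # field name -> character range within a line

    def records(self) -> Iterator[Record]:
        for line in self.lines:
            yield {name: line[span].strip() for name, span in self.layout.items()}


@dataclass
class CsvSource:
    """Stand-in for a delimited text source with a header row."""

    text: str

    def records(self) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self.text))


def total_by_key(source: RecordSource, key: str, amount: str) -> Dict[str, float]:
    """Sum a numeric field per key, reading records where they already live."""
    totals: Dict[str, float] = {}
    for rec in source.records():
        totals[rec[key]] = totals.get(rec[key], 0.0) + float(rec[amount])
    return totals


# Hypothetical sample data: the same three transactions in two formats.
fixed = FixedWidthSource(
    lines=[
        "ACME".ljust(10) + "0012.50",
        "GLOBEX".ljust(10) + "0100.00",
        "ACME".ljust(10) + "0007.50",
    ],
    layout={"customer": slice(0, 10), "amount": slice(10, 17)},
)
delimited = CsvSource("customer,amount\nACME,12.50\nGLOBEX,100.00\nACME,7.50\n")

print(total_by_key(fixed, "customer", "amount"))
print(total_by_key(delimited, "customer", "amount"))
```

The analysis function never learns which format it is reading, so no intermediate copy has to be extracted and reshaped before the query runs. That is the general pattern the bullet points describe, here reduced to a toy scale.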

IBM said the IBM z/OS Platform for Apache Spark is available now for download by developers working with z/OS.

About the Author

David Ramel is the editor of Visual Studio Magazine.