Pentaho Releases Kettle’s Big Data Capabilities as Open Source

Developers, analysts, and data scientists gain industry’s first free Apache-licensed open source data integration tool for operationalizing big data management and analytics.

Note: ESJ’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

Pentaho Corporation has made freely available under open source all the big data capabilities in its new Pentaho Kettle 4.3 release and has moved the entire Pentaho Kettle project to the Apache License Version 2.0. Because Apache is the license under which Hadoop and several of the leading NoSQL databases are published, this move will further accelerate the rapid adoption of Pentaho Kettle for Big Data by developers, analysts, and data scientists who want to operationalize big data.

Big data capabilities available under open source Pentaho Kettle 4.3 include the ability to enter, retrieve, manipulate, and report on data using the following Hadoop and NoSQL stores: Apache Cassandra, Hadoop HDFS, Hadoop MapReduce, Apache Hive, Apache HBase, MongoDB, and Hadapt’s Adaptive Analytical Platform.

In addition, Pentaho Kettle makes available job orchestration steps for Hadoop, Amazon Elastic MapReduce, Pentaho MapReduce, HDFS File Operations, and Pig scripts.

Pentaho Kettle can execute ETL transforms outside the Hadoop cluster or within the nodes of the cluster, taking advantage of Hadoop’s distributed processing and reliability.

Pentaho Kettle’s Hadoop capabilities work with all major Hadoop distributions: Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, Greenplum HD, HortonWorks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition.

Pentaho Kettle for Big Data delivers the following benefits to developers, analysts and data scientists:

  • Delivers at least a 10x boost in productivity for developers through visual tools that eliminate the need to write code such as Hadoop MapReduce Java programs, Pig scripts, Hive queries, or NoSQL database queries and scripts

  • Makes big data platforms usable for a huge breadth of developers, whereas previously big data platforms were usable only by the geekiest of geeks with deep developer skills such as the ability to write Java MapReduce jobs and Pig scripts

  • Enables easy visual orchestration of big data tasks such as Hadoop MapReduce jobs, Pentaho MapReduce jobs, Pig scripts, Hive queries, and HBase queries, as well as traditional IT tasks such as data mart/warehouse loads and operational data extract-transform-load jobs

  • Leverages the full capabilities of each big data platform through Pentaho Kettle’s native integration with each one, while enabling easy co-existence and migration between big data platforms and traditional relational databases

  • Provides an easy on-ramp to the full data discovery and visualization capabilities of Pentaho Business Analytics, including reporting, dashboards, interactive data analysis, data mining and predictive analysis

More information is available at
