
Press Release

January 30, 2012

Pentaho Open Sources Big Data Capabilities to Further Fuel Widespread Adoption

Developers, Analysts, and Data Scientists Gain Industry’s First Free Apache Licensed Open Source Data Integration Tool for Operationalizing Big Data Management and Analytics

January 30, 2012 - Delivering the future of business analytics, Pentaho Corporation today announced that it has made all of its big data capabilities freely available as open source in the new Pentaho Kettle 4.3 release, and has moved the entire Pentaho Kettle project to the Apache License, Version 2.0. Because Apache is the license under which Hadoop and several of the leading NoSQL databases are published, this move will further accelerate the rapid adoption of Pentaho Kettle for Big Data by developers, analysts and data scientists as the go-to tool for operationalizing big data.

Big data capabilities available under open source Pentaho Kettle 4.3 include the ability to input, output, manipulate and report on data using the following Hadoop and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB.

With regard to Hadoop, Pentaho Kettle makes available job orchestration steps for Hadoop, Amazon Elastic MapReduce, Pentaho MapReduce, HDFS File Operations, and Pig scripts. All major Hadoop distributions are supported including: Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, EMC Greenplum HD, HortonWorks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition. Pentaho Kettle can execute ETL transforms outside the Hadoop cluster or within the nodes of the cluster, taking advantage of Hadoop’s distributed processing and reliability.

Download, how-to docs, videos and more at

Pentaho Kettle for Big Data delivers the following benefits to developers, analysts and data scientists:

  • Delivers at least a 10x boost in productivity for developers through visual tools that eliminate the need to write code such as Hadoop MapReduce Java programs, Pig scripts, Hive queries, or NoSQL database queries and scripts;
  • Makes big data platforms usable for a huge breadth of developers, whereas previously big data platforms were usable only by the geekiest of geeks with deep developer skills such as the ability to write Java MapReduce jobs and Pig scripts;
  • Enables easy visual orchestration of big data tasks such as Hadoop MapReduce jobs, Pentaho MapReduce jobs, Pig scripts, Hive queries, HBase queries, as well as traditional IT tasks such as data mart/warehouse loads and operational data extract-transform-load jobs;
  • Leverages the full capabilities of each big data platform through Pentaho Kettle’s native integration with each one, while enabling easy co-existence and migration between big data platforms and traditional relational databases;
  • Provides a super-easy on-ramp to the full data discovery and visualization capabilities of Pentaho Business Analytics, including reporting, dashboards, interactive data analysis, data mining and predictive analysis.

Quotes and Multimedia
“In order to obtain broader market adoption of big data technology including Hadoop and NoSQL, Pentaho is open sourcing its data integration product under the free Apache license. This will foster success and productivity for developers, analysts and data scientists giving them one tool for data integration and access to discovery and visualization.”

-- Matt Casters, Founder and Chief Architect Pentaho Kettle Project, Pentaho, @mattcasters

"Pentaho Kettle's powerful ETL enables developers and analysts to more quickly integrate MongoDB into their enterprise environments by allowing them to transform and report on data they have stored in MongoDB. Pentaho Kettle for Big Data is a great addition to the MongoDB ecosystem and 10gen looks forward to continuing to work with Pentaho to further develop this open source tool with the MongoDB community."

-- Erik Frieberg, VP of Marketing and Alliances, 10gen

“The Pentaho and Cloudera partnership allows our joint customers to more quickly integrate Hadoop within their enterprise data environments while also providing exceptional analytical capabilities to a wider set of business users. We applaud Pentaho’s decision to open source its big data capabilities under the Apache License; the technology they are contributing is substantial and is a big step forward in helping to accelerate adoption and make it easier to use Hadoop for data transformation.”

-- Ed Albanese, Head of Business Development, Cloudera

“EMC Greenplum’s Unified Analytics Platform for big data analytics leverages all of an organization’s data, structured and unstructured, and embraces the extended data science community with collaboration tools that empower, foster creativity and pave the way to untapped insights and opportunity. Pentaho’s move to open source its big data capabilities will help accelerate big data adoption by making it easier for those data scientists to more quickly explore and visualize their data to gain those insights.”

-- Jim Totte, Director of Marketing, EMC Greenplum

“Hadapt allows customers to analyze their structured and unstructured data together in a single platform without ever having to move data outside of Hadoop. Hadapt's SQL-compliant query interface, together with Pentaho Kettle for ETL, allows analysts to leverage their existing SQL skills for big data analytics on Hadoop.”

-- Justin Borgman, CEO, Hadapt, @justinborgman

“The strategic alliance with Pentaho and HPCC Systems allows enterprise customers to gain better customer intelligence within Big Data for increased productivity, competitiveness and growth. We’re pleased to work with Pentaho to create breakthrough solutions for a new era of Big Data capabilities to foster thought-provoking ‘what if’ scenarios.”

-- Armando Escalante, CTO, LexisNexis Risk Solutions and head of HPCC Systems

“MapR is fully onboard with Pentaho’s open sourcing of its data integration for Hadoop. We are enthusiastic supporters of any fellow Hadoop community members who are dedicated to advancing the development and acceptance of the Hadoop marketplace.”

-- Jack Norris, VP of Marketing, MapR Technologies, @Norrisjack

  • Visit to:
    • Download Pentaho Kettle for Big Data;
    • Access how-to guides, videos and additional resources;
    • Connect with the community ##pentaho;
    • Join the Pentaho Big Data technical developer mailing list to be notified about future big data product updates and related events.
  • Attend the techcast on Thursday February 9th to learn more about Pentaho Kettle for Big Data, watch a live demo and hear how you can get involved. Register now.
  • Free, in-depth hands-on training for attendees at the 2012 Strata Conference in Santa Clara, California. Sign up for our how-to training session on February 28th during the ‘Tuesday Tutorials.’ Bring your use cases and get up and running. Register with Pentaho’s 20 percent discount code: str12sd20.
  • Join the conversation at @Pentaho and

About Pentaho, a Hitachi Group company
Pentaho, a Hitachi Group company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho’s unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho’s mission is to help organizations across multiple industries harness the value from all their data, including big data and IoT, enabling them to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. Pentaho has over 15,000 product deployments and 1,500 commercial customers today including ABN-AMRO Clearing, BT, EMC, NASDAQ and Sears Holdings Corporation. For more information visit
