
Press Release

September 26, 2016

Five New Pentaho Data Integration Enhancements, Including SQL on Spark, Deliver Value Faster and Future Proof Big Data Projects

New Spark and Kafka support, Metadata Injection enhancements and Hadoop security alleviate big data complexity

September 26, 2016, New York City (Strata + Hadoop World, Booth #533) — Pentaho, a Hitachi Group Company, today announced five new improvements, including SQL on Spark, to help enterprises overcome big data complexity, skills shortages and integration challenges in complex enterprise environments. These big data integration enhancements help IT teams deliver value from big data projects faster with existing resources by eliminating the need for manual coding, providing tighter security and supporting more of the big data technology ecosystem.

1. More Apache Spark integration

Pentaho expands the existing Spark integration in its platform so that customers adopting this popular technology can:

  • Lower the skills barrier for Spark – data analysts can now query and process Spark data via Pentaho Data Integration (PDI) using SQL on Spark
  • Coordinate, schedule, reuse, and manage Spark applications in data pipelines more easily and flexibly – expanded PDI orchestration for Spark Streaming, Spark SQL and Spark machine learning (Spark MLlib and Spark ML) to support the growing number of developers who use multiple Spark libraries
  • Integrate Spark apps into larger data-driven processes and get more out of them – PDI orchestration of Spark applications written in Python benefits developers writing Spark applications in this popular language

2. Expanded metadata injection capabilities

Pentaho’s unique metadata injection capability, which speeds the onboarding of multiple data types, allows data engineers to dynamically generate PDI transformations at runtime instead of hand-coding each data source, reducing costs by 10X. Pentaho adds over 30 compatible PDI transformation steps, including operations related to Hadoop, HBase, JSON, XML, Vertica, Greenplum and other big data sources.

3. Expanded Hadoop data security integrations

Securing big data environments can be extremely difficult because the technologies that define authentication and access are continuously evolving. Pentaho expands its Hadoop data security integration to promote better big data governance and protect clusters from intruders. These integrations include enhanced Kerberos integration for secure multi-user authentication and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets.

4. Apache Kafka support

Apache Kafka’s increasingly popular publish/subscribe messaging system handles the large data volumes common in today’s big data and IoT solutions. Pentaho now provides Enterprise customers with support for sending data to and receiving data from Kafka, facilitating continuous data processing use cases in PDI.

5. Enhanced support for popular Hadoop file formats

Pentaho now supports the output of files in Avro and Parquet formats in PDI, both popular for storing data in Hadoop in big data onboarding use cases.

“Our latest enhancements reflect Pentaho’s continued mission to quickly make big data projects operational and deliver value by strengthening and supporting analytic data pipelines,” says Donna Prlich, Senior Vice President, Product Management, Product Marketing & Solutions, at Pentaho. “Enterprises can focus on their big data deployments, removing the complexity and time involved in data preparation by taking advantage of new, high-potential technologies like Spark and Kafka in the big data ecosystem.”

Quote Sheet
“Veikkaus, the Finnish lottery, uses Pentaho Data Integration to rapidly consolidate and process both relational and semi-structured data to drive a better understanding of our customers and enhance loyalty,” said Harri Räsänen, Architect at Veikkaus. “Pentaho has helped us rapidly solve complex data problems and establish a future-proof data foundation in the face of an ever-evolving big data landscape.”

“Pentaho helps organizations create business competitive advantage with data by accelerating the incorporation of technologies like Spark into existing data environments by managing risk to facilitate alignment with big data security policies,” said Tim Stevens, vice president of Corporate and Business Development, Cloudera. “Cloudera’s partnership with Pentaho empowers joint customers to bring innovative, enterprise-grade Hadoop analytic applications to market more quickly.”

USAble Life of Little Rock, AR, was created in 1993 and is an independent life, disability, accident, and specialty insurance company. According to Jason Brannon, Supervisor of Data Architecture at USAble, “To synchronize the ongoing changes to enrolment information between our customers and partners, above all we need flexibility in our data architecture and as few bottlenecks as possible. PDI's metadata injection gives us unparalleled flexibility by automatically transforming customer data into outbound partner feeds. This means we can devote our time to analysing data and improving customer relationships.”

North American Bancard Holdings, a leader in the payments industry, processes and analyzes more than $34 billion per year in transactions in order to enhance its operations and improve customer service. According to Krishna Swargam, Business Intelligence Architect at North American Bancard Holdings, “Pentaho plays a crucial role in orchestrating and automating this data pipeline, delivering analytic-ready data in a complex environment, and we are excited about how Pentaho’s new big data enhancements will further drive business transformation throughout the organization.”


  • Learn more about Pentaho’s big data enhancements
  • Register for a webinar discussing Pentaho’s latest release
  • Visit us at booth #533 at Strata the week of September 26th
  • Join Pentaho for our Strata session “Filling the Data Lake” on September 28 at 2:05 p.m. ET

About Pentaho, a Hitachi Group company
Pentaho, a Hitachi Group company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho’s unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho’s mission is to help organizations across multiple industries harness the value from all their data, including big data and IoT, enabling them to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. Pentaho has over 15,000 product deployments and 1,500 commercial customers today including ABN-AMRO Clearing, BT, EMC, NASDAQ and Sears Holdings Corporation. For more information visit
