Pentaho Platform: Pentaho Data Integration

Ingest, Blend, Cleanse, and Prepare Diverse Data From Any Source, in Any Environment — Without Code.

With Pentaho Data Integration, a Pentaho Platform product, managing the enormous volumes and increased variety and velocity of data entering organizations is simplified.

By enabling data preparation from any source and automating your data pipeline, Pentaho Data Integration lets you curate data better for your business users. This software delivers business analytics to end users faster with visual tools that reduce time and complexity, without writing SQL or coding in Java or Python. Organizations gain value from data in sources such as files, relational databases, Hadoop, and more, whether in the cloud or on premises.

Turn Big Data Into Actionable Analytics


Pentaho Data Integration’s adaptive big data layer allows you to plug into popular big data stores with flexibility and insulation from change. Data can be accessed once, then processed, combined and consumed anywhere. The adaptive big data layer includes plug-ins for Hadoop distributions and object stores from Cloudera, Hortonworks, MapR (HPE Ezmeral Data Fabric), Amazon Web Services, Google Cloud and Microsoft Azure, along with Hitachi Content Platform, as well as popular NoSQL databases such as MongoDB and Cassandra.

Integrate and Blend Big Data With Existing Enterprise Data

With broad connectivity to any data type and high-performance Spark and MapReduce execution, Pentaho technology simplifies and speeds the process of integrating existing databases with new sources of data. Pentaho Data Integration’s graphical designer includes:

  • Intuitive, drag-and-drop designer to simplify the creation of analytics data pipelines (see Figure 1).
  • Rich library of prebuilt components to access, prepare and blend data from relational sources, big data stores on premises or in the cloud, enterprise applications and more.
  • Ability to spot check data in flight with immediate access to analytics, including charts, visualizations and reporting, from any data prep step.
  • Powerful orchestration capabilities to coordinate and combine transformations, including notifications and alerts.
  • Integrated enterprise scheduler for coordinating workflows and debugger for testing and tuning job execution.

Data Processing Performance and Productivity

Pentaho Data Integration speeds data processing, reduces the complexity of integrating big data sources, and provides:

  • Code-free data transformation design that is 15 times more productive than hand-coding and executes in-cluster for high performance.
  • Template-based approach to rapidly onboard data sources into Hadoop via metadata injection feature set.
  • Ability to seamlessly switch between execution engines, such as Spark and the Pentaho native engine, to fit data volume and transformation complexity (see Figure 2).
  • Support for advanced analytics models from R, Python, Scala and Weka to operationalize predictive intelligence while reducing data prep time.

“Moving data across a business is an art. Pentaho transforms art into better business value.”

– Warren Chang, VP of Engineering, Borderfree

Broad Connectivity and Data Delivery

Pentaho Data Integration offers broad connectivity to a wide variety of data, including all popular structured, unstructured and semi-structured data sources. Some examples include:

  • Relational database management system (RDBMS): Oracle, IBM DB2, MySQL, Microsoft SQL Server, Postgres, IBM MQ.
  • Spark and Hadoop: Cloudera, Hortonworks, Amazon EMR, MapR (HPE Ezmeral Data Fabric), Microsoft Azure HDInsight, and Elasticsearch.
  • NoSQL databases and object stores: MongoDB, Cassandra, HBase, Hitachi Content Platform, AWS S3, Google Cloud Storage, Microsoft Azure ADLS Gen 2.
  • Analytic databases: Amazon Redshift, Snowflake, Vertica, Greenplum, Teradata, SAP HANA, Google BigQuery.
  • Business applications: SAP, Salesforce, Google Analytics.
  • Files: XML, JSON, Microsoft Excel, CSV, txt, Avro, Parquet, ORC, EBCDIC (mainframe), unstructured files with metadata, including audio, video and visual files.

To increase the performance of data extraction, loading and delivery processes, Pentaho offers the following capabilities:

  • Native connectivity and bulk-loading to most common data sources, including Amazon Redshift and Snowflake.
  • Data services to virtualize transformations without staging, making data sets immediately available to reports and applications.
  • Automatic creation and publishing of metadata models to drive faster analytic results.
  • Real-time processing of streaming data.

Data Profiling and Data Quality

Pentaho technology provides data profiling capabilities, such as row counts, mathematical functions and identification of null values, as well as data quality operators, such as string manipulators, mapping functions, filtering and sorting. For name and address verification capabilities, Pentaho technology integrates with leading data quality vendors, such as Human Inference and Melissa Data. Pentaho data profiling and data quality capabilities help:

  • Identify data that fails to comply with business rules and standards.
  • Deduplicate and cleanse inconsistent and redundant data.
  • Validate, standardize and correct name, address, email and telephone data.
  • Replace file names and locations with simple business names by integrating with Data Catalog, a component of Pentaho Platform.
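
As a rough illustration of what profiling checks like these compute, the sketch below counts rows, counts null values per column and flags duplicate records. It is plain Python, not Pentaho Data Integration's own interface; the function name `profile`, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def profile(rows, key_fields):
    """Return the row count, per-column null counts and duplicate-key count.

    Hypothetical helper for illustration only; not a Pentaho API.
    """
    null_counts = Counter()
    seen_keys = set()
    duplicates = 0
    for row in rows:
        # Treat None and empty strings as null values.
        for column, value in row.items():
            if value is None or value == "":
                null_counts[column] += 1
        # Standardize key fields (trim, lowercase) before comparing,
        # so inconsistent formatting does not hide a duplicate.
        key = tuple(str(row.get(f) or "").strip().lower() for f in key_fields)
        if key in seen_keys:
            duplicates += 1
        else:
            seen_keys.add(key)
    return {
        "row_count": len(rows),
        "null_counts": dict(null_counts),
        "duplicates": duplicates,
    }


# Made-up sample records: the second email differs from the first only
# in case and trailing whitespace, and the third email is missing.
sample = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Ada Lovelace", "email": "ADA@example.com "},
    {"name": "Grace Hopper", "email": None},
]
report = profile(sample, key_fields=["email"])
print(report)
```

On this sample the report shows three rows, one null email and one duplicate, because standardizing the email before comparison lets the second record match the first.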

Powerful Administration and Management

Pentaho Data Integration provides out-of-the-box capabilities for managing operations for data integration projects. These capabilities include:

  • Shared repository for collaboration among data analysts, developers and data stewards.
  • Content management, versioning and locking, so jobs can easily be rolled back to prior versions.
  • Control over security privileges for users and roles and integration with third-party security systems; ability to set permissions for creating, reading or executing jobs and transformations.
