Get hands-on experience with enterprise-class data integration and analytics software for scaling big data, multicloud migration, and integration with applications and services.
Download Pentaho and try the full enterprise-class data integration and analytics software for scaling big data, multicloud migrations, and integration into applications and services. It provides the complete set of features.
Pentaho Community Edition is an open-source version of the platform's core engines that lets you experiment with a reduced set of features and capabilities, such as extracting, transforming, and loading (ETL) limited volumes of data.
This plugin provides the Elasticsearch REST Bulk Insert step, for submitting records to an Elasticsearch 8.x server for indexing. Use this step to send one or more batches of records to the server; because you can specify the batch size, you can send one, a few, or many records per request. What's new: the client version is now 7.17.16.
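For context, Elasticsearch's REST `_bulk` endpoint accepts a newline-delimited JSON (NDJSON) body in which each record is preceded by an action line. A minimal sketch of assembling one batch (the index name and documents are illustrative, not step defaults):

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON payload for the Elasticsearch _bulk REST API.

    Each document becomes two lines: an action line naming the target
    index, then the document source. The body must end with a trailing
    newline, as the _bulk endpoint requires.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# One batch of two records destined for a hypothetical "logs" index.
body = build_bulk_body("logs", [{"msg": "start"}, {"msg": "stop"}])
```

Batching in the step works on the same principle: the batch size simply controls how many record pairs end up in each request body.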
Databricks Bulk Loader
This plugin provides the Databricks Bulk Loader step, which loads large amounts of data from files in a cloud account into Databricks tables and enables use cases where bulk operations are required. The step uses the COPY INTO command to load the data, and most of its options are driven by that command. What's new: the Databricks Bulk Loader step can now use the new Databricks database connection.
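A rough sketch of composing a Databricks COPY INTO statement of the general shape the step issues (the table name, source path, and options here are illustrative placeholders, not the step's actual defaults):

```python
def copy_into_sql(table, source_path, file_format="PARQUET"):
    """Compose a COPY INTO statement of the general shape the
    Databricks Bulk Loader issues. Table, path, and options are
    illustrative placeholders, not the step's defaults."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format}\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )

# A hypothetical load of Parquet files into a sales.events table.
sql = copy_into_sql("sales.events", "/Volumes/landing/events/")
```

COPY INTO is idempotent over already-loaded files, which is what makes it suitable for the repeated bulk loads the step targets.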
REST Client Step
This plugin provides the REST Client step, which lets you consume RESTful services. The step now supports keystores for authentication.
Google Analytics v4
You can use the Google Analytics v4 step to access your Google Analytics data to generate reports or to populate your data warehouse. This step queries Google Analytics properties using the Google Analytics API v4 and places the resulting dimension and metric values on the step output stream.
Note: The Google Analytics v4 API is currently in beta, and the libraries shipped with this new plugin are beta versions; the API is not yet generally available. For more information, see https://developers.google.com/analytics/devguides/reporting/data/v1
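For orientation, the GA4 Data API exposes a runReport method that takes date ranges plus lists of dimension and metric names. A minimal sketch of building such a request body (the dates and field names are illustrative):

```python
def build_run_report_request(start, end, dimensions, metrics):
    """Build the JSON body for the GA4 Data API runReport method
    (POST .../v1beta/properties/{propertyId}:runReport).
    The dates and field names passed in are illustrative."""
    return {
        "dateRanges": [{"startDate": start, "endDate": end}],
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
    }

# A hypothetical report of active users by country for January 2024.
req = build_run_report_request("2024-01-01", "2024-01-31",
                               ["country"], ["activeUsers"])
```

The step's output stream mirrors this shape: one field per requested dimension and metric, one row per result row.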
Salesforce Bulk Operation Plugin
This plugin provides the Salesforce Bulk Operation step, for when you have a large number of Salesforce objects to INSERT, UPDATE, UPSERT, or DELETE. The step provides seamless bulk-job orchestration and significantly faster Salesforce object manipulation than the traditional SOAP-based steps offer.
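For context, Salesforce's Bulk API 2.0 orchestrates such operations as jobs: a small JSON body creates an ingest job, and the records are then uploaded as CSV. A minimal sketch (the API version, object, and records are illustrative):

```python
import csv
import io

def bulk_ingest_job(sobject, operation):
    """JSON body for creating a Bulk API 2.0 ingest job
    (POST /services/data/v58.0/jobs/ingest; version is illustrative)."""
    return {"object": sobject, "operation": operation, "contentType": "CSV"}

def to_csv(records):
    """Serialize records to the CSV payload uploaded to the job."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# A hypothetical bulk insert of two Account records.
job = bulk_ingest_job("Account", "insert")
csv_text = to_csv([{"Name": "Acme"}, {"Name": "Globex"}])
```

Because one job carries many records in a single CSV upload, this avoids the per-record round trips that make the SOAP-based steps slower at scale.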
Elasticsearch Plugin
Use the Elasticsearch REST Bulk Insert step if you have records that you want to submit to an Elasticsearch server for indexing. This plugin now supports Elasticsearch 8 and paves the way for using PDI in the future with the increasingly popular OpenSearch/AWS version of Elasticsearch.
Hierarchical Data Type (HDT)
Hierarchical Data Type (HDT) is a new data type in PDI for handling structured, complex, or nested data based on the JSON format. It comes with five new plugins/steps: Hierarchical JSON Input, Extract to Rows, Modify values from a single row, Modify values from grouped rows, and Hierarchical JSON Output. The Hierarchical JSON Input step takes JSON or JSONL from a previous step or from a file location and converts it into a hierarchical object. The Hierarchical JSON Output step takes HDT data from previous steps and converts it into a JSON-formatted string. The other three steps manipulate Hierarchical Data Type values.
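As a loose analogy in plain Python (the step names and HDT internals belong to PDI; this sketch only mirrors the idea of parsing JSONL into nested objects and fanning a nested array out into rows):

```python
import json

def read_jsonl(text):
    """Parse JSONL text into nested Python objects, loosely mirroring
    how the Hierarchical JSON Input step builds hierarchical objects."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def extract_to_rows(obj, list_key):
    """Fan a nested array out into one row per element, loosely
    mirroring the Extract to Rows step. `list_key` names a list field."""
    base = {k: v for k, v in obj.items() if k != list_key}
    return [{**base, list_key: item} for item in obj[list_key]]

# A hypothetical order record with a nested item list.
order = read_jsonl('{"id": 1, "items": [{"sku": "a"}, {"sku": "b"}]}')[0]
rows = extract_to_rows(order, "items")
```

Round-tripping the other way (rows back to a JSON-formatted string) corresponds to the Hierarchical JSON Output step.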
Kafka EE (Enterprise Edition)
Kafka EE is an upgraded version of the Kafka CE plugin with additional enterprise features enabled. It now supports SSL and Kerberos for connecting to brokers, and a new job entry, Kafka Offset, has been added to reset the offsets of Kafka topic partitions. The Kafka Offset job entry can also stop the Kafka consumer from reading messages once a given offset is reached, taking a timestamp (for example, a future or past timestamp) as input. Finally, the Kafka Consumer step can read messages from a start offset to an end offset across different partitions of a topic.
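The timestamp-based reset resembles Kafka's offsetsForTimes lookup: for each partition, find the earliest offset whose timestamp is at or after the given time. A minimal sketch of that lookup (the partition index data is illustrative; real brokers answer this query natively):

```python
import bisect

def offset_for_timestamp(index, ts):
    """Return the earliest offset whose timestamp is >= ts, or None if
    ts lies beyond the newest message. `index` is a sorted list of
    (timestamp, offset) pairs for one partition."""
    times = [t for t, _ in index]
    i = bisect.bisect_left(times, ts)
    return index[i][1] if i < len(times) else None

# A toy partition index: three messages at timestamps 100, 200, 350.
partition_index = [(100, 0), (200, 1), (350, 2)]
```

A past timestamp thus rewinds the consumer, while a future timestamp (no matching offset) can serve as a stop condition.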
Experience the power of Pentaho for Data Integration
Download Pentaho Data Integration and get hands-on experience with the full data integration and analytics functionality.