Faster Insights Drive Better Business Outcomes

“Pentaho Data Catalog gives us real-time insights into how our data is changing over time and helps us ensure that all our data files are stored in the right places to support smooth, standardized operations and compliance with internal guidelines.”

Rohny Kolli, Data Engineering Manager – Advanced Analytics Enablement, Fannie Mae

Overview

Challenge

Make millions of files of mission-critical business data rapidly available to business analysts every day.

Solution

Deploy Pentaho Data Catalog to automate profiling and tagging of data sets and provide context for analyses.

Outcome

Automate processes to eliminate data anomalies with AI, accelerate data delivery to analysts, and facilitate compliance.

Challenge

Fannie Mae enabled the acquisition of more than 2 million home purchases and refinancings, and financing of approximately 598,000 rental units across the United States in 2022. Today, Fannie Mae is an increasingly digital and data-centric business. To leverage all its business data across new and legacy applications and to break down existing data silos, the company wanted to create an agile and dynamic enterprise data lake.

Rohny Kolli, Data Engineering Manager – Advanced Analytics Enablement at Fannie Mae, says: “Our goal was to build a modern, state-of-the-art data platform for business analysts and decision-makers across the company. We wanted to enable fast, data-driven decisions, which meant we had to make it easier to get the right data to the right people at the right time.”

Fannie Mae started by designing a comprehensive process to manage its enterprise data lake. Each of its 15,000 datasets went through an initial registration process to assign a unique identifier, and every field had to be documented manually. This approach increased compliance and transparency by helping to identify datasets at every stage of the analytics and reporting process. However, the need to add an elaborate set of metadata to every dataset made the process slow.

“With our existing solution, it could take weeks or even months before new datasets would be registered in our data lake and made available to our business analysts and data scientists,” adds Rohny Kolli. “To respond faster to new data that is being continuously generated by our high-velocity apps, we had to automate this process. We were looking for a solution that could handle more than 10 million new files every day to keep our enterprise data lake up to date.”

“With Pentaho Data Catalog, we are integrating millions of files each day into our enterprise data lake. The solution enables data profiling and tagging to gain valuable insights, detect anomalies immediately, and support our data governance management to facilitate compliance.”

Rohny Kolli, Data Engineering Manager – Advanced Analytics Enablement, Fannie Mae

Solution

To help establish a faster and more dynamic data infrastructure, Fannie Mae selected Pentaho Data Catalog as a centralized, data-agnostic tool to accelerate data availability. The software runs fully in the cloud on Amazon Web Services (AWS) across multiple availability zones with auto-scaling to ensure fast performance and business continuity. It processes tens of millions of files and related attributes and aggregates them into thousands of high-level datasets that are easy for the business team to consume and reference for actionable insights.

To transform its data pipeline, Fannie Mae now relies on process automation based on the Pentaho Data Catalog API. This enables the company to connect its wide range of business applications to the enterprise data lake and update datasets on a daily basis.

Pentaho Data Catalog performs an automated pre-registration step, using machine learning and AI to validate and tag metadata and detect sensitive data. It then makes everything immediately available to the company’s metadata analysts, data stewards, data governors and business data officers for further processing and analytics.
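As a rough illustration of what automated tagging of sensitive fields can look like, the Python sketch below applies simple regular-expression rules to sampled values. It is a stand-in technique, not Pentaho Data Catalog's method, which is described above as based on machine learning and AI. The function name `tag_sensitive_fields`, the patterns, the tag names, and the sample rows are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only; a production catalog would use
# far more robust detection than these two regular expressions.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def tag_sensitive_fields(sample_rows, threshold=0.8):
    """Return {field_name: tag} for fields whose sampled values mostly match a pattern."""
    tags = {}
    if not sample_rows:
        return tags
    # Field names are taken from the first sampled row.
    for field in sample_rows[0]:
        values = [str(row[field]) for row in sample_rows if row.get(field) is not None]
        if not values:
            continue
        for tag, pattern in SENSITIVE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            # Tag the field when the share of matching values reaches the threshold.
            if hits / len(values) >= threshold:
                tags[field] = tag
                break
    return tags


# Made-up sample rows; the field names are assumptions, not Fannie Mae's schema.
rows = [
    {"borrower_id": "123-45-6789", "contact": "a@example.com", "loan_amount": 250000},
    {"borrower_id": "987-65-4321", "contact": "b@example.com", "loan_amount": 310000},
]
print(tag_sensitive_fields(rows))
```

Tagging on a sample with a match threshold, rather than requiring every value to match, keeps a single malformed value from hiding a sensitive column.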

Built-in metadata versioning helps Fannie Mae keep track of changes in its data sources and better understand the context of its business data. The data-agnostic solution highlights changes in storage location, file size, file format and many other technical details that can help the team tune and optimize data processing.
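The idea of comparing metadata versions can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not the product's implementation: the `FileMetadata` class, the `diff_metadata` helper, and the example paths are hypothetical, and only the three attributes named above are modeled.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FileMetadata:
    """One hypothetical version of a file's technical metadata."""
    location: str
    size_bytes: int
    file_format: str


def diff_metadata(old, new):
    """Return {attribute: (old_value, new_value)} for every attribute that changed."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}


# Two made-up versions of the same file's metadata.
v1 = FileMetadata("s3://example-bucket/raw/loans.csv", 1_048_576, "csv")
v2 = FileMetadata("s3://example-bucket/curated/loans.parquet", 262_144, "parquet")

for attribute, (before, after) in diff_metadata(v1, v2).items():
    print(f"{attribute}: {before} -> {after}")
```

Storing each version as an immutable record makes the comparison a plain attribute-by-attribute check, and an empty result means nothing changed between versions.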

“Pentaho Data Catalog gives us real-time insights into how our data is changing over time and helps us ensure that all our data files are stored in the right places to support smooth, standardized operations and compliance with internal guidelines,” says Rohny Kolli. “The solution can catch unresolved schema issues and produce discrepancy reports, helping our various teams ensure high data quality and compliance.”

Outcome

Accessing critical business information is now easier than ever. “Using Pentaho Data Catalog, we have created a data-agnostic self-service offering for our business users,” adds Rohny Kolli. “Staff can flexibly search our enterprise data lake with a user-friendly and intuitive interface to gain a 360-degree view of our business data. The search results provide a simple overview, so data stewards, business analysts and data scientists can find the right datasets with the custom data properties they need quickly and efficiently.”

To unlock further insights and provide meaningful context to business users, Fannie Mae now uses the solution to tag its data, for example to highlight sensitive and personal information and to classify more than 400 key data elements (KDEs).

Ultimately, these solution elements enable faster analytics and insights, which translate into better business outcomes. Rohny Kolli concludes: “With Pentaho Data Catalog, we are integrating millions of files each day into our enterprise data lake. The solution enables data profiling and tagging to gain valuable insights, identify anomalies immediately, and support our data governance management to facilitate compliance.”

Industry

Financial Services

Solutions

Data Management and Analytics

Software

Pentaho Data Integration
Pentaho Business Analytics

Large Data Volumes

Automated processing of 10 million files per day supported decision-making to provide $684 billion in liquidity to the mortgage market in 2022.

10 million files

$684 billion liquidity
