Fannie Mae Automates Data Lake Design Process | Hitachi Vantara

Fannie Mae Automates Data Lake Design Process To Accelerate Insight

The Challenge:

  • Efficiently prepopulate data lake with all necessary dataset properties.
  • Provide an API-based solution for automation.
  • Ingest a high daily volume of datasets.


The Solution:

  • Automated solution capable of cataloging greater than 10 million files per day with all associated properties.
  • Searchable user interface (UI) with custom search properties delivers desired data quickly and efficiently.


The Outcome:

  • Self-service “marketplace” data catalog for business users.
  • Cataloged custom properties attached to each dataset. 
  • APIs to automatically process datasets. 
  • Millions of files cataloged per day.

Fannie Mae is a leader in providing housing finance for homebuyers and renters across the United States. The company helps make fixed-rate mortgages and affordable rental housing possible for millions of Americans.

“With Waterline we were able to fully automate and accelerate the cataloging and searchability of data to deliver game-changing value to the business.”

– Prakash Jagananthan, Data Management Leader, Fannie Mae

*Hitachi Vantara acquired Waterline Data in 2020.

The Challenge

Ingest a high daily volume of datasets

Founded in 1938, Fannie Mae deals with the same challenges as many mature corporations, including legacy environments and data silos.

As a $110 billion company that is the leading source of financing for mortgage lenders, Fannie Mae’s increasingly data-centric business wanted to transition to an agile, more responsive data lake. The company sought to create a modern data environment that ensured the right data got to the right person at the right time.

Fannie Mae set a governance standard whereby every dataset and field in the data lake was completely documented. In fact, each dataset goes through a design process where it gets curated and assigned a unique identifier, which stays with it no matter where it gets copied. Each dataset also has an elaborate set of properties that have to be filled out before the identifier can be issued.
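The rule described above (no identifier until every property is filled in, and the identifier travels with each copy) can be illustrated with a minimal Python sketch. The property names, the `Dataset` class and the use of a UUID are assumptions made for illustration only; the article does not describe how Fannie Mae actually models datasets or generates identifiers.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical required properties; the real property set is not given in the article.
REQUIRED_PROPERTIES = ("owner", "description", "classification")


@dataclass
class Dataset:
    name: str
    location: str
    properties: dict = field(default_factory=dict)
    dataset_id: Optional[str] = None  # issued only after the design is complete


def missing_properties(ds: Dataset) -> list:
    """Return the required properties that are absent or empty."""
    return [p for p in REQUIRED_PROPERTIES if not ds.properties.get(p)]


def issue_identifier(ds: Dataset) -> str:
    """Issue a unique identifier once, and only if all properties are filled in."""
    missing = missing_properties(ds)
    if missing:
        raise ValueError(f"cannot issue identifier, missing properties: {missing}")
    if ds.dataset_id is None:
        ds.dataset_id = str(uuid.uuid4())  # assumption: any unique scheme would do
    return ds.dataset_id


def copy_dataset(ds: Dataset, new_location: str) -> Dataset:
    """Copy a dataset to a new location; the identifier stays with the copy."""
    return Dataset(ds.name, new_location, dict(ds.properties), ds.dataset_id)


loans = Dataset(
    name="loans",
    location="/lake/raw/loans",
    properties={"owner": "credit-team", "description": "Loan records",
                "classification": "internal"},
)
identifier = issue_identifier(loans)
curated = copy_dataset(loans, "/lake/curated/loans")
print(curated.dataset_id == identifier)  # True
```

Keeping the completeness check separate from identifier issuance makes it possible to report which properties are still missing before a design is approved.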

While this process made the data more accessible, it took too long. Days, weeks or even months could pass between the approval of a design and the point when data was actually generated and transferred from the IMR design-time system into the data lake.

Meanwhile, several of Fannie Mae’s high-velocity apps continued to generate more than 10 million new files every day, clogging up the slow design process even further. These new files also needed to be integrated into the data lake, which required an API-based automated solution.
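To put that volume in perspective, a quick back-of-the-envelope calculation in Python shows the sustained rate an automated pipeline would need to handle. The batch size of 1,000 is a hypothetical value chosen for illustration, not a figure from the case study.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate(files_per_day: int) -> float:
    """Average number of files per second if arrivals are spread evenly over a day."""
    return files_per_day / SECONDS_PER_DAY


def batches_per_day(files_per_day: int, batch_size: int) -> int:
    """Number of API calls needed per day if files are submitted in fixed-size batches."""
    return math.ceil(files_per_day / batch_size)


print(f"{sustained_rate(10_000_000):.1f} files/second")   # 115.7 files/second
print(batches_per_day(10_000_000, 1_000), "batches/day")  # 10000 batches/day
```

At well over 100 files per second on average, a design step that requires manual work per dataset cannot keep pace, which is why the requirement was framed in terms of APIs.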

The Solution

Dataset Preregistration and Metadata Versioning Accelerate Analytics and Insights
As part of its wholesale transformation to a modern data infrastructure, Fannie Mae integrated Lumada Data Catalog. Through its extensive APIs, Lumada Data Catalog lets high-volume applications that generate millions of files daily preregister the datasets they ingest. The interface supports validation and management of metadata by different roles, including metadata analysts, data stewards, data governors and business data officers. Business data officers at Fannie Mae “ensure that data is fully owned and cared for by business leaders, and that new initiatives consider the creation, ongoing quality, and effective usage of data from the outset,” according to a recent Forbes report.*
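As a rough illustration of what API-based preregistration could look like from the ingesting application's side, the Python sketch below builds a JSON request body for a batch of files. The record fields, the `files` wrapper and the sample values are all hypothetical; this is not the Lumada Data Catalog API, whose request format is not described in the article.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreregistrationRecord:
    """Hypothetical technical metadata sent by an ingesting application."""
    dataset_id: str
    file_location: str
    file_size_bytes: int
    file_format: str
    ingested_at: str  # ISO 8601 timestamp in UTC
    partition: str


def build_payload(records) -> str:
    """Serialize a batch of records into a single JSON request body."""
    return json.dumps({"files": [asdict(r) for r in records]}, sort_keys=True)


record = PreregistrationRecord(
    dataset_id="ds-0001",  # illustrative identifier
    file_location="/lake/raw/loans/part-0000.parquet",
    file_size_bytes=1_048_576,
    file_format="parquet",
    ingested_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    partition="date=2024-01-01",
)
payload = build_payload([record])
print(payload)
```

Sending many records per request, rather than one call per file, is one common way to keep the number of API calls manageable at this scale.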

Metadata versioning allows capture and display of technical metadata provided by the ingesting application, including file location, file size, file format, time of ingestion, partition and so forth. The solution can catch unresolved schema evolution in order to produce discrepancy reports between reported and inferred schema. That data is then made available to business end users in a robust self-service “marketplace” UI. The UI features complex custom properties presented simply and cohesively to enable end users to quickly find and utilize the data.
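The discrepancy report mentioned above can be pictured as a comparison between the schema an application reports and the schema inferred from the data itself. The following Python sketch is a simplified illustration under assumed inputs (flat field-name-to-type mappings); it does not represent how Lumada Data Catalog performs the comparison.

```python
def schema_discrepancies(reported: dict, inferred: dict) -> list:
    """Compare two {field_name: type_name} schemas.

    Returns a list of (field, issue, reported_type, inferred_type) tuples,
    sorted by field name.
    """
    report = []
    for name in sorted(set(reported) | set(inferred)):
        r, i = reported.get(name), inferred.get(name)
        if r is None:
            report.append((name, "not in reported schema", None, i))
        elif i is None:
            report.append((name, "not in inferred schema", r, None))
        elif r != i:
            report.append((name, "type mismatch", r, i))
    return report


# Illustrative schemas; field names and types are made up for this example.
reported = {"loan_id": "string", "amount": "decimal", "state": "string"}
inferred = {"loan_id": "string", "amount": "double", "channel": "string"}

for row in schema_discrepancies(reported, inferred):
    print(row)
# ('amount', 'type mismatch', 'decimal', 'double')
# ('channel', 'not in reported schema', None, 'string')
# ('state', 'not in inferred schema', 'string', None)
```

An empty report indicates that the reported and inferred schemas agree, so only datasets with unresolved schema changes need a steward's attention.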

The Outcome

Fannie Mae Ensures Better Business Outcomes
By implementing Hitachi's data management and analytics solution, based on Lumada Data Services and Lumada Data Catalog, Fannie Mae was able to provide a self-service "marketplace" data catalog for business users. Cataloged custom properties were attached to each dataset, and the application programming interfaces (APIs) enabled automatic processing of the datasets. As a result, millions of files could be cataloged each day.

Ultimately, these solution elements enable faster analytics and insights, which translate into better business outcomes. As Fannie Mae Chief Data Officer Scott Richardson says, "We are engaged in thinking about business strategy through the lens of furthering our mission and improving the customer experience with data."*

*"How Fannie Mae is Creating a Modern Data Environment," Forbes, Aug 30, 2018, 07:36am EDT
