Fannie Mae Automates Data Lake Design Process To Accelerate Insight

Lumada Data Catalog’s Extensive APIs Automate Data Cataloging From High-Velocity Applications

The Challenge:

  • Efficiently prepopulate data lake with all necessary dataset properties.
  • Provide an API-based solution for automation.
  • Ingest a high daily volume of datasets.

The Solution:

  • Automated solution capable of cataloging more than 10 million files per day with all associated properties.
  • Searchable user interface (UI) with custom search properties delivers desired data quickly and efficiently.

Outcomes:

  • Self-service “marketplace” data catalog for business users.
  • Cataloged custom properties attached to each dataset.
  • APIs to automatically process datasets.
  • Millions of files cataloged per day.

    Fannie Mae is a leader in providing housing finance for homebuyers and renters across the United States. The company helps make fixed-rate mortgage and affordable rental housing possible for millions of Americans.

    The Challenge

    Ingest a high daily volume of datasets

    Founded in 1938, Fannie Mae deals with the same challenges as many mature corporations, including legacy environments and data silos.

    As a $110 billion company that is the leading source of financing for mortgage lenders, Fannie Mae’s increasingly data-centric business wanted to transition to an agile, more responsive data lake. The company sought to create a modern data environment that ensured the right data got to the right person at the right time.

    Fannie Mae set a governance standard whereby every dataset and field in the data lake was completely documented. In fact, each dataset goes through a design process where it gets curated and assigned a unique identifier, which stays with it no matter where it gets copied. Each dataset also has an elaborate set of properties that have to be filled out before the identifier can be issued.
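
    The gating rule described above (no identifier until every required property is filled in, and an identifier that follows the dataset wherever it is copied) can be illustrated with a minimal sketch. The property names, the record layout and the use of a UUID are assumptions made for illustration only; the source does not describe how Fannie Mae actually generates or stores its identifiers.

```python
import uuid

# Hypothetical required properties; the real property set is not given in the source.
REQUIRED_PROPERTIES = ("owner", "description", "retention_policy")


def issue_dataset_id(properties: dict) -> str:
    """Return a new identifier only if every required property is filled in."""
    missing = [
        name for name in REQUIRED_PROPERTIES
        if not (properties.get(name) or "").strip()
    ]
    if missing:
        raise ValueError("cannot issue identifier; missing: " + ", ".join(missing))
    # A random UUID stands in for whatever identifier scheme is really used.
    return str(uuid.uuid4())


def copy_dataset(record: dict, new_location: str) -> dict:
    """Copy a dataset record to a new location, keeping its identifier unchanged."""
    return {**record, "location": new_location}


properties = {
    "owner": "loan-servicing",
    "description": "Daily loan balances",
    "retention_policy": "7y",
}
record = {
    "id": issue_dataset_id(properties),
    "properties": properties,
    "location": "/landing/loans",
}
copied = copy_dataset(record, "/curated/loans")
print(copied["id"] == record["id"])
```

    Keeping the identifier inside the record, rather than deriving it from the storage path, is what lets it survive a copy in this sketch.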

    While this process made the data more accessible, it was taking too much time. It could take days, weeks or even months from the time the design was approved to when data was actually generated and transferred from the IMR design time system into the data lake.

    Meanwhile, several of Fannie Mae’s high-velocity apps continued to generate more than 10 million new files every day, clogging up the slow design process even further. These new files also needed to be integrated into the data lake, which required an API-based automated solution.

    “With Waterline* we were able to fully automate and accelerate the cataloging and searchability of data to deliver game-changing value to the business.”
    – Prakash Jagananthan, Data Management Leader, Fannie Mae

    *Hitachi Vantara acquired Waterline Data in 2020

    The Solution

    Dataset Preregistration and Metadata Versioning Accelerate Analytics and Insights
    As part of its wholesale data transformation to a modern data infrastructure, Fannie Mae integrated Lumada Data Catalog. With its extensive APIs, Lumada Data Catalog supports high-volume applications that generate millions of files daily for preregistration of ingested datasets. The interface allows validation and management of metadata for different roles, including metadata analysts, data stewards, data governors and business data officers. Business data officers at Fannie Mae “ensure that data is fully owned and cared for by business leaders, and that new initiatives consider the creation, ongoing quality, and effective usage of data from the outset,” according to a recent Forbes report.*

    Metadata versioning allows capture and display of technical metadata provided by the ingesting application, including file location, file size, file format, time of ingestion, partition and so forth. The solution can catch unresolved schema evolution in order to produce discrepancy reports between reported and inferred schema. That data is then made available to business end users in a robust self-service “marketplace” UI. The UI features complex custom properties presented simply and cohesively to enable end users to quickly find and utilize the data.
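
    A discrepancy report between a reported schema and an inferred one can be sketched as a simple field-by-field comparison. The example below is an illustration under stated assumptions, not Lumada Data Catalog's implementation or API: schemas are modeled as plain field-name-to-type mappings, and the `Discrepancy` record and its three `kind` labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Discrepancy:
    field: str
    kind: str  # "missing_in_data", "unreported" or "type_mismatch"
    reported: Optional[str]
    inferred: Optional[str]


def schema_discrepancies(
    reported: Dict[str, str], inferred: Dict[str, str]
) -> List[Discrepancy]:
    """Compare the schema an ingesting application reported with the schema
    inferred from the data, and list every field where the two disagree."""
    report = []
    for name in sorted(reported.keys() | inferred.keys()):
        r, i = reported.get(name), inferred.get(name)
        if i is None:
            # Declared by the application but not found in the data.
            report.append(Discrepancy(name, "missing_in_data", r, None))
        elif r is None:
            # Found in the data but never declared (possible schema evolution).
            report.append(Discrepancy(name, "unreported", None, i))
        elif r != i:
            report.append(Discrepancy(name, "type_mismatch", r, i))
    return report


reported = {"loan_id": "string", "balance": "decimal", "closed_on": "date"}
inferred = {"loan_id": "string", "balance": "double", "servicer": "string"}
for d in schema_discrepancies(reported, inferred):
    print(d.field, d.kind, d.reported, d.inferred)
```

    In this sample, `balance` is flagged as a type mismatch, `closed_on` as missing from the data and `servicer` as unreported, while the matching `loan_id` field produces no entry.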

    The Outcome

    Fannie Mae Ensures Better Business Outcomes
    By implementing Hitachi’s data management and analytics solution, based on Lumada Data Services and Lumada Data Catalog, Fannie Mae was able to provide a self-service “marketplace” data catalog for business users. Cataloged custom properties were attached to each dataset and the application programming interfaces (APIs) enabled automatic processing of the datasets. As a result, millions of files could be cataloged each day.

    Ultimately, these solution elements enable faster analytics and insights, which translate into better business outcomes. As Fannie Mae Chief Data Officer Scott Richardson says, “We are engaged in thinking about business strategy through the lens of furthering our mission and improving the customer experience with data.”*

    *“How Fannie Mae is Creating a Modern Data Environment,” Forbes, Aug 30, 2018, 07:36am EDT
