Ingest a high daily volume of datasets
Founded in 1938, Fannie Mae deals with the same challenges as many mature corporations, including legacy environments and data silos.
As a $110 billion company that is the leading source of financing for mortgage lenders, Fannie Mae’s increasingly data-centric business wanted to transition to an agile, more responsive data lake. The company sought to create a modern data environment that ensured the right data got to the right person at the right time.
Fannie Mae set a governance standard whereby every dataset and field in the data lake was completely documented. In fact, each dataset goes through a design process where it gets curated and assigned a unique identifier, which stays with it no matter where it gets copied. Each dataset also has an elaborate set of properties that have to be filled out before the identifier can be issued.
While this process made the data more accessible, it was taking too much time. It could take days, weeks or even months from the time the design was approved to when data was actually generated and transferred from the IMR design time system into the data lake.
Meanwhile, several of Fannie Mae’s high-velocity apps continued to generate more than 10 million new files every day, clogging up the slow design process even further. These new files also needed to be integrated into the data lake, which required an API-based automated solution.
Dataset Preregistration and Metadata Versioning Accelerate Analytics and Insights
As part of its wholesale data transformation to a modern data infrastructure, Fannie Mae integrated Lumada Data Catalog. With its extensive APIs, Lumada Data Catalog supports high-volume applications that generate millions of files daily for preregistration of ingested datasets. The interface allows validation and management of metadata for different roles, including metadata analysts, data stewards, data governors and business data officers. Business data officers at Fannie Mae “ensure that data is fully owned and cared for by business leaders, and that new initiatives consider the creation, ongoing quality, and effective usage of data from the outset,” according to a recent Forbes report.*
Metadata versioning allows capture and display of technical metadata provided by the ingesting application, including file location, file size, file format, time of ingestion, partition and so forth. The solution can catch unresolved schema evolution in order to produce discrepancy reports between reported and inferred schema. That data is then made available to business end users in a robust self-service “marketplace” UI. The UI features complex custom properties presented simply and cohesively to enable end users to quickly find and utilize the data.
Fannie Mae Ensures Better Business Outcomes
By implementing Hitachi's data management and analytics solution, based on Lumada Data Services and Lumada Data Catalog, Fannie Me was able to provide a self-service "marketplace" data catalog for business users. Cataloged custom properties were attached to each dataset and the application programming interfaces (APIs) enabled automatic processing of the datasets. As a result, millions of files could be cataloged each day.
Ultimately, these solution elements enable faster analytics and insights, which translate into better business outcomes. As Fannie Mae Chief Data Officer Scott Richardson says, "We are engaged in thinking about business strategy through the lens of furthering our mission and improving the customer experience with data."*
*"How Fannie Mae is Creating a Modern Data Environment," Forbes
Aug 30, 2018,07:36am EDT
Accelerate data discovery and tagging to secure sensitive data, infer hidden relationships, accelerate data self-service and drive smarter insights.LEARN MORE
With Waterline we were able to fully automate and accelerate the cataloging and searchability of data to deliver game changing value to the business.
– Prakash Jagananthan, Data Management Leader, Fannie Mae *Hitachi Vantara acquired Waterline Data in 2020