The data fabric is an emerging design concept for data management that addresses the challenges of data complexity. It aims to provide an agile enterprise data foundation to support a wide variety of business use cases. The notion of a data fabric is closely tied to DataOps and initiatives for data modernization and digital innovation at large.
A data fabric can be thought of as a tapestry that connects data across multiple locations (edge, core, and cloud), data types, and data sources, together with methods for accessing that data. For consuming users, applications, and systems alike, it abstracts away the complexities of storing, moving, transforming, securing, and processing the underlying data.
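The abstraction described above can be illustrated with a small sketch. This is not a real data fabric product or API: the `DataFabric` class, the `register` and `read` methods, and the dataset names are all hypothetical, chosen only to show how a consumer might ask for data by logical name without knowing where or how it is stored.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Reader = Callable[[], Iterable[Record]]


class DataFabric:
    """Hypothetical facade: consumers request a dataset by logical name only."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        # Each source supplies its own reader; storage details stay hidden here.
        self._readers[name] = reader

    def read(self, name: str) -> List[Record]:
        if name not in self._readers:
            raise KeyError(f"unknown dataset: {name}")
        return [dict(row) for row in self._readers[name]()]


def csv_reader(text: str) -> Reader:
    # Stands in for a file or object store holding CSV data (assumption).
    return lambda: csv.DictReader(io.StringIO(text))


def memory_reader(rows: List[Record]) -> Reader:
    # Stands in for an edge device or in-memory cache (assumption).
    return lambda: iter(rows)


fabric = DataFabric()
fabric.register("cloud.orders", csv_reader("order_id,customer\n1,acme\n2,globex\n"))
fabric.register("edge.sensors", memory_reader([{"sensor": "s1", "temp": "21.5"}]))

# The consumer uses the same call regardless of the backing store.
orders = fabric.read("cloud.orders")
sensors = fabric.read("edge.sensors")
```

In this sketch the consumer code is identical for both sources; only the registration step knows about CSV parsing or in-memory rows. A real fabric would add security, movement, and transformation behind the same kind of interface.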
A data fabric is not a replacement for more traditional data management architectures such as data lakes, data warehouses, data hubs, and databases. Rather, a data fabric incorporates those systems as active participants in a unified approach.
The data fabric aims to reduce data complexity by automating data integration, data governance, and data processing. Tools for data fabric design and management include data pipelines with various integration styles; workflow management, orchestration, and policy management; active metadata and machine learning (ML) augmented data management; augmented data cataloging; and data virtualization.
As the data fabric increasingly spans data across multiple clouds, data centers, and edge systems, it is typically built with the help of container-based technologies (such as Kubernetes) and related service mesh technologies.
The data fabric should provide a single environment for accessing and collecting all data, no matter where it is located or how it is stored, thereby eliminating data silos. Compared with manual pipeline creation, which is slow, error-prone, and redundant, automated pipeline creation allows data engineers to better serve data consumers.
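One way to picture automated pipeline creation is to generate pipelines from a declarative specification instead of hand-writing each one. The following is a minimal sketch under that assumption; the spec format, the `build_pipeline` function, and the `filter` and `rename` operations are hypothetical and do not correspond to any particular tool.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterable[Row]]


def make_filter(field: str, equals: object) -> Step:
    # Keep only rows whose `field` equals the given value.
    return lambda rows: (r for r in rows if r.get(field) == equals)


def make_rename(old: str, new: str) -> Step:
    # Rename a column, leaving rows without that column unchanged.
    def step(rows: Iterable[Row]) -> Iterable[Row]:
        for r in rows:
            out = dict(r)
            if old in out:
                out[new] = out.pop(old)
            yield out
    return step


# Hypothetical registry mapping spec operation names to step factories.
STEP_FACTORIES: Dict[str, Callable[..., Step]] = {
    "filter": make_filter,
    "rename": make_rename,
}


def build_pipeline(spec: List[dict]) -> Callable[[Iterable[Row]], List[Row]]:
    """Turn a declarative spec into a runnable pipeline function."""
    steps: List[Step] = []
    for entry in spec:
        params = {k: v for k, v in entry.items() if k != "op"}
        steps.append(STEP_FACTORIES[entry["op"]](**params))

    def run(rows: Iterable[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return list(rows)

    return run


spec = [
    {"op": "filter", "field": "region", "equals": "EU"},
    {"op": "rename", "old": "cust", "new": "customer"},
]
pipeline = build_pipeline(spec)
result = pipeline([
    {"cust": "acme", "region": "EU"},
    {"cust": "globex", "region": "US"},
])
```

Because the pipeline is derived from data (the spec) and not written by hand, the same spec could in principle be produced from catalog metadata, which is the kind of automation the paragraph above describes.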
Enriching data with business semantics and metadata-level governance fosters a collaborative, self-service environment that accelerates time to value in customer 360 views, fraud detection, IoT analytics, and many other use cases.