What is a data warehouse?
A data warehouse is a data storage system designed to ingest and analyze data in real time, typically to uncover business intelligence used in day-to-day operations. These systems draw relevant data from many other operational systems, such as enterprise resource planning (ERP), customer relationship management (CRM), billing, and supply chain systems, and apply visualization, reporting, and business analytics to help companies make informed decisions.
While many data warehouses do archive data for posterity, this role more frequently falls to data lakes, which accommodate extensive storage of multiple, mutually incompatible data types. Data warehouse storage is nonetheless still a highly relevant enterprise storage solution, especially for sensitive customer information, proprietary business information, and refined data.
Data warehouses play a downstream role in enterprise data pipelines. The warehouse either draws relevant data out of a data lake's pool of structured, unstructured, and semi-structured data, or collects data directly from multiple data sources and operational systems.
Then, upon ingestion, the data warehouse cleans, transforms, or otherwise prepares the data to fit a data schema that it can store and use. Unlike data lakes, which use a flat storage structure, data warehouses organize data hierarchically. Analytics and further transformations are applied to the data either as it moves into the warehouse or once it is stored there, and the refined insights are sent on to reporting or to another system, such as a data mart, that serves a smaller, specialized set of users.
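The ingest, clean, and refine flow described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a warehouse engine; the sample records, the sales table, the sales_by_region table, and the function names clean, load, and build_mart are all hypothetical and chosen only for illustration. A production warehouse would rely on a dedicated platform and far richer validation rules.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a CRM and a billing system.
RAW_RECORDS = [
    {"source": "crm", "customer": " Alice ", "amount": "120.50", "region": "emea"},
    {"source": "billing", "customer": "Bob", "amount": "80", "region": "AMER"},
    {"source": "billing", "customer": "", "amount": "n/a", "region": "amer"},  # unusable
]


def clean(record):
    """Return a row that fits the warehouse schema, or None if it cannot be repaired."""
    customer = record["customer"].strip()
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # amount is not numeric, so the record is rejected
    if not customer:
        return None  # a customer name is required by the assumed schema
    return (record["source"], customer, amount, record["region"].upper())


def load(conn, records):
    """Create the assumed schema and insert only the rows that survive cleaning."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "source TEXT NOT NULL, customer TEXT NOT NULL, "
        "amount REAL NOT NULL, region TEXT NOT NULL)"
    )
    rows = [row for row in map(clean, records) if row is not None]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def build_mart(conn):
    """Derive a smaller aggregated table, standing in for a data mart."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
    print("rows loaded:", load(conn, RAW_RECORDS))
    print("totals by region:", build_mart(conn))
```

In this sketch, the malformed third record is dropped during cleaning rather than stored, which mirrors the idea that a warehouse holds data already shaped to its schema, while the aggregated table plays the role of a data mart serving a narrower audience.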