
Do You Have What it Takes to Manage the Flood of Data?

Hu Yoshida
CTO Emeritus, Hitachi Vantara

April 11, 2022


In 2010, Eric Schmidt, then CEO of Google, made the startling claim that every two days we humans generate as much information as we did from the dawn of civilization up until 2003, or about five exabytes of data. At the time, we had terabyte (TB) disk drives and could only imagine an exabyte, which is one million terabytes. The increments run from terabyte to petabyte to exabyte, and then to the zettabyte, which is 1,000 exabytes. By the end of 2010, the world had crossed the zettabyte threshold.
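To make those prefixes concrete, here is a minimal sketch of the unit ladder, assuming the decimal (SI) convention of a factor of 1,000 per step:

```python
# Unit ladder from terabyte up to zettabyte, using decimal (SI) prefixes:
# each step up the ladder multiplies by 1,000.
UNITS = ["TB", "PB", "EB", "ZB"]

def terabytes_in(unit: str) -> int:
    """Return how many terabytes make up one of the given unit."""
    return 1000 ** UNITS.index(unit)

print(terabytes_in("EB"))  # 1 EB = 1,000,000 TB (one million terabytes)
print(terabytes_in("ZB"))  # 1 ZB = 1,000,000,000 TB (1,000 exabytes)
```

Note that binary prefixes (TiB, PiB, and so on, in factors of 1,024) are a separate convention; the figures in this article follow the decimal one.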

The challenge with such an explosion of data is figuring out the best ways to capture, copy, and consume it in the most efficient, effective, and economical way possible.

Thanks to storage technologies like multi-bit and multilayer recording, high-density magnetic tape (yes, tape is still a viable storage medium), erasure coding, software-defined storage, and the cloud, we are somehow able to capture the data, even with the introduction of new types of data such as machine-generated data from IoT.

Nevertheless, the tsunami of data continues to drive data sprawl, which in turn is pushing the limits of data governance. Both problems are becoming increasingly complicated as data grows more distributed across the data center, edge, hybrid, and public cloud infrastructure. Today, storing the data is only a small part of the challenge of data growth. The greater challenge is how to discover, access, prepare and blend data across multiple data sources and locations and seamlessly merge IT and OT data to unlock transformational business insights.

Hitachi Vantara believes that customers need to create a seamless data fabric governed by an enhanced data catalog for automated data quality improvements and governance. Hitachi Vantara’s Lumada Industrial DataOps portfolio provides Data Integration powered by Pentaho technology, which enables customers to reduce the time and complexity needed to discover, access, prepare, and blend data across multiple data sources and locations. The portfolio includes IoT analytics models for industrial environments that jump-start analytic application assembly, and its IIoT Core framework seamlessly merges IT and OT data to unlock transformational business insights.

Data automation is also a key technology for addressing this challenge. Data automation is AI-driven software, like Hitachi Vantara’s Io-Tahoe software, that executes repetitive data processes in an audited, controlled manner. It works with your company’s core systems and existing applications, and automates data management, communication, and response triggering. It drives performance, prevents human error, and lets you focus on value-added tasks. Once in place, data automation by Io-Tahoe handles repetitive, high-volume data tasks efficiently and cost-effectively whenever you need it.

Io-Tahoe’s AI-driven data management software helps Fortune 1000 clients address their toughest data challenges by driving data quality, data lineage, workflow approval, and governance and regulatory compliance to deliver meaningful business outcomes. These capabilities are integrated with the Lumada DataOps Suite – a flexible data management fabric that provides integration, catalog, and edge capabilities.

Unlike traditional data management solutions that lock customers into proprietary technologies, the Lumada DataOps Suite augments any ecosystem to manage and govern data from anywhere. Lumada DataOps integrates data across hybrid cloud with Pentaho 9.3 through flexible cloud deployment and new connectors for cloud data stores such as Snowflake, MongoDB Atlas, Teradata, Elasticsearch 7.x, and IBM MQ 9.2.

The Lumada DataOps portfolio allows organizations to create a seamless data fabric governed by an enhanced data catalog for automated data quality improvements and governance. With the latest updates to Data Integration powered by Pentaho technology, customers can reduce the time and complexity needed to discover, access, prepare, and blend data across multiple data sources and locations.

Collectively, we are already consuming zettabytes of data today. If the amount of data keeps doubling every four years, it won’t be too far in the future when individual customers will be managing data at that scale. New infrastructure technologies will be developed by then, but the core framework to support them will still be a data fabric, a data catalog, and AI-driven data automation.
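The doubling assumption above can be compounded directly. A minimal sketch, using an assumed (illustrative, not a forecast) starting volume of 100 zettabytes and the four-year doubling period from the article:

```python
# Illustrative projection only: compound an assumed starting data volume
# by one doubling every four-year period.
def projected_zettabytes(start_zb: float, years: float, doubling_period: float = 4.0) -> float:
    """Volume after `years`, doubling once per `doubling_period` years."""
    return start_zb * 2 ** (years / doubling_period)

for years in (0, 4, 8, 16):
    print(f"after {years:2d} years: {projected_zettabytes(100, years):.0f} ZB")
# after  0 years: 100 ZB
# after  4 years: 200 ZB
# after  8 years: 400 ZB
# after 16 years: 1600 ZB
```

The point of the arithmetic is that exponential growth is insensitive to the starting figure: whatever volume we manage today, a steady four-year doubling puts us an order of magnitude higher within about 13 years.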


Hu Yoshida is CTO Emeritus at Hitachi Vantara.



Hu Yoshida spent 24 years at Hitachi Vantara helping define technical direction and enabling customers to address their digital transformation needs. He is widely known in the industry and was instrumental in evangelizing Hitachi's unique approach to storage virtualization.