October 10, 2022
Third in a Series: As I’ve written in this series, the blending of different data types and AI workloads requires that the high-performance storage platform on which they run simultaneously support high-throughput workloads (e.g., computer vision) and high-IO workloads (e.g., deep learning).
Imagine this scenario: an autonomous vehicle is driving through a city, collecting images from its cameras while also collecting telemetry and sensor data from the vehicle's various outputs. A traditional high-performance system would be able to handle one or the other of those workloads, but not both simultaneously.
Now, imagine an AI platform is consuming this data, reading back both the video data and the sensor data to infer some sort of outcome and generate a new AI model epoch. A modern parallel filesystem can handle both tasks, and is therefore needed to support them.
But you will also need high-performance storage to natively support the second tier of storage, the long-term archive.
High-performance storage is fundamental to feeding graphics processing units (GPUs) as quickly as possible so that their economic value is realized, creating outcomes and time-to-value faster than anyone else. AI platforms are tip-of-the-spear use cases for organizations as they decide the next phase of their business. These platforms are used to provide greater value to customers, and to enable the organization to better understand its data and make more informed decisions. In other words, they’re critical to the success of the organization.
Storing data for AI platforms has traditionally been very expensive, as most parallel filesystems are built on large tiers of pricey flash-based storage, most commonly NVMe (Non-Volatile Memory Express). So, organizations turn to data lakes, archives, or more affordable solutions, such as NL-SAS (Nearline SAS) storage.
For AI workloads to run properly, however, data needs to be accessible at any time. And as the data gets spread across such systems, which are often isolated from one another, huge amounts of manual work are required to bring it back.
This is where having a parallel filesystem with native integration and support for object storage technology is critical. Object storage is one of the most economical online storage technologies available to meet the demand of these environments.
In addition, one of the unique qualities of object storage is “erasure coding.” This protection capability enables the regeneration of missing data from pieces of known data, and is a highly economical option, especially compared to the traditional 3-2-1 backup approach used by most data lakes. The 3-2-1 strategy calls for having three copies of your data (the production data and two backup copies), stored on two different media types, with one copy kept offsite.
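To make the idea of regenerating missing data from known data concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme, a single XOR parity fragment that can repair one lost fragment. Production object stores typically use stronger codes with several parity fragments, so treat this as an illustration only. The function names and the fragment counts are hypothetical, not taken from any particular product.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


def raw_capacity_ratio(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data and m parity fragments."""
    return (k + m) / k


# Hypothetical example: 4 data fragments + 1 parity, then lose fragment 2.
stored = encode(b"camera frames and sensor telemetry", k=4)
original = list(stored)
stored[2] = None
assert recover(stored) == original
```

The economics follow from simple arithmetic: keeping three full copies consumes three times the raw capacity of the data, while a layout with k data fragments and m parity fragments consumes (k + m) / k. The hypothetical 4+1 layout above needs only 1.25 times the raw capacity, at the cost of tolerating just one lost fragment.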
At Hitachi Vantara, we have solutions that, when combined, can address these challenges and more, enabling customers to create high-performance parallel filesystems with modern consumption capabilities that integrate easily with object storage endpoints for archival and data protection.
They enable organizations to place data on the most strategic storage technology automatically, based on rules defined by the data owners and stewards. As a result, the solutions support simultaneous high IOPS and high throughput capabilities against the same dataset, making them much more akin to a DevOps tool than a complex storage platform.
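As a rough illustration of rule-based placement, the sketch below evaluates an ordered list of rules against a file's metadata and returns the first matching tier. Everything here is hypothetical: the tier names, the paths, the 30-day threshold, and the `Rule` structure are assumptions made for the example and do not describe the policy engine of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class FileInfo:
    path: str
    last_access: datetime


@dataclass
class Rule:
    name: str
    matches: Callable[[FileInfo, datetime], bool]
    tier: str


def choose_tier(info: FileInfo, rules: List[Rule], now: datetime,
                default: str = "flash") -> str:
    """Return the tier of the first rule that matches, else the default tier."""
    for rule in rules:
        if rule.matches(info, now):
            return rule.tier
    return default


# Hypothetical rules a data owner might define; order expresses priority.
RULES = [
    # Keep the active training set on the fast tier regardless of age.
    Rule("pin-active-datasets",
         lambda f, now: f.path.startswith("/datasets/active/"),
         "flash"),
    # Move anything untouched for 30 days to the object tier.
    Rule("archive-cold-data",
         lambda f, now: now - f.last_access > timedelta(days=30),
         "object"),
]

now = datetime(2022, 10, 10)
old_telemetry = FileInfo("/telemetry/2022-01.parquet", now - timedelta(days=90))
print(choose_tier(old_telemetry, RULES, now))  # object
```

Because the rules are evaluated in order, a pinning rule placed first overrides the age-based rule, which is one simple way to let data owners keep hot datasets on flash while everything else ages out to the object tier.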
In one example from the manufacturing sector, our solution helped an organization achieve a 500% improvement in application response times. This allowed it to manage massive volumes of unstructured data in varying formats and from multiple sources in a central object store, at a time when growing volumes of manufacturing, sales and distribution data were stretching the company's storage infrastructure to breaking point.
The bottom line is that the five modern workloads of the enterprise will continue to put pressure on the traditional storage platform. Investing in modern storage solutions, such as scale-up and scale-out block storage, or parallel filesystems with object storage, will not only relieve system stress, but also put the organization on a path to greater performance.