See how content intelligence and native file format backup can make the most effective use of data, while ensuring the smallest footprint possible.
Rich Vining is a senior product marketing manager at Hitachi Vantara, responsible for a portfolio of data protection and copy data management products. He has more than 25 years' experience covering these and other secondary storage and data management technologies in a number of roles.
Award-winning writer, producer and host with over 20 years of experience delivering
This is the Studio NEXT podcast. I'm your host, Jeremy Brisiel. Let's talk about data, IT operations and governance. And let's not just talk about it with me, because that wouldn't be very in-depth. Let's bring in some experts. Scott Baker and Rich Vining join us here today. Scott, could you introduce yourself and let us know what you do?
Right on. Scott Baker here. I work for Hitachi Vantara, and my role is running the product marketing team for our content and data intelligence business and our backup and recovery, or data protection, business.
All right, excellent. Thank you for that. And Rich Vining. Rich, could you tell us who you are and what you do?
Sure. I handle the product marketing for our data protection, backup and recovery, and copy data management capabilities. I work for Scott.
All right, that's good to know. We keep the hierarchy around here. We understand how things go. So Rich, I want to stick with you for a second, 'cause I think one of the things we should talk about is that backup category, because there's a differentiation for Hitachi Vantara in that space. How do we handle it, and what do we do differently from everybody else?
Sure. The product that I represent is Hitachi Data Instance Director, or HDID, and it does a lot of different things. One of the things it can do is back up unstructured data files to our Hitachi Content Platform, which is an object storage platform: very scalable, very reliable. And we store the data differently from any other backup application. It stores the data in its native file format rather than in the encapsulated blob that everybody else uses. That allows us to add metadata to it. We grab the system metadata from the source: who owned it, when it was created, those kinds of things. Then we also make that object available to other tools, such as our content intelligence product, which can index that data and add custom metadata to it. It learns what kind of data it is and makes it easier to search and find when you need access to that data.
That access is the thing that's really special, right?
It is. Yeah. With that, we can support any number of applications that need access to data without impacting the production applications. So this is a real game changer. Now we can do things like big data analytics, and we can support governance, audits and lots of other things, whether it's legal or finance that needs access to the data. And we can do all of this without an impact on the production application.
That's the big one.
Yeah, if I could just jump in on that. What Rich was alluding to there is that when the dataset is kept in its raw, natural format, we can use products to index it. That means that as a business you can begin to do contextual searches for things you might want to restore and make available again. Imagine being able to find all of the files in a backup set that include the words "Hitachi Vantara", because you want to restore those. Competitors in the market tend to offer things like single-item restores, full-system restores, et cetera. But to us, this is a game changer, not because we can offer it and it's new, but because it lets us keep the data in its natural format. We're not putting proprietary wrappers around it. That helps us power good for the rest of the business. We don't want to lock customers in and hold them hostage, in a state where we own the data and control where that data can go and how it gets used. That's not our place.
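As a rough illustration of why native-format storage makes that kind of contextual search possible, here is a minimal Python sketch. It assumes each backed-up file is kept as its own object holding the file's original bytes plus system and custom metadata; the `BackupObject` class and `find_restore_candidates` function are hypothetical and are not HDID, Hitachi Content Platform or Hitachi Content Intelligence APIs. A real deployment would query an index rather than scan every object.

```python
from dataclasses import dataclass, field


@dataclass
class BackupObject:
    """One backed-up file kept in its native format (hypothetical model)."""
    path: str
    content: bytes  # the file's original bytes, not wrapped in a container
    system_metadata: dict = field(default_factory=dict)  # e.g. owner, creation time
    custom_metadata: dict = field(default_factory=dict)  # e.g. tags from an indexer


def find_restore_candidates(backup_set, phrase, owner=None):
    """Return paths of backed-up files that contain `phrase`.

    Optionally narrow the result by the `owner` system metadata. This only
    works because each object exposes readable file content; with an opaque
    backup blob the whole image would have to be restored before searching.
    """
    needle = phrase.encode("utf-8")
    return [
        obj.path
        for obj in backup_set
        if needle in obj.content
        and (owner is None or obj.system_metadata.get("owner") == owner)
    ]


backup_set = [
    BackupObject("/share/q3-report.txt", b"Prepared for Hitachi Vantara partners",
                 {"owner": "alice"}),
    BackupObject("/share/notes.txt", b"Internal meeting notes", {"owner": "bob"}),
]

print(find_restore_candidates(backup_set, "Hitachi Vantara"))  # ['/share/q3-report.txt']
```

The `owner` filter shows where captured system metadata comes in: the same search restricted to `owner="bob"` returns an empty list.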
It's not your place. And it sounds to me like what it also does is create an environment in which the business can explore how to power good in different ways. Their resources aren't captured by the encapsulated data, no pun intended there; they're actually free to recognize backup as a body of data they can use as well. If I'm mistaken on that, please correct me, but it seems like effective availability of data really powers good for customers.
Absolutely. Historically we would say that backups are great for recovery, but I think what Rich is talking about here with his product is almost blending the business continuity component, recovering from a loss, with the discoverability you would traditionally get from an archive solution. It brings both of those together to make the most effective use of the data with the smallest physical footprint you need to store that data for recovery or, in this case, discovery.
Yeah. So, Rich, how significant is that differentiation for customers?
Oh, it's very significant. I'll give you an example that's really top of mind here. Lots of jurisdictions are implementing privacy regulations, right?
And this is for the good of society. One of the most prominent is the EU's General Data Protection Regulation. That regulation has 99 different provisions, and one of them is called the right to be forgotten. So an EU citizen can ask a company to forget them: just delete all the information it has about them. And the company has to do that within a reasonable amount of time: a week to two weeks. What people don't really realize is that the request requires you to delete that data from all copies of the data as well: the ones you have and the ones you've provided to suppliers or data processors. It also includes backup copies. With traditional backup, it's basically impossible to find that data and delete it without breaking the rest of the backup. You basically nullify the entire dataset if you try to delete that data or change it, randomize it or whatever. So yes, this is a game changer: we make it very easy to find the records or data related to that person. You change that data, edit the file to remove the information, or mask or delete it, and it doesn't affect any of the other files we've backed up.
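The file-level erasure Rich describes can be sketched in Python under simple assumptions: the backup set is modeled as a dictionary mapping each file path to that file's native-format bytes, and a data subject is identified by a list of literal strings such as a name and an email address. `forget_subject` is a hypothetical helper, not a product API, and real erasure would need far more robust matching than substring search. What it shows is that masking or deleting one person's data touches only the affected objects, while every other backed-up file stays byte-for-byte identical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 fingerprint used to confirm unaffected files did not change."""
    return hashlib.sha256(data).hexdigest()


def forget_subject(backup_set, identifiers, mask=False):
    """Delete (or mask) every backed-up file that mentions any identifier.

    `backup_set` maps path -> native-format file bytes (hypothetical model).
    Returns the affected paths; all other entries are left untouched.
    """
    needles = [ident.encode("utf-8") for ident in identifiers]
    affected = [path for path, data in backup_set.items()
                if any(n in data for n in needles)]
    for path in affected:
        if mask:
            data = backup_set[path]
            for n in needles:
                # Replace each occurrence with a same-length placeholder.
                data = data.replace(n, b"*" * len(n))
            backup_set[path] = data
        else:
            del backup_set[path]
    return affected


backup_set = {
    "/crm/jane.csv": b"name,email\nJane Doe,jane@example.com\n",
    "/crm/orders.csv": b"order,customer\n1001,Jane Doe\n1002,John Roe\n",
    "/docs/policy.txt": b"Retention policy v2\n",
}
before = {path: digest(data) for path, data in backup_set.items()}

print(forget_subject(backup_set, ["Jane Doe", "jane@example.com"], mask=True))
# The unrelated file is unchanged, so the rest of the backup remains valid.
print(digest(backup_set["/docs/policy.txt"]) == before["/docs/policy.txt"])
```

With an encapsulated backup image, the equivalent step would mean unpacking and rewriting the whole image, which is the breakage described above.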
That does sound like a foundational difference right there. That's fundamental in every way.
Remarkable to hear that. And this is just the first iteration of the EU's regulations. I know many will fight them, but the security, safety and privacy issues seem to be expanding. So being able to do this seems fundamental to the future. That's forward thinking in a big way.
Yup. Yup. Absolutely.
All right. With that in mind, a word about forward thinking: we used to just put things in a data center and go get them from that room when we needed them. That's not the way we do things anymore. Multicloud is not just a fad. All private is no longer a thing. All cloud is not a thing. Multicloud is where we're at. Backup data, operations and governance: that's a whole new territory to deal with as well, Scott. How do we do that?
Yeah. Absolutely. What's interesting is that a lot of people associate governance with risk mitigation or with responding to rules like GDPR, the California Consumer Privacy Act, et cetera. But from the perspective of data operations, we think of governance equally as the responsibility of the business to set strategies that control data quality and availability. What Rich has talked about here is largely the backup aspect of Hitachi Data Instance Director. Well, there's another aspect that really feeds the information supply chain for the business, and that's enterprise copy data management. Being able to decide what data needs to be copied, control where those copies go and, more importantly, how long those copies should remain is critical to minimizing the storage footprint we're talking about, to ensuring that the people who have access to the data should have it and, more importantly, that it's as high quality as possible. If data has been copied and people are making decisions on it when it's four or five months old, it's way out of date, and that really erodes the decision-making process.
Yeah. And that's so fundamental for business, for customers making those decisions. It doesn't help me if I can't be accurate with the data, right? If it's a vague idea of something close to the right data, that's almost worse than nothing.
Yeah, absolutely. One of our analyst friends did a study a couple of years ago and came up with a number: on average, around the world, companies are storing 13 copies of every piece of information they have. So you might think, okay, this data may be 1TB, without considering that you need 13TB of storage for the copies. The problem is that nobody really knows what's going on with those copies. As Scott said, you don't know who owns them, whether they're being used or how long they're being retained. You're making copies of the copies. So our copy data management is really about automating the creation of the copies so you know who they're going to. We can control who has access to them. We can automatically delete them. We can also automatically refresh them. So if you have developers who need a fresh copy of a database once a week, we can just refresh that existing copy rather than creating a new one every week.
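The copy lifecycle Rich outlines (a known owner, refresh in place, automatic deletion) can be sketched as a small catalog. This is a minimal Python illustration under stated assumptions, not HDID's implementation: `ManagedCopy` and `CopyCatalog` are hypothetical names, the catalog keeps one copy per source and owner, and retention is measured from the last refresh so that a copy still in weekly use is not deleted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ManagedCopy:
    """A tracked copy of a source dataset (hypothetical model)."""
    source: str
    owner: str
    created: datetime
    refreshed: datetime
    retention: timedelta

    def expired(self, now: datetime) -> bool:
        # Retention counts from the last refresh, not from creation.
        return now >= self.refreshed + self.retention


class CopyCatalog:
    """Keeps at most one copy per (source, owner) pair."""

    def __init__(self):
        self.copies = {}

    def provide(self, source, owner, now, retention):
        """Refresh the owner's existing copy, or create one if none exists."""
        key = (source, owner)
        entry = self.copies.get(key)
        if entry is None:
            entry = ManagedCopy(source, owner, now, now, retention)
            self.copies[key] = entry
        else:
            entry.refreshed = now  # reuse the copy instead of adding another
        return entry

    def expire(self, now):
        """Delete copies past retention and return their keys."""
        removed = [key for key, c in self.copies.items() if c.expired(now)]
        for key in removed:
            del self.copies[key]
        return removed


catalog = CopyCatalog()
start = datetime(2020, 1, 6)
for week in range(4):  # weekly developer refresh
    catalog.provide("sales_db", "dev-team", start + timedelta(weeks=week),
                    timedelta(days=30))

print(len(catalog.copies))  # 1: refreshed four times, never duplicated
print(catalog.expire(start + timedelta(days=60)))  # [('sales_db', 'dev-team')]
```

Keying the catalog on source and owner is what answers "who owns this copy"; a fuller sketch would also record who is allowed to access each copy.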
That's the thing about what he just said there, too. When you're talking about DataOps, you're talking about data engineers, the people responsible for building these models. The accuracy of their models is always going to be directly aligned with the quality of the data they get to work with. If we can control how often that data gets refreshed, it's only going to improve the model, so that when it moves into production for the data scientists or whoever else needs access to that data, the model and the results it puts out will be based on the actual data you're hoping to get results from. I think that's incredibly powerful.
Yeah, and I want to expand on that, too. We've talked about access. Now we're talking about control. But you mentioned the storage footprint, and there's no business I know of right now that's saying, "Oh, we're fine, more is fine, just keep giving us stuff to stack." There's literally none I can think of, other than physical storage companies, that want more stuff to stack.
Right, right. If I'm a sales rep, I want to sell more, right? But I think the other thing we have to accept is the speed at which data gets created. The complexity of the data itself is so rampant that putting strategies in place to effectively fix these problems can be very difficult. So when companies begin to adopt the notion that not all data is created equal, take a good hard look at their data and drop the idea that no one's ever been fired for keeping data, they start evaluating what should be kept for the long term and what shouldn't. If it doesn't need to be here, it's gone. And if it does, let's make it available to the right people using the products and capabilities Rich has talked about.
That's fantastic. With all of that, before we say goodbye here: what competitive space does this create for Hitachi versus others? What do we see there now, how do we see it going into NEXT 2020, when we get back together and talk, and how about three years out?
Well, following up on GDPR, for example: since nobody else has this capability, most companies are ignoring the problem. They figure, okay, technically I'm not able to do this right-to-be-forgotten exercise, which is fine. They're going to delete the data from the production system, and they'll stop sending emails to that customer. But if that data ever gets restored, it becomes live again, and then they have problems. They could face very large fines if they don't have some way of remediating that. So they have to keep something like a runbook of all the actions they've taken on the production systems, and when they do a restore, they have to go in and delete those files again. That's not something you want to do during a restore, because you're under a time crunch. Typically something's gone wrong and your systems are down, and you want to get back up and running. We eliminate all of those problems.
That's a pretty significant advantage.
Yeah. From my perspective, I really see HDID as the workhorse that's going to connect all of the products we have today in terms of where the live data lives and where the copies need to go, and even make stronger connections into multicloud adoption for customers. That way, data will move based on its value to the business, not because someone thinks it needs to move. That really excites me: having the intelligence built in so you know what data you have, where it's located and how it affects your overall data landscape across the information supply chain I mentioned.
Great. It's great to discuss DataOps and governance with you, and the way in which Hitachi has separated itself from its competitors. Rich, to your point: hey, others can just ignore it. That's always worked out well. I don't know anybody who's been successful just ignoring problems, in personal life or in business.
I was wondering about the personal comment. Yeah, right.
And so I think it's great to see that the operators are in good hands. Scott and Rich, thank you for coming out and sharing your story.
Thanks so much.
Thank you very much.
© Hitachi Vantara LLC 2020. All Rights Reserved.