Hear how DataOps collaborative data management practices can bring data silos together to connect data producers with data consumers in the age of AI.
Jeremy Brisiel is an award-winning writer, producer and host. He has more than 20 years of experience delivering creative content to audiences around the world.
Lothar Schubert is Head of Product Marketing at Hitachi Vantara. With a mission to market with authenticity to create customer value and drive financial and social impact, he has been driving marketing programs successfully for 15 years in various leadership roles. His focus and passion is accelerating and scaling commercial success from technology innovations benefiting businesses and society.
This is the Studio NEXT podcast. I'm your host Jeremy Brisiel. We've talked about a lot of things from NEXT ’19 and DataOps is something we've mentioned. Let's talk more about DataOps. Let's not do it just with me. Let's welcome in a guest. Sir, could you say hello? Let us know who you are and what you do.
Hi, my name is Lothar Schubert. I'm in product marketing with Hitachi Vantara focusing on Lumada.
I like how that rolled off there, Lothar. "Focusing on Lumada." It's a very good sound. It's an excellent portfolio as well. So, DataOps is a phrase, and we mentioned it even in the general session on the first day at NEXT, that I think is going to be one of the things we look back on and say, we didn't talk about this a lot and now it matters a great deal. What is DataOps?
Yeah, DataOps is talked about a lot here at the conference, and I think we'll hear more and more about it. It's certainly something that is real. It's not just hype. I think the most important thing to point out is: Where does DataOps come from? If you look at the term, it borrows some terminology from DevOps. But really it's a collaborative data management practice with a high degree of automation, to better connect data and data producers with data consumers in the age of, well, AI.
Thank you for breaking that down and for letting us understand more about what those connections are, what DataOps means, and how it ties to the DevOps understanding. So why do we need it? It's great that we have it, and it certainly makes sense in the era we're in, but why do we need it?
And by the way, I wouldn't necessarily say that we fully have it yet. Yes, we use the term, we have the philosophy, we have the methodology. I think it's building right now and we have some of the underpinning technology, but it's certainly a practice that is still developing. But why do we need it? If you look at it, data management processes are pretty much broken at the point where we are today, very much because of the data silos that we have across the organization. It's very hard to bring that data together to really drive insights. Right? And this is not a new idea or discovery. Data management and business intelligence and whatnot have been around for 10 or 20 years, but it's really at a breaking point right now. If you look, for example, at the advances that we have made in AI, machine learning and deep learning, they all have a huge appetite for data. But data scientists, for example, are spending probably 80% of their time on integrating, massaging and curating data. Data scientists are almost the highest paid and scarcest resources that we have right now, right? So we need to fundamentally change those processes and those approaches. There is also the amount of data that we are inundated with, which really can provide business value but is unsustainable with the current management processes. If we look at the edge, for example, 75%, three quarters, of all data is soon going to be created and processed at the edge. Right? We can't do this with traditional data management approaches.
Yeah. Well, we do need it now. And so I guess the question is, because we've been talking, as you said, for 10 or 20 years, we've been working through this, and we've recognized that this is, call it the information age to begin with, which of course could have been the data age, which would have made perfect sense. Data lakes have been a theme and a conversation for years and years. So why DataOps now, as a philosophy and as a strategy? Why 2019? Why is it in 2019 that everybody's getting on board and going, you know what? I think a better practice would be a good idea.
That's a really good point, right? Why not before? I mean, we had data products before. On the one side, yes, of course, it's demand. I would say it's because it's really coming to a breaking point. There are the factors that I mentioned: edge certainly plays a role, multicloud, data is more distributed than ever, and there are the demands from an AI perspective. But then there is also the technology and the practice perspective. From a technology perspective, there is a maturity of technologies which really help with automation and allow a more agile data foundation. Things that come to mind are containerization. Containers have been around for quite some time now, but it is pretty much standardizing now on Kubernetes and Docker and those kinds of technologies, which allow a more distributed data fabric. That's one thing. Metadata management and catalogs also play a big role there, and we have seen great improvements in this area as well. And then I talked about machine learning. Machine learning can also be applied to running more policy-driven automation. So we are in a really unique spot right now. And then, you mentioned data lakes, for example, right? Data lake technology in the past had been Hadoop and MapReduce and those kinds of things. But that actually led to … first of all, it's very expensive to manage those things. And that led to many of what people call data swamps, and then to curated things and technologies, which had been pioneered by AWS, for example. We have … an object store, and we see those technologies now going not just into single clouds, but into hybrid environments.
So with that in mind, what would an example of DataOps in practice look like? What is one?
Let's take an example. One industry I like, for example, is self-driving cars, right? Automated, assisted driving systems. Yep. We have petabytes of data created by each of these cars, right? Think of Tesla. Pretty much every car manufacturer, and also … suppliers, is investing in those areas, and not just doing research but making real investments. So you have petabytes of data created at the edge: video, lidar data, other sensor feeds as well. You need to consolidate that data and bring it together. A data lake certainly is a good place for this, but you also need to understand what's in that data. You need to tag the data as well, saying, well, this is a traffic condition in a certain weather pattern … being passed by a truck or something at a traffic light, that kind of thing. So it needs to be analyzed and tagged accordingly. And then what do you do with this data? You need this data for data scientists, who need to be able to find it and train models based on it. And once those models have been developed, they need to be put back into production, like inference at the edge. That's just the life cycle of AI development. There are … privacy aspects and data governance aspects to it as well. So all this data needs to be managed in a much more repeatable and automated fashion, because it's just not possible in a manual way.
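To make the tag-then-find step concrete, here is a minimal Python sketch of a catalog in which clips captured at the edge are tagged and data scientists later look up a training set by tag. The `Clip` and `Catalog` classes, the clip IDs and the tag names are all hypothetical illustrations and do not describe any Hitachi Vantara product.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """One recording captured at the edge (hypothetical structure)."""
    clip_id: str
    source: str                      # e.g. "video" or "lidar"
    tags: set = field(default_factory=set)


class Catalog:
    """Toy metadata catalog: ingest clips, tag them, find them by tag."""

    def __init__(self):
        self._clips = {}

    def ingest(self, clip):
        self._clips[clip.clip_id] = clip

    def tag(self, clip_id, *tags):
        self._clips[clip_id].tags.update(tags)

    def find(self, *required):
        # Return IDs of clips that carry every requested tag.
        wanted = set(required)
        return sorted(c.clip_id for c in self._clips.values() if wanted <= c.tags)


catalog = Catalog()
catalog.ingest(Clip("clip-001", "video"))
catalog.ingest(Clip("clip-002", "lidar"))
catalog.ingest(Clip("clip-003", "video"))

# Tagging would normally come from automated analysis; here it is done by hand.
catalog.tag("clip-001", "rain", "traffic-light", "truck-passing")
catalog.tag("clip-002", "rain", "highway")
catalog.tag("clip-003", "clear", "traffic-light")

# A data scientist looks for rainy traffic-light scenes to train on.
training_set = catalog.find("rain", "traffic-light")
print(training_set)
```

In a real pipeline the tags would be produced by automated analysis and the catalog would sit on top of the data lake; the point of the sketch is only that tagged data becomes findable data.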
And you brought up self-driving cars, which made me think of the connections between the internet of things and data operations. Because, as you said, at the edge … this is where all these things are going. So much data is going to be produced ... petabytes.... And it's funny how quickly you just threw out petabytes produced by one car at a time: that size is what we're dealing with in the internet of things. It is so much. What is that connection? What does it look like for DataOps?
These are examples, but we also see the same in manufacturing and in other areas, like public infrastructure. For example, in smart buildings we see the data flow happening from the edge into the data center, into the cloud, and also back to the edge, to run the analytics and the machine learning, also called inference, right where the decisions are needed, often in real time. So specifically bridging the edge with the data center is something we have not seen very much in past years. But that's a key area of focus, for example, for Hitachi Vantara: to bring those worlds together. For example, one of our latest products, and I don't want to pitch here, but Lumada Edge Intelligence, which we announced, is a fully containerized environment for the edge. And by also creating a registry at the edge, it allows us to do incremental updates. For example, machine learning models shift and drift and change all the time.
You need to push them back to the edge very rapidly. And you might need to do this in a low-bandwidth environment, like a mining scenario or an offshore drilling scenario, for example. This containerized approach lends itself really well to that.
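The reason a containerized approach suits low-bandwidth sites can be sketched in a few lines of Python. Container images are built from layers, and a registry that already holds some layers only needs the missing ones. The layer names and sizes below are made-up placeholders, and the function is a simplified illustration of layer reuse in general, not the actual update mechanism of Lumada Edge Intelligence.

```python
def layers_to_transfer(manifest, edge_cache):
    """Return the (digest, size) pairs the edge registry does not hold yet."""
    return [(digest, size) for digest, size in manifest if digest not in edge_cache]


# Hypothetical image manifests: the two model versions share their base layers.
base = [("sha256:base-os", 80_000_000), ("sha256:runtime", 150_000_000)]
model_v1 = base + [("sha256:model-v1", 40_000_000)]
model_v2 = base + [("sha256:model-v2", 42_000_000)]

# The edge registry already holds every layer of version 1.
edge_cache = {digest for digest, _ in model_v1}

delta = layers_to_transfer(model_v2, edge_cache)
full = sum(size for _, size in model_v2)
sent = sum(size for _, size in delta)

print(f"full image: {full} bytes, actually sent: {sent} bytes")
```

Under these assumed sizes only the new model layer crosses the link, which is what makes frequent model refreshes plausible at a mine or an offshore platform.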
Thank you for breaking that down and showing how that connection exists. So let's say that I am in charge of my business's resources and I've heard of DataOps. Can I go buy some DataOps? How does one acquire some DataOps?
Yeah. Unfortunately, you cannot really buy DataOps. There's no DataOps platform or anything that you can buy off the shelf. First of all, it's a practice, it's a methodology. It's certainly supported by technology, but there are also cultural and organizational elements to it. In the same way that you could not buy DevOps in the past, you cannot buy DataOps either. But you certainly can establish a DataOps practice, and some forward-thinking, leading organizations are already on the way to doing that.
And what elements do we excel at in this space? What do you think brings it all together?
Well, as mentioned, technology is certainly part of it. It's a practice. It's also processes. There are cultural and organizational elements, which all need to come together at the same time to make it happen. Just one element, just technology by itself, won't solve it, but culture and the organizational aspects by themselves won't solve it either.
So it sounds like it's a full tech and human sort of loop that has to be addressed.
Very much so. Yeah.
So within that, how does Hitachi Vantara support data operations? What is the connection there?
So from our perspective, first of all, we have been doing it ourselves and practicing it within the organization, even within my department on the marketing side, deploying DataOps practices as well. But we support our customers with this both from a professional services perspective and from a technology perspective.
From a technology perspective, we actually announced here at the NEXT conference Lumada Data Services, which is a software-services-based approach to establishing a data fabric for organizations. It stretches from edge to multicloud to enable a DataOps practice. It relies very much on automation. So think about it: as a business user or as a data engineer, you have a control plane where you can define policies based on what you want to do with the data. For example, policies for data curation, policies for data compliance, policies for cost savings, et cetera. And then you have the policy engines behind this, which also leverage machine learning and other analytics to execute the data flows behind the scenes for you. And you can do this in a collaborative fashion, so you can bring together different parties of the organization.
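As a rough illustration of the control plane and policy engine idea, the Python sketch below declares two policies and lets a small engine decide which actions apply to which data assets. Every name here (`Asset`, `Policy`, `plan`, the 90-day threshold, the action strings) is an assumption made up for the example, the machine learning part is left out entirely, and none of it reflects the real Lumada Data Services interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Asset:
    """A data asset as the engine sees it (hypothetical fields)."""
    name: str
    days_since_access: int
    contains_pii: bool
    tier: str  # "hot" or "cold"


@dataclass
class Policy:
    """A declarative rule: when `applies` is true, schedule `action`."""
    name: str
    applies: Callable[[Asset], bool]
    action: str


# Policies a business user or data engineer might define in a control plane.
POLICIES = [
    Policy("compliance", lambda a: a.contains_pii, "mask-pii"),
    Policy(
        "cost-savings",
        lambda a: a.days_since_access > 90 and a.tier == "hot",
        "move-to-cold-tier",
    ),
]


def plan(assets, policies):
    """Evaluate every policy against every asset and list the resulting actions."""
    return [(a.name, p.action) for a in assets for p in policies if p.applies(a)]


assets = [
    Asset("sensor-logs-2018", 400, False, "hot"),
    Asset("customer-records", 3, True, "hot"),
    Asset("archived-video", 200, False, "cold"),
]

actions = plan(assets, POLICIES)
for name, action in actions:
    print(f"{name}: {action}")
```

Keeping the policies as data, separate from the engine that executes them, is what lets different parts of an organization contribute rules without touching the data flows themselves.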
Oh, that sounds like a great space to be in. So I can't buy DataOps, but what should one do to follow up? What would be the next step from here to go explore this concept and this practice?
Yeah, there are a couple of directions you can take here. To learn more about DataOps, you can certainly find rich information on the Hitachi Vantara website. If you go to https://www.hitachivantara.com/en-us/company/dataops.html, for example, you can find three white papers specifically on that topic, which break it down: DataOps for value, for example, or the DataOps supply chain, or the cultural aspects of DataOps. So that's one area you can go into. You can study the technologies that we are providing for that, which is Lumada Data Services, and there are also engagement services that we provide to kickstart your journey.
Those are great next steps. Thanks so much for being with us.
Thanks for the opportunity.
© Hitachi Vantara LLC 2020. All Rights Reserved.