The Data Science Behind It All
Bill Schmarzo is regarded as one of the top Digital Transformation influencers on Big Data and Data Science. His career spans over 30 years in data warehousing, BI and advanced analytics. As the current CTO, Analytics and IoT for Hitachi Vantara, "The Dean of Big Data" guides the company's technology strategy and drives "co-creation" efforts with select customers to leverage IoT and analytics to power digital transformation.
For 19 years, Mauro has been developing analytical solutions for several industries, including telecommunications, media, e-commerce, finance, internet, retail, supply chain, health care and advertising, and in different functional areas such as marketing, finance and operations.
Hello everybody, and welcome to episode five of the Hitachi DataOps Advantage podcast series. This is a podcast series where we talk about the trials and tribulations of organizations on their DataOps journey, especially, in many cases, with respect to how they move forward with their data lake. As you know from our previous podcasts, we at Hitachi Vantara started down a data lake strategy and didn't have much luck, and Renee, our CIO, decided to reset and restart it. Hopefully you've listened to the other podcasts that take you up to this point, because today we're going to go totally nerd. I've got Mauro Damo on the line here, who's our Senior Data Scientist. It's hard to find somebody in the organization who is more nerdy than Mauro. Mauro, thanks for joining us today.
Thank you for the opportunity to talk with you. Yeah. It's a pleasure to be here.
So Mauro, tell me. Let's start off by helping me understand: what do you view as the role of data science in this data lake "second surgery" initiative we're doing at Vantara?
Yeah. Today, the data science role here is crucial, right? In general, what we see in companies with data lakes and EDP is business intelligence work, right? We have dashboards, and in general we look at the past. Almost all companies do BI work. It's very important because they can track the history and the progress of the company through KPIs. But in data science, we work in the future, right? We look ahead. We use the past not just to explain but to predict the future. So I think it is a main role here. After we did the digital value enablement, there was a workshop that all the stakeholders of the company attended: technology, the marketing team, finance, a lot of people were there. We were there too. The idea is how we can extract value from the data. So in this case, data science is very important and its role is crucial, right?
I love your explanation, Mauro, that you work in the future, right? You're kind of like Doc Brown in Back to the Future: you jump in your DeLorean and off you go into the future. It isn't sufficient for businesses to have reports that tell them what happened. They need to start transforming their mindset to think about the problems they can predict and the prescriptive actions they can take. I threw a term out earlier on, so let me explain it. I talk about this data lake "second surgery" problem, which is that a lot of organizations built these data lakes. They got some random technology, probably Hadoop, they put some random data in it, they hired some data scientists, and they waited for magic to happen, and waited, and waited. And it didn't happen. Part of it had to do with that BI mindset that looks at building reports and dashboards about what happened. But I think what you highlighted here is the business-critical nature of being forward-looking as you try to see what's going to happen with the business. So, Mauro, as you jumped in your DeLorean and jetted off into the future, what were some of the gotchas that you hit along that path? The things that kind of surprised you, both pleasant and unpleasant?
That's a very interesting point. I can say in general what is more unpleasant, right? Everybody thinks that data science is fancy work, with a lot of algorithms, techniques, math and modelling, but 80% of our work is trying to extract the right data, understanding the database, and learning how it works. The first time we saw EDP, there were 600 different tables. So it's a huge data lake. It's very important that we have this data lake, because it puts us one step ahead of other companies; some of them don't have a lot of data. Getting the data is very important and very difficult, right? Getting the right data from the systems takes a lot of time, and I think we have left that step behind. Right now we are in good shape to create value from this data, but even with this data lake and all this data, you still have 600 tables. We need to understand which ones we're going to use for the use case. Which is the best one? We are building our data management, so we are building our data catalogue. IT is still building this with the business unit. It's very important to have one because it's going to make the work much, much simpler for our data engineers and data scientists. And of course the data is not always the way that we want, right? It's not academia, right? So it's not like a graphic. We have missing data, we have different types of problems with the data that we need to solve, and we have to apply some transformations and decide which one is better for the modelling. This kind of work is very intense and takes around 80% of the time in data science. I think this is one of the most pleasurable parts of the work. All data scientists like to do it right. We love to do it right, too. So it's not a problem, but it's sometimes very painful. I think the way that we approached the solution was very good, in my humble opinion.
And so, we built three models, right? We talked about the models with everybody: not just the other data scientists, but also the IT team and the business unit. So we have a lot of people we talk with, and we need to have a lot of discussion about how to understand the data. Everybody, thank you for your help. Without you it is not possible to build a model like this. Going back to the models, we did three types of models… So these three models are very important. I think the way that we work together here is very important, and the outcome that we are getting is amazing.
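The data preparation described above (finding missing values and choosing a transformation before modelling) can be sketched in a few lines. This is only an illustration: the `annual_spend` column, the sample records, and the choice of median imputation are assumptions, not details of the project discussed in the podcast.

```python
from statistics import median


def missing_rate(rows, column):
    """Fraction of rows whose value in `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return copies of `rows` with None in `column` replaced by the median
    of the observed values. The input rows are left unchanged."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "annual_spend": 120.0},
    {"id": 2, "annual_spend": None},
    {"id": 3, "annual_spend": 80.0},
    {"id": 4, "annual_spend": 100.0},
]

print(missing_rate(customers, "annual_spend"))
cleaned = impute_median(customers, "annual_spend")
print(cleaned[1])
```

Median imputation is just one possible transformation; which one suits a given model is exactly the kind of judgment call the speaker says consumes most of the time.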
Mauro, were you surprised? Because I know I was surprised. When you built these models, the process you went through was very thorough, very interactive, very collaborative. And you came up with a propensity-to-buy model with, we think, around 81% accuracy, which is really quite impressive. But what really struck me was that you only needed three datasets, right? So many organizations spend so much time getting data into their data lakes and do not spend enough time figuring out how to derive and drive new sources of value. So I was totally shocked. I had always kind of hoped or thought it might be the case, but it was amazing to me to see that we only needed three datasets to generate a model with 81% accuracy. Is that normal, or is that the exception?
We achieved this performance because, at the beginning of the project, we went through this process to understand exactly what the business needs, right? I think that's very important because we can go straight to the point. We don't need to do a huge exploration where we put all the data together and then try to understand what the problem is. I think a lot of companies make this mistake: "Okay, let's put all the data together and let's see what we get." But I think this is not the best way, not the most intelligent way. The best way is to first see what the problem is and what is needed from the business side, and then start to understand which data we need. Then we go through all the data science processes. The accuracy that we got is more than 81%, and some products reach 90%, with only three datasets. It is very important to know exactly what you're doing and what data you are using. That is the point. Based on the process, we were able to reach this accuracy level. Of course, it's not only the process. The data helps a lot. The way that we build the model helps too. We tried different types of modelling. Just for propensity to buy, we tried 30 different types of models, considering hyperparameters and different scenarios and datasets. That way we try to extract as much value as we can from the data. We do exhaustive work to build the best one. So, Bill, I think the process was very important to help us with the guidelines and to set expectations, because sometimes the business has high expectations about what it wants. Setting those expectations is very important. We can't see the future, right?
You see the future as probabilities.
Sometimes we are right, sometimes we're wrong. I think we have a good model, good material to work with. And of course this does not finish here. I think it's a continuous process. We are going to hand over and transition to marketing. I think it's very important to continue this work, improve the model, and see the benefits you're going to achieve and how the model performs in the field.
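The idea of trying many model variants and hyperparameters, then keeping the one with the best accuracy on held-out data, can be sketched as follows. This is a minimal illustration under stated assumptions: a toy one-feature nearest-neighbour classifier stands in for the 30 model types mentioned above, the hyperparameter is the number of neighbours `k`, and the engagement scores and purchase labels are invented. None of it reflects the project's actual models or data.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def knn_predict(train, x, k):
    """Predict 1 (buys) or 0 (does not buy) by majority vote among the
    k training points whose feature value is closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def select_best(train, valid, candidate_ks):
    """Score every candidate k on the validation set and return the best
    k together with all scores. Ties go to the earliest candidate."""
    labels = [y for _, y in valid]
    scores = {}
    for k in candidate_ks:
        predictions = [knn_predict(train, x, k) for x, _ in valid]
        scores[k] = accuracy(predictions, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical (engagement_score, bought) pairs.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 0)]
valid = [(2.5, 0), (3.5, 0), (6.5, 1), (8.5, 1)]

best_k, scores = select_best(train, valid, [1, 3, 5])
print(best_k, scores)
```

Scoring on data the model was not fitted to matters here because, as the speakers note, the predictions are probabilities and the model will sometimes be wrong; performance in the field is what counts after the handover.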
Great. Well, thank you, Mauro, for your time. I think the last observation you made is that the digital value enablement process highlights the importance of data science as a team sport, right? By bringing together the best of what we have from a data science, data engineering, IT and subject matter expert perspective, we're able to become very laser-focused on what we're trying to achieve. It helps us to figure out what data sources we're going to need and what models we need to build, but, equally important, what data sources we don't need and what models we don't need. So, Mauro, thank you very much for your time. It was a great observation. You laid some nerdy terms out there, like hyperparameters. I'm glad to hear that. I'll have to go look those terms up now. Thank you very much. And for folks listening to our podcast, I hope you subscribe. We're going to be doing a lot more of these podcasts, not only to continue to follow the internal Hitachi Vantara project, which we code-named Project Champagne, but also to bring other customers up through this process as well. So, if you want to learn more about DataOps, about data science, and about how to resurrect and save those data lakes, please subscribe to the Hitachi Vantara DataOps Advantage.
I hope you enjoyed this podcast, and you'll certainly want to come back for the next one as we talk again to more organizations about how they're leveraging DataOps to drive value out of their data. If you want to learn more about Hitachi Vantara, track us on Twitter @HitachiVantara, or if you want to follow me, follow me @Schmarzo. I'm the only one on Twitter. Thanks for your time. Until next time, cheers.