Review of Big Data Workloads in the Cloud Exposes Enormous Waste, Opportunities for Optimization
Pepperdata, the leader in Analytics Stack Performance (ASP), announced the release of its inaugural "Big Data Performance Report" for 2020. The report draws on comprehensive data about the applications running in the company's largest enterprise customer clusters, which together hold nearly 400 petabytes of data on 5,000 nodes. Over a 30-day period, these clusters ran 4.5 million applications. The report provides insight into the enormous compute waste that occurs with big data applications in the cloud.
Pepperdata's research shows the scale of the challenge facing IT operations teams. The new "Big Data Performance Report" reveals enormous waste in enterprise data applications that lack observability and continuous tuning. It also shows tremendous potential to optimize those applications and reduce that waste.
The shift to cloud computing is solidly underway. As Statista reports, "in 2020, the public cloud services market is expected to reach around $266.4 billion U.S. dollars in size, and by 2022 market revenue is forecast to exceed $350 billion U.S. dollars." However, as the cloud expands, so does cloud wastage. As more complex big data applications migrate, the likelihood of resource misallocation rises. This is why, as Gartner reports, "through 2024, nearly all legacy applications migrated to public cloud infrastructure as a service (IaaS) will require optimization to become more cost-effective." Without this optimization, the data suggests, organizations will overspend.
“When we analyzed the data, we were amazed to see how much underutilization and other wasted resources there were—unnecessarily driving costs up,” said Joel Stewart, VP, Customer Success, Pepperdata. “The failure to optimize means companies are leaving a tremendous amount of money on the table—funds that could be reinvested in the business or drop straight to the bottom line. Unfortunately, many companies just don’t have the visibility they need to recapture the waste and increase utilization.”
The research from Pepperdata sheds further light on the nature of cloud wastage. For instance:
- Spark clusters and jobs account for the largest share of spend across clusters, and this is also where the highest net wastage was found.
- Job failures are a major source of wastage. Failed jobs seriously degrade performance and consume significant computational resources. Across unoptimized clusters, Pepperdata sees a wide range of failure rates: some clusters exceed a 10% failure rate, and Spark applications tend to fail more often than MapReduce applications.
- Prior to Spark optimization, the median maximum memory utilization across clusters in a typical week is a mere 42.3%. This underutilization reflects one of two conditions: either too few jobs are running to fully use the cluster's resources, or the jobs that are running are wasting resources.
- Prior to cloud optimization, comparing the resources jobs used against the resources they wasted shows average wastage of more than 60% across 40 large clusters. This wastage is highly concentrated: the large majority of jobs (typically up to 95%) show little wastage, while major wastage is usually found in just 5% to 10% of jobs.
This concentration is why optimization is inherently a needle-in-a-haystack challenge, and why machine learning can help so much. Studies show that ML-powered statistical models can predict task failures with precision of up to 97.4% and recall of up to 96.2%. When applied to Hadoop, such models reduce the percentage of failed jobs by up to 45%, with an overhead of less than five minutes.
Cloud optimization delivers big savings. According to Google, even low-effort cloud optimization can save a business as much as 10% per service within two weeks. Cloud services that are fully optimized and run for extended periods (over six weeks) can save more than 20%.
The research showed:
- With the visibility that genuine cloud optimization provides, three-quarters of customer clusters immediately win back task hours.
- Most enterprises increase task hours by at least 14%, and some by as much as 52%.
- A quarter of users save at least $400,000 per year. At the higher end, the most successful users project savings of $7.9 million for the year.
To cut waste out of IT operations processes and achieve true cloud optimization, enterprises need visibility and continuous tuning. This requires machine learning and a unified analytics stack performance platform. Such a setup gives IT operations teams the cloud tools they need to keep their infrastructure running optimally while minimizing spend.