Varada Open-Sources Its Workload Analyzer to Help Data Teams Optimize Data Lake Queries
Workload Analyzer gives data engineers holistic visibility into performance of Presto® clusters, enabling resource optimization and improved service to business-wide users of Big Data analytics
Varada, the data lake query acceleration innovator, announced that it has open-sourced its Workload Analyzer for Presto, covering both Trino (formerly known as PrestoSQL) and PrestoDB, and made the source code available to everyone on GitHub. The Workload Analyzer is a free, easy-to-use tool that shows how Big Data and analytics workloads are performing and gives users insights into how to improve performance and optimize resources.
“Presto democratized Big Data, exponentially expanding the number of business users that can ask questions to a Big Data infrastructure and enlarging the number of underlying data sources they can query,” said Ori Reshef, vice president of products at Varada. “But as the number of users within an organization grows, the challenge of DataOps teams is to keep queries running quickly, delivering results in a timely way so that those users can do their jobs. Unfortunately, DataOps teams are only able to get bits and pieces of the information they need to optimize resources from Presto itself. So Varada built the Workload Analyzer to give DataOps teams deep and actionable insights.”
The Workload Analyzer collects details and metrics on every query, aggregates and extracts information, and delivers dozens of charts describing all the facets of cluster performance. For the first time, data engineers have a holistic view of their cluster and can drill down into pain points to determine which queries to optimize and how.
The Workload Analyzer is compatible with PrestoDB and Trino. Its script runs safely within the Presto cluster in the user’s Virtual Private Cloud (VPC), collecting and analyzing query statistics (JSONs). No data leaves the cluster, and the tool does not require any external resources. The Workload Analyzer has already been tested on dozens of massive-scale production clusters with zero impact on query performance.
Using the Workload Analyzer, data teams can:
- Learn how resources are used on an hourly and weekly basis and define scaling rules
- Identify heavy spenders and improve the pipeline
- Improve predicate pushdown and significantly reduce IO and CPU
- Identify “hottest” data
- Improve JOIN performance
- Provide a better production roll-out experience and identify upgrade risks upfront
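As a rough illustration of the first two items, the kind of aggregation involved can be sketched in a few lines of Python. This is not the Workload Analyzer's actual code: the record layout (`user`, `cpu_seconds`, `created`) and the function names are hypothetical simplifications, and real Presto query JSON is nested and considerably richer.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified per-query records. The field names are
# assumptions made for illustration, not the real Presto JSON schema.
QUERIES = [
    {"user": "alice", "cpu_seconds": 120.0, "created": "2021-03-01T09:15:00"},
    {"user": "bob", "cpu_seconds": 30.0, "created": "2021-03-01T09:45:00"},
    {"user": "alice", "cpu_seconds": 60.0, "created": "2021-03-01T14:05:00"},
    {"user": "carol", "cpu_seconds": 15.0, "created": "2021-03-02T14:30:00"},
]


def cpu_by_user(queries):
    """Sum CPU seconds per user, heaviest spender first."""
    totals = defaultdict(float)
    for q in queries:
        totals[q["user"]] += q["cpu_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def cpu_by_hour(queries):
    """Sum CPU seconds per hour of day (0-23) across all days."""
    totals = defaultdict(float)
    for q in queries:
        hour = datetime.fromisoformat(q["created"]).hour
        totals[hour] += q["cpu_seconds"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(cpu_by_user(QUERIES))
    print(cpu_by_hour(QUERIES))
```

A per-user ranking like this points at heavy spenders, while the per-hour totals give a starting point for scaling rules.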
“We’re already seeing this tool used in amazing ways,” said Reshef. “For example, one company is using it as a quality assurance tool for daily tests on large clusters. Another is using it for strategic planning to understand the best data sets to query for business users, while allocating resources effectively to significantly reduce costs. The number of use cases continues to rise.”
Presto: A Tool of Choice for Data-driven Companies
Presto is an open source distributed SQL query engine for running interactive analytic queries. Presto offers many benefits, most notably its ability to quickly run queries on a wide variety of data sources all at once, including ‘raw,’ unmodeled data. With this capability, as well as other unique advantages, Presto has quickly become a tool of choice for many significant data-driven companies.
The Varada Commitment to the Trino and PrestoDB Communities
“As part of our deep commitment to the PrestoDB and Trino communities, Varada decided to release a standalone, open source version of our Workload Analyzer tool so that any Presto user can evaluate potential performance improvements in their cluster,” said Eran Vanounou, CEO of Varada. “The tool will help PrestoDB and Trino users optimize their clusters on their own using their existing solutions. Of course, we anticipate that after discovering the existing inefficiencies within their clusters, many users will want to further evaluate how adding an indexing layer to PrestoDB or Trino can help them vastly improve performance. We will be more than happy to demonstrate how the Varada Data Platform can do just that.”
Varada leverages Presto in its innovative query acceleration engine, the Varada Data Platform. A big data infrastructure solution for fast analytics on thousands of dimensions, the Varada Data Platform became generally available in December 2020. Varada’s proprietary indexing layer runs on top of Presto, improving Presto’s query response time by 10x to 100x.