Rafay Systems Powers AI and Machine Learning Applications at the Edge by Streamlining Operations for GPU-based Container Workloads
Quickly launch and easily manage production-grade Kubernetes clusters for AI and machine learning applications at scale with Rafay
Rafay Systems, the leading platform provider for Kubernetes Operations, announced the expansion of the industry’s only turnkey solution for operating Kubernetes clusters with GPU support at scale by adding powerful new metrics and dashboards for deeper visibility into GPU health and performance.
The Rafay Kubernetes Operations Platform (KOP) now features a fully integrated GPU Resource Dashboard that visualizes critical GPU metrics so developers and operations teams can seamlessly monitor, operate, and improve performance for GPU-based container workloads – all from one unified platform.
Kubernetes has rapidly become the preferred orchestration layer for enterprises that need to provision and operate GPU-enabled AI and machine learning applications in the cloud and at edge/remote locations.
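To ground what "GPU-enabled" means in Kubernetes terms, the sketch below uses the official Python client to schedule a pod that requests one GPU through the standard `nvidia.com/gpu` extended resource. It assumes the cluster already runs the NVIDIA device plugin; the image name and pod names are placeholders, not anything from Rafay.

```python
# Minimal sketch: scheduling a GPU-backed inference pod on Kubernetes.
# Assumes the NVIDIA device plugin is installed, which exposes GPUs to the
# scheduler as the extended resource "nvidia.com/gpu". Names/images are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="video-inference", labels={"app": "edge-ai"}),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="detector",
                image="registry.example.com/video-detector:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # request exactly one GPU for this container
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```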
According to the 2022 Gartner® Emerging Technologies: Edge Technologies Offer Strong Area of Opportunity — Adopter Survey Findings, “The primary objectives for respondent organizations investing in and adopting edge technologies are to improve employee productivity (41%) and automate business processes (39%). This aligns with existing Gartner research (see Emerging Technologies: Use-Case Patterns in Edge AI) that edge AI is being used to improve business processes, delivering automation and productivity gains that translate into measurable ROI, such as cost savings.”
However, as enterprises rapidly scale up their AI and machine learning workloads, they must address challenges such as visibility and monitoring to prevent significant delays in application deployment and the wasted cost of idle or underperforming GPUs in their clusters.
For example, a factory that increasingly relies upon real-time video detection applications powered by AI needs a standardized approach for cross-functional teams to manage the IT infrastructure and applications. The following challenges often result in operational fragility and lack of repeatability that hinders productivity:
- Flawed or overly restrictive access and visibility for developers and operations personnel who need GPU metrics on demand to tune and optimize GPU workloads.
- The struggle of hiring or training a team of experts and spending months developing, operating and maintaining a customized monitoring infrastructure to scrape and centrally aggregate GPU metrics (a minimal do-it-yourself sketch follows this list).
- The complexity of developing and maintaining an integration with corporate single sign-on (SSO) systems to provide role-based access to metrics and dashboards.
- Accounting for the organization’s GPU-enabled workloads that are developed and maintained by external entities (e.g., partners and ISVs), which also need visibility into GPU metrics to ensure the workloads are performing optimally.
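As a rough illustration of the do-it-yourself monitoring burden described in the second bullet, the snippet below polls each GPU node's NVIDIA DCGM exporter (which serves Prometheus-format metrics on port 9400 by default) and aggregates per-GPU utilization. The node addresses are placeholders, and this is a hand-rolled sketch of the generic approach, not anything Rafay ships.

```python
# Sketch of hand-rolled GPU metric scraping: pull the Prometheus-format text
# exposed by each node's NVIDIA DCGM exporter and collect per-GPU utilization.
# DCGM_FI_DEV_GPU_UTIL is the exporter's per-GPU utilization gauge (percent).
import requests
from prometheus_client.parser import text_string_to_metric_families

NODES = ["10.0.0.11", "10.0.0.12"]  # placeholder GPU node addresses

def gpu_utilization(node: str) -> dict:
    """Return {gpu_id: utilization_percent} scraped from one node's exporter."""
    text = requests.get(f"http://{node}:9400/metrics", timeout=5).text
    util = {}
    for family in text_string_to_metric_families(text):
        if family.name == "DCGM_FI_DEV_GPU_UTIL":
            for sample in family.samples:
                util[sample.labels.get("gpu", "0")] = sample.value
    return util

for node in NODES:
    print(node, gpu_utilization(node))
```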
Rafay KOP solves these challenges by providing enterprises and trusted external entities with a zero-touch experience for automated, centralized aggregation of critical operational GPU metrics across the entire fleet of Kubernetes clusters. Rafay’s Zero-Trust Access Service with SSO integration enables seamless role-based access, ensuring that only authorized developers, external partners and operations personnel gain secure access and visibility into GPU metrics from the console.
“Rafay makes spinning up GPU-enabled Kubernetes clusters incredibly simple. In just a few steps an enterprise’s deep learning and inference projects can be fully operational,” explained Mohan Atreya, SVP Product and Solutions at Rafay Systems. “Not only do we provide the fastest path to powering environments for AI and machine learning applications, but the combination of capabilities in Rafay KOP enables scalable edge/remote use cases with support for zero-trust access, policy management, GPU monitoring and more across an entire fleet of thousands of clusters.”
The new GPU Resource Dashboard, which streamlines the operation of GPU-based container workloads, is fully integrated into Rafay KOP. Teams can also take advantage of many additional benefits of the SaaS platform today, including:
- AI/ML Application Deployment Automation: Rafay KOP allows organizations to avoid spending months or years developing a custom platform just to provision and manage GPU-enabled Kubernetes clusters for bare metal, virtualized and cloud environments.
- AI/ML Cluster and Workload Standardization and Consistency: Rafay KOP’s Cluster Blueprints standardize and govern cluster and workload configurations across a fleet. Enterprises can detect, be notified of, and/or block configuration changes to Kubernetes clusters.
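For a sense of the kind of fleet-wide question a GPU dashboard like the one described above answers, such as spotting the idle GPUs called out earlier, the illustrative query below runs against a central Prometheus-compatible store via its standard HTTP API. The endpoint is a placeholder and this is not Rafay's API; it simply shows the class of query such dashboards surface.

```python
# Illustrative only (not Rafay's API): flag GPUs averaging under 5% utilization
# over the past hour by querying a central Prometheus-compatible aggregator.
import requests

PROMETHEUS = "http://prometheus.example.com:9090"  # placeholder central metrics store
QUERY = "avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]) < 5"

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    host = labels.get("Hostname", labels.get("instance", "unknown"))
    print(f"idle GPU {labels.get('gpu')} on node {host}")
```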