NVIDIA Unveils AI Platform to Minimize Downtime in Supercomputing Data Centers
NVIDIA Mellanox UFM Cyber-AI Platform Detects Security Threats, Predicts Network Failures and Guides Predictive Maintenance
ISC Digital—NVIDIA unveiled the NVIDIA Mellanox UFM Cyber-AI platform, which minimizes downtime in InfiniBand data centers by harnessing AI-powered analytics to detect security threats and operational issues, as well as predict network failures.
This extension of the UFM platform product portfolio, which has managed InfiniBand systems for nearly a decade, applies AI to learn a data center’s operational cadence and network workload patterns, drawing on both real-time and historical telemetry and workload data. Against this baseline, it tracks the system’s health and network modifications, and detects performance degradation as well as changes in usage and profile.
The new platform alerts administrators to abnormal system and application behavior and to potential system failures and threats, and it can perform corrective actions. It is also designed to deliver security alerts when attackers attempt to hijack systems to host undesired applications, such as cryptocurrency mining. The result is reduced data center downtime, which typically costs more than $300,000 an hour, according to research by ITIC.
“The UFM Cyber-AI platform determines a data center’s unique vital signs and uses them to identify performance degradation, component failures and abnormal usage patterns,” said Gilad Shainer, senior vice president of marketing for Mellanox networking at NVIDIA. “It allows system administrators to quickly detect and respond to potential security threats and address upcoming failures, saving cost and ensuring consistent service to customers.”
Organizations that have long used the UFM platform in their data centers have expressed strong interest in the new offering.
Allan Williams, associate director of services and technology at the National Computational Infrastructure (NCI Australia), said: “NCI plays a pivotal role in the national research landscape. Our supercomputing infrastructure serves 5,000 researchers who use it for critical national and global activities. UFM enables us to effectively manage our supercomputers and to optimize performance. We look forward to utilizing the new capabilities of UFM Cyber-AI to enhance even further our supercomputing utilization and improve our return on investment.”
Douglas Johnson, associate director of the Ohio Supercomputer Center, said: “We have been using the UFM platform for years in our InfiniBand data centers. UFM and the expertise from the Mellanox networking team have been fundamental ingredients in the management of our network and the stability we’ve achieved. We see great advantages in the UFM Cyber-AI platform.”
Extending the UFM Platform
The UFM Cyber-AI platform complements the UFM Enterprise platform, which provides network monitoring, management, performance optimization, configuration checks and secure cable management.
NVIDIA today also added a third member to the UFM family, the UFM Telemetry platform. This tool captures real-time network telemetry data, which is streamed to an on-premises or cloud-based database to monitor network performance and validate the network configuration.