Run:AI Creates First Fractional GPU Sharing for Kubernetes Deep Learning Workloads
By creating multiple logical GPUs on a single resource, Run:AI has built another key part of the technology for true, transparent GPU virtualization
Run:AI, a company virtualizing AI infrastructure, today released the first fractional GPU sharing system for deep learning workloads on Kubernetes. Especially suited for lightweight AI tasks at scale, such as inference, the fractional GPU system transparently lets data science and AI engineering teams run multiple workloads simultaneously on a single GPU. Companies can therefore run more workloads, such as computer vision, voice recognition and natural language processing, on the same hardware, lowering costs.
Today’s de facto standard for deep learning workloads is to run them in containers orchestrated by Kubernetes. However, Kubernetes can allocate only whole physical GPUs to containers; it lacks the isolation and virtualization capabilities needed for GPU resources to be shared without memory overflows or processing clashes.
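For context, a standard Kubernetes pod requests GPUs only in whole units through the NVIDIA device plugin’s extended resource. A minimal sketch using the official Kubernetes Python client illustrates this; the pod and image names are illustrative:

```python
# Minimal sketch: requesting a whole GPU the standard Kubernetes way.
# Pod and image names are illustrative; the "nvidia.com/gpu" extended
# resource exists only when the NVIDIA device plugin is installed.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="whole-gpu-job"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:23.10-py3",
                resources=client.V1ResourceRequirements(
                    # Extended resources are integers: one GPU or none.
                    # This is exactly the limitation Run:AI works around.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```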
Run:AI’s fractional GPU system effectively creates virtualized logical GPUs, each with its own memory and compute space, which containers can access as if they were self-contained processors. This lets several deep learning workloads run in containers side by side on the same GPU without interfering with one another. The solution is transparent, simple and portable, and requires no changes to the containers themselves.
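The announcement does not spell out the request syntax, but conceptually a fractional request replaces the integer GPU limit with a fraction. The sketch below shows one way such a request might look from the user’s side; the annotation key "gpu-fraction" and the scheduler name "runai-scheduler" are assumptions, not confirmed by the release:

```python
# Illustrative sketch only: how a fractional-GPU request might be expressed.
# The annotation key and scheduler name are assumptions for illustration.
from kubernetes import client

inference_pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="inference-job",
        annotations={"gpu-fraction": "0.25"},  # assumed key: ask for a quarter GPU
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # assumed: a custom scheduler handles fractions
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="server",
                image="my-registry/nlp-inference:latest",  # illustrative image
                # Note: no integer nvidia.com/gpu limit; the fraction above
                # replaces it, so four such pods could share one physical GPU.
            )
        ],
    ),
)
```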
To create fractional GPUs, Run:AI had to change how Kubernetes handles GPU resources. “In Kubernetes, a GPU is handled as an integer,” said Dr. Ronen Dar, co-founder and CTO of Run:AI. “You either have one or you don’t. We had to turn GPUs into floats, allowing for fractions of GPUs to be assigned to containers.” Run:AI also solved the problem of memory isolation, so each virtual GPU can run securely without memory clashes.
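To make the “integer to float” idea concrete, here is a toy allocator, not Run:AI’s code, that packs fractional GPU requests onto physical devices once the whole-units-only rule is relaxed; the memory-isolation piece Dar mentions is not shown:

```python
# Toy illustration of treating GPU capacity as a float instead of an integer.
# Not Run:AI's implementation; it only shows how fractional requests can be
# bin-packed onto physical GPUs.
from typing import Optional


class FractionalGpuPool:
    def __init__(self, num_gpus: int) -> None:
        # Each physical GPU starts with 1.0 (100%) of its capacity free.
        self.free = [1.0] * num_gpus

    def allocate(self, fraction: float) -> Optional[int]:
        """Place a job needing `fraction` of a GPU; return the GPU index or None."""
        for gpu_id, capacity in enumerate(self.free):
            if capacity >= fraction:
                self.free[gpu_id] = round(capacity - fraction, 6)
                return gpu_id
        return None  # no single GPU has enough free capacity

    def release(self, gpu_id: int, fraction: float) -> None:
        self.free[gpu_id] = min(1.0, round(self.free[gpu_id] + fraction, 6))


pool = FractionalGpuPool(num_gpus=2)
print(pool.allocate(0.5))   # -> 0: first job takes half of GPU 0
print(pool.allocate(0.25))  # -> 0: second job fits alongside it
print(pool.allocate(0.5))   # -> 1: exceeds GPU 0's remaining 0.25, goes to GPU 1
```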
A typical use case might see two to four jobs running on the same GPU, meaning companies could get up to four times as much work out of the same hardware. For some lightweight workloads, such as inference, eight or more containerized jobs can comfortably share the same physical chip.
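As a back-of-the-envelope check of that claim (the memory figures below are assumptions, not numbers from the announcement), a lightweight inference job with a small memory footprint packs many times over onto a single card:

```python
# Back-of-the-envelope capacity check; both figures are assumed for illustration.
gpu_memory_gb = 16       # e.g. a 16 GB inference GPU (assumed)
per_job_memory_gb = 2    # lightweight inference footprint (assumed)
jobs_per_gpu = gpu_memory_gb // per_job_memory_gb
print(f"{jobs_per_gpu} inference jobs fit on one {gpu_memory_gb} GB GPU")  # -> 8
```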
The addition of fractional GPU sharing is a key component in Run:AI’s mission to create a truly virtualized AI infrastructure, complementing the company’s existing technology that elastically stretches workloads across multiple GPUs and enables resource pooling and sharing.
“Some tasks, such as inference tasks, often don’t need a whole GPU, but all those unused processor cycles and RAM go to waste because containers don’t know how to take only part of a resource,” said Run:AI co-founder and CEO Omri Geller. “Run:AI’s fractional GPU system lets companies unleash the full capacity of their hardware so they can scale up their deep learning more quickly and efficiently.”