OctoML Secures $28 Million to Accelerate ML Model Deployment

By AIT News Desk on Mar 18, 2021

New funding round underscores high demand for early access to OctoML’s machine learning acceleration platform that deploys to any hardware, cloud provider or edge device

OctoML announced it has raised a $28 million Series B funding round, bringing the company’s total amount raised to $47 million. Addition led the round with participation from existing investors Madrona Venture Group and Amplify Partners.

Built on Apache TVM, the ML open-source stack that powers the “Alexa” wake word and Qualcomm’s machine learning software, OctoML is an ML acceleration platform that automatically maximizes model performance while enabling continuous deployment. The company’s flagship product, Octomizer, enables engineering teams to deploy models in hours — not months — on any hardware, cloud provider, or edge device.

“Machine learning has become mission-critical in virtually every industry, yet getting models to production remains labor-intensive, slow, and cost-prohibitive,” said Luis Ceze, CEO and co-founder of OctoML. “While ML spend is on the rise, 90 percent of models don’t make it to production. This is because improving model performance without sacrificing accuracy requires endless manual optimizations and fine-tuning, especially given the growing stack of ML software and hardware backends.”

Founded by the team that created the open-source Apache TVM, OctoML aims to make machine learning fast, useful, and accessible to any organization, large or small. Companies like AMD, Qualcomm, Bosch, and Microsoft are already using OctoML’s technology to increase model throughput, reduce inference costs, and accelerate their time-to-market. Early results show performance improvements of up to 30x without compromising accuracy.

Ceze added, “The goal is to enable our customers to extract full value and efficiency from their hardware investments (CPU, GPU, SoCs and accelerators). By using ML to optimize ML, we reduce the optimization and tuning time by orders of magnitude. A 30x boost in performance translates to 30x savings in compute cost.”

Octomizer already supports a wide variety of ML frameworks, including PyTorch, TensorFlow, and ONNX-serialized models, as well as hardware backends such as NVIDIA/CUDA, x86, AMD, ARM, Intel, MIPS, and more. Recently, the company outperformed Apple’s Core ML 4 on the Apple M1, improving model performance by 1.5x.

“When we first met Luis and the OctoML team, we knew they were poised to transform the way ML teams deploy their machine learning models,” said Lee Fixel, Founder of Addition. “They have the vision, the talent and the technology to drive ML transformation across every major enterprise. They launched Octomizer six months ago and it’s already becoming the go-to solution developers and data scientists use to maximize ML model performance. We look forward to supporting the company’s continued growth.”