Numenta Achieves 123X Inference Performance Improvement for BERT Transformers on Intel Xeon Processor Family
Numenta's neuroscience-based AI technology is enabling deep learning models to achieve a two-orders-of-magnitude throughput speed-up while maintaining low latencies
Transformative Possibilities for NLP and Other Real-Time Applications Through New Numenta Beta Program
Applying its decades of neuroscience research to the development of deep learning technologies, Numenta Inc. is reporting groundbreaking performance achievements in AI. In collaboration with Intel, Numenta has achieved unparalleled performance gains by bringing its brain-based technology to Transformer networks running on Intel Xeon processors.
Numenta is highlighting these remarkable results on two Intel products announced today: the 4th Gen Intel Xeon Scalable processors (formerly codenamed Sapphire Rapids) and the Intel Xeon CPU Max Series (formerly codenamed Sapphire Rapids + HBM). These results demonstrate the first commercial applications of Numenta’s technology in Conversational AI solutions.
Breaking Latency Barriers in Conversational AI
Conversational AI, a rapidly growing market projected to reach $40 billion by 2030, requires high-throughput, low-latency technologies so that consumers can engage in human-like interactions with computers. Transformer networks are the deep learning model of choice for these applications. But despite their high accuracy, the size and complexity of Transformers have made them costly to deploy, until now.
In a remarkable example leveraging Intel’s new Intel Advanced Matrix Extensions (Intel AMX), Numenta reports a stunning 123x throughput improvement over current-generation AMD Milan CPU implementations for BERT inference on short text sequences, while smashing the 10 ms latency barrier required for many language model applications. BERT is the popular Transformer-based machine learning technique for Natural Language Processing (NLP) pre-training, developed by Google.
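For readers who want a feel for how such throughput and latency figures are measured, the following is a minimal sketch of a CPU-side BERT inference benchmark using PyTorch and Hugging Face Transformers. The model name, sequence length, batch size, and iteration counts here are illustrative assumptions; Numenta’s optimized inference stack and actual benchmark configuration are proprietary and not shown.

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative benchmark parameters; Numenta's actual configuration is not public.
MODEL_NAME = "bert-base-uncased"
SEQ_LEN = 64       # a "short text sequence"
BATCH_SIZE = 1
N_ITERS = 100

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Pad/truncate a sample input to a fixed length so every pass does the same work.
inputs = tokenizer(
    ["An example customer utterance for a conversational AI system."] * BATCH_SIZE,
    padding="max_length",
    truncation=True,
    max_length=SEQ_LEN,
    return_tensors="pt",
)

with torch.inference_mode():
    for _ in range(10):          # warm-up so one-time costs don't skew the timing
        model(**inputs)
    start = time.perf_counter()
    for _ in range(N_ITERS):
        model(**inputs)
    elapsed = time.perf_counter() - start

print(f"mean latency: {elapsed / N_ITERS * 1e3:.2f} ms, "
      f"throughput: {BATCH_SIZE * N_ITERS / elapsed:.1f} sequences/s")
```

On stock PyTorch this measures only a baseline; the gains the article describes come from Numenta’s proprietary optimizations running on AMX-capable Xeon hardware.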
Combining its proprietary technology with 4th Gen Intel Xeon Scalable processors, Numenta also reports a 62x throughput improvement over the previous generation of Intel Xeon Scalable processors.
Numenta’s dramatic acceleration of Transformer networks delivers high throughput at ultra-low latencies for inference with 4th Gen Intel Xeon Scalable processors. These results illustrate a cost-effective option for running the large deep learning models necessary for Conversational AI and other real-time AI applications.
“These breakthrough results turn Transformers from a cumbersome technology into a high-performance solution for real-time NLP applications and open up new possibilities for companies with performance sensitive AI applications,” commented Subutai Ahmad, CEO of Numenta. “Customers will be able to use the combination of Numenta and 4th Gen Intel Xeon Scalable processors to deploy real-time apps in a light-weight, cost-effective manner.”
“Numenta’s results on Intel’s new hardware make it possible to deploy state-of-the-art Transformers at an unparalleled price/performance point, greatly expanding the design space for conversational interaction and ultimately boosting top-line value,” said Tom Ngo, CEO of Lumin.ai, a leading Conversational AI company whose Sales Accelerator product helps high-touch sales teams in multiple industries meet with more of their prospects and shorten their sales cycles.
Unmatched Throughput for High Volume Document Processing
Numenta’s AI technology also dramatically accelerates NLP applications that rely on analyzing large collections of documents. When applying Transformers to document understanding, long sequence lengths are required to incorporate the full context of a document. These long sequences demand high data transfer rates, so off-chip memory bandwidth becomes the limiting factor. Using the new Intel Xeon CPU Max Series, Numenta demonstrates it can optimize the BERT-Large model to process large text documents, delivering an unparalleled 20x throughput speed-up at a sequence length of 512.
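To see why long sequences become bandwidth-bound, consider a back-of-envelope estimate of the attention score matrices alone, which grow quadratically with sequence length. The layer and head counts below are the published BERT-Large architecture figures; the calculation is illustrative, not a measurement from Numenta or Intel.

```python
# Back-of-envelope estimate: the attention score matrices alone grow
# quadratically with sequence length, one reason long-sequence inference
# becomes memory-bandwidth-bound. Illustrative, not a vendor measurement.

def attention_score_bytes(seq_len: int, n_layers: int, n_heads: int,
                          dtype_bytes: int = 4) -> int:
    """Bytes in the (seq_len x seq_len) attention score matrices per forward pass."""
    return n_layers * n_heads * seq_len * seq_len * dtype_bytes

# BERT-Large: 24 layers, 16 attention heads (published architecture figures).
for seq_len in (64, 512):
    mb = attention_score_bytes(seq_len, n_layers=24, n_heads=16) / 1e6
    print(f"seq_len={seq_len:3d}: ~{mb:7.1f} MB of attention scores per pass")
```

Going from a sequence length of 64 to 512 multiplies this traffic by 64 (roughly 25 MB to roughly 400 MB per pass in fp32), which quickly saturates off-chip bandwidth and is why the high-bandwidth memory of the Xeon CPU Max Series matters for this workload.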
“Numenta and Intel are collaborating to deliver substantial performance gains to Numenta’s AI solutions through the Intel Xeon CPU Max Series and 4th Gen Intel Xeon Scalable processors. We’re excited to work together to unlock significant throughput performance accelerations for previously bandwidth-bound or latency-bound AI applications such as Conversational AI and large document processing,” said Scott Clark, vice president and general manager of AI and HPC Application Level Engineering, Intel.
“This type of innovation is absolutely transformative for our customers, enabling cost-efficient scaling for the first time,” added Ahmad.
Numenta Launches Beta Program
To bring the benefits of its AI products and solutions to customers as quickly as possible, Numenta recently announced a Private Beta program. Numenta is actively engaging with startups and Global 100 companies to apply its platform technology to a broad spectrum of NLP and Computer Vision applications.