Cerebras Unveils Andromeda, 13.5 Million Core AI Supercomputer that Delivers Near-Perfect Linear Scaling for Large Language Models
Delivering more than 1 Exaflop of AI compute and 120 Petaflops of dense compute, Andromeda is one of the largest AI supercomputers ever built, and is dead simple to use
Cerebras Systems, the pioneer in accelerating artificial intelligence (AI) compute, unveiled Andromeda, a 13.5 million core AI supercomputer, now available and being used for commercial and academic work. Built with a cluster of 16 Cerebras CS-2 systems and leveraging Cerebras MemoryX and SwarmX technologies, Andromeda delivers more than 1 Exaflop of AI compute and 120 Petaflops of dense compute at 16-bit half precision. It is the only AI supercomputer to ever demonstrate near-perfect linear scaling on large language model workloads relying on simple data parallelism alone.
AI News: Infobip Creates AI-powered Chatbot for Uber
“It is extraordinary that Cerebras provided graduate students with free access to a cluster this big. Andromeda delivers 13.5 million AI cores and near perfect linear scaling across the largest language models, without the pain of distributed compute and parallel programing. This is every ML graduate student’s dream”
With more than 13.5 million AI-optimized compute cores and fed by 18,176 3rd Gen AMD EPYC™ processors, Andromeda features more cores than 1,953 Nvidia A100 GPUs and 1.6 times as many cores as the largest supercomputer in the world, Frontier, which has 8.7 million cores. Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J and GPT-NeoX.
Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near perfect proportion. This includes large language models with very large sequence lengths, a task that is impossible to achieve on GPUs. In fact, GPU impossible work was demonstrated by one of Andromeda’s first users, who achieved near perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequence lengths — MSL of 10,240. The users attempted to do the same work on Polaris, a 2,000 Nvidia A100 cluster, and the GPUs were unable to do the work because of GPU memory and memory bandwidth limitations.
Access to Andromeda is available now, and customers and academic researchers are already running real workloads and deriving value from the leading AI supercomputer’s extraordinary capabilities, including:
- Argonne National Laboratory: “In collaboration with Cerebras researchers, our team at Argonne has completed pioneering work on gene transformers – work that is a finalist for ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research. Using GPT3-XL, we put the entire COVID-19 genome into the sequence window, and Andromeda ran our unique genetic workload with long sequence lengths (MSL of 10K) across 1, 2, 4, 8 and 16 nodes, with near-perfect linear scaling. Linear scaling is amongst the most sought-after characteristics of a big cluster, and Cerebras Andromeda’s delivered 15.87X throughput across 16 CS-2 systems, compared to a single CS-2, and a reduction in training time to match. Andromeda sets a new bar for AI accelerator performance,” said Rick Stevens, Associate Lab Director, at Argonne National Laboratory.
- JasperAI: “Jasper uses large language models to write copy for marketing, ads, books, and more. We have over 85,000 customers who use our models to generate moving content and ideas. Given our large and growing customer base, we’re exploring testing and scaling models fit to each customer and their use cases. Creating complex new AI systems and bringing it to customers at increasing levels of granularity demands a lot from our infrastructure. We are thrilled to partner with Cerebras and leverage Andromeda’s performance and near perfect scaling without traditional distributed computing and parallel programming pains to design and optimize our next set of models,” said Dave Rogenmoser, CEO of JasperAI.
- AMD: “AMD is investing in technology that will pave the way for pervasive AI, unlocking new efficiency and agility abilities for businesses. The combination of the Cerebras Andromeda AI supercomputer and a data pre-processing pipeline powered by AMD EPYC-powered servers, together will put more capacity in the hands of researchers and support faster and deeper AI capabilities,” said Kumaran Siva, corporate vice president, Software & Systems Business Development, AMD.
- University of Cambridge: “It is extraordinary that Cerebras provided graduate students with free access to a cluster this big. Andromeda delivers 13.5 million AI cores and near perfect linear scaling across the largest language models, without the pain of distributed compute and parallel programing. This is every ML graduate student’s dream,” said Mateo Espinosa, doctoral candidate at the University of Cambridge in the United Kingdom.
Latest Aithority Insights : NVIDIA Raises the Standard of Low Code DevOps with the NVIDIA AI Enterprise 2.1
Andromeda’s near-perfect scaling across the largest natural language processing models is made possible by the second-generation Cerebras Wafer Scale Engine (WSE-2), the industry’s largest and most powerful processor, and by Cerebras’ MemoryX and Swarm X technologies. MemoryX enables even a single CS-2 to support multi-trillion parameter models. SwarmX technology links MemoryX to a cluster of CS-2s. Together these industry-leading technologies enable Cerebras’ large clusters to avoid two of the major challenges plaguing traditional clusters used for modern AI work: the complexity of parallel programming and the performance degradation of distributed computing.
The 16 CS-2s powering Andromeda run in a strictly data parallel mode, enabling simple and easy model distribution, and single-keystroke scaling from 1 to 16 CS-2s. In fact, sending AI jobs to Andromeda can be done quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes. Andromeda’s 16 CS-2s were assembled in only 3 days, without any changes to the code, and immediately thereafter workloads scaled linearly across all 16 systems. And because the Cerebras WSE-2 processor, at the heart of its CS-2s, has 1,000 times more memory bandwidth than a GPU, Andromeda can harvest structured and unstructured sparsity as well as static and dynamic sparsity. These are things other hardware accelerators, including GPUs, simply can’t do. The result is that Cerebras can train models in excess of 90% sparse to state-of-the-art accuracy.
Andromeda can be used simultaneously by multiple users. Users can easily specify how many of Andromeda’s CS-2s they want to use within seconds. This means Andromeda can be used as a 16 CS-2 supercomputer cluster for a single user working on a single job, or 16 individual CS-2 systems for sixteen distinct users with sixteen distinct jobs, or any combination in between.
Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a leading high performance data center. The 16 CS-2 systems, with a combined 13.5 million AI optimized cores are fed by 284 64-core AMD 3rd Gen EPYC processors. The SwarmX fabric, which links the MemoryX parameter storage solution to the 16 CS-2s, provides more than 96.8 terabits of bandwidth. Through gradient accumulation Andromeda can support all batch sizes.
Latest NaturalAI Insights: InspireXT Announces Acquisition Of NaturalAI – A Conversational Artificial Intelligence Platform To Expand Its Solution Portfolio
[To share your insights with us, please write to sghosh@martechseries.com]
Comments are closed.