AI Accelerator Groq Adapts Meta's LLaMA, a ChatGPT Competitor, to Run on Its Systems
Groq, a leading artificial intelligence (AI) and machine learning (ML) systems innovator, announced last week that it has adapted a new large language model (LLM) to run on its systems: LLaMA, chatbot technology from Meta positioned as an alternative to ChatGPT.
Facebook parent Meta released LLaMA, which chatbots can use to generate human-like text. Three days later the Groq team downloaded the model, and within a few more days had it running on a production GroqNode server containing eight GroqChip inference processors. This is rapid time-to-functionality: a port of this kind can take a larger team of engineers weeks to months, while Groq completed it with just a small group from its compiler team.
Jonathan Ross, CEO and founder of Groq, said, “This speed of development at Groq validates that our generalizable compiler and software-defined hardware approach is keeping up with the accelerating pace of LLM innovation, something traditional kernel-based approaches struggle with.”
The rapid LLaMA bring-up by Groq is a particularly noteworthy milestone because Meta researchers originally developed LLaMA for NVIDIA chips. By running a cutting-edge model on their own technology, Groq engineers demonstrated GroqChip as a ready-to-use alternative to incumbent hardware. Generative AI is carving out a place for itself in the market, and as transformers continue to accelerate the pace of LLM development, customers will need solutions that provide tangible time-to-production advantages, reducing developer complexity for fast iteration.
Bill Xing, Tech Lead Manager, ML Compiler at Groq, said, “The complexity of computing platforms is permeating into user code and slowing down innovation. Groq is reversing this trend. Since we’re working on models that were trained on NVIDIA GPUs, the first step of porting customer workloads to Groq is removing non-portable code targeted at specific vendors and architectures. This might include replacing vendor-specific kernel calls, removing manual parallelism or memory semantics, and so on. The resulting code ends up looking a lot simpler and more elegant. Imagine not having to do all that ‘performance engineering’ in the first place to achieve stellar performance! This also helps by not locking a business into a specific vendor.”
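To make the idea concrete, here is a minimal, hypothetical sketch of what “removing vendor-specific code” can mean in practice. The function names and the imagined `custom_cuda_ext` extension below are illustrative assumptions, not Groq's actual toolchain: the point is that the portable version expresses only the math, with no kernel calls or device management for a compiler to untangle.

```python
def attention_scores_portable(q, k):
    """Framework- and vendor-agnostic math: scores[i][j] = q[i] . k[j].

    Plain matrix multiply of q against k-transposed, written as pure
    Python over lists of lists so nothing ties it to one backend.
    """
    return [
        [sum(qi * ki for qi, ki in zip(qrow, krow)) for krow in k]
        for qrow in q
    ]


# A vendor-pinned version of the same computation might instead call a
# hand-tuned kernel, e.g. `custom_cuda_ext.fused_attention(q, k)`
# (hypothetical), pin tensors to a device with `.cuda()`, or manage
# streams and memory manually. All of that "performance engineering"
# must be stripped before the model can target another backend.
print(attention_scores_portable([[1, 2]], [[3, 4]]))  # → [[11]]
```

The portable form is what a generalizable compiler can consume directly; the vendor-pinned form has to be unwound first.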