AWS-Decart Partnership Showcases Breakthrough AI Performance

AI video startup Decart is pushing to be the leader in real-time video generation, and it has found an immensely powerful ally in Amazon Web Services.
Thursday morning, at AWS re:Invent, the startup’s cofounder and CEO, Dean Leitersdorf, revealed that Decart is one of the first developers to gain access to Amazon’s new Trainium3 chips, the latest addition to the cloud computing giant’s family of custom-built AI accelerators.
By optimizing its most advanced video generation models for AWS’s purpose-built infrastructure, Decart has achieved dramatic performance gains, enabling it to generate high-resolution, high-fidelity video outputs in milliseconds. Its video outputs set a new standard for the quality of real-time AI video, creating new possibilities for interactive content, livestreaming, gaming and other applications.
“It’s a new category of GenAI foundation models called ‘real-time live visual intelligence,’” said Leitersdorf on stage in Las Vegas. “Because for the first time, we could take foundational models for LLMs and video diffusion models and get them to run at the same time with zero latency.”
A Latency Tipping Point
With just 5 billion parameters, Decart’s flagship model Lucy is lean, precise and domain-focused, making it much faster and cheaper to run than traditional LLMs, yet it’s more than capable of matching the quality and accuracy of their outputs. It has been designed specifically for one purpose – real-time video generation – and its laser focus on that task means it does it extremely well.
The partnership will see AWS make its most advanced AI accelerators available to Decart, including the all-new Trainium3 chip unveiled earlier in the week at re:Invent. Trainium3 is the latest generation of Amazon’s Trainium family, a line of custom chips designed to provide greater efficiency for AI training and inference workloads.
Decart has already optimized its flagship video generation model Lucy to run on the older AWS Trainium2 processor, and it’s now doing the same for Trainium3, said Leitersdorf. His comments came as he joined AWS Senior Vice President of Utility Computing Peter DeSantis on stage during a keynote at re:Invent to discuss how highly specialized, tightly integrated infrastructure can turbocharge the performance of smaller AI models with minimal resource overhead.
Real-time AI video generation models are different from standard video models like OpenAI’s Sora and Google’s Veo, because their main focus is on latency rather than quality. Enter a prompt into Sora, and the model might take several minutes to process that request and generate a high-quality video. In contrast, Lucy starts generating content within milliseconds, enabling the video to be livestreamed as it’s being created.
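To make that contrast concrete, here is a minimal, purely illustrative Python sketch. The function names are hypothetical placeholders rather than any vendor’s actual API: a batch-style model returns nothing until the entire clip is rendered, while a real-time model yields frames as they are produced, so playback or a live stream can begin almost immediately.

```python
import time

def make_frame(prompt, i):
    # Stand-in for a model call; a real system would run inference here.
    time.sleep(0.001)
    return f"frame {i} for {prompt!r}"

def batch_generate(prompt, num_frames=240):
    # Quality-first models: the viewer waits until the whole clip is rendered.
    return [make_frame(prompt, i) for i in range(num_frames)]

def realtime_generate(prompt):
    # Latency-first models: each frame is yielded as soon as it is ready,
    # so output can be streamed while generation continues.
    i = 0
    while True:
        yield make_frame(prompt, i)
        i += 1

# The first frame from realtime_generate() arrives almost instantly, whereas
# batch_generate() returns nothing until every frame has been produced.
stream = realtime_generate("a fox running through snow")
first_frame = next(stream)
```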
Infrastructure Makes the Difference
Instantaneous video generation could revolutionize applications such as livestreaming and online gaming, and the cloud infrastructure giants are taking notice. They have good reason to: Gartner estimates that the global AI video market will grow to tens of billions of dollars by the end of the decade.
Real-time AI video is also expected to have a profound impact on businesses, enabling things like rapid prototyping of marketing campaigns and personalized customer engagement at a fraction of the cost and time of standard production methods.
AWS’s Trainium infrastructure is a key enabler for Decart. Designed specifically for intense AI processing demands, it uses high-bandwidth interconnects and centralized SRAM to deliver more floating-point operations per second than standard GPUs. This is what allows Decart’s models to process video with extremely low latency while ensuring its outputs match the quality of much more powerful models.
At re:Invent, Leitersdorf spoke about the importance of models and chips optimized to work well together. “The reason we get this performance, it’s a result of how we combine Trainium and our models,” he said.
“The models that we train at Decart, they have three components: an LLM that does reasoning and understands the world; a video model that understands pixels, it understands structure; and an encoder that lets the two connect and run together. So usually we have to run these in sequence one after the other. But we were able to build a Trainium megakernel that we wrote and it got it to run all three at the same time, with zero latency on the same chip, achieving maximum HBM memory utilization, Tensor engine utilization, all at the same time, with no latency.”
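One way to read that claim is as overlapped execution: when the three components run back to back, every frame pays the sum of their runtimes, whereas if they execute concurrently on the same chip, the steady-state frame interval is bounded by the slowest stage. The sketch below is a toy illustration of that arithmetic with made-up stage times; it is not Decart’s megakernel or actual Trainium code.

```python
# Hypothetical per-frame stage times, in seconds, chosen only for illustration.
T_LLM, T_VIDEO, T_ENCODER = 0.015, 0.020, 0.005

def sequential_frame_interval():
    # Running the LLM, video model and encoder one after another:
    # each frame costs the sum of all three stages.
    return T_LLM + T_VIDEO + T_ENCODER

def overlapped_frame_interval():
    # If the three stages run concurrently on the same chip, steady-state
    # throughput is limited by the slowest stage rather than the sum.
    return max(T_LLM, T_VIDEO, T_ENCODER)

print(f"sequential: {sequential_frame_interval() * 1000:.0f} ms/frame")  # 40 ms -> 25 FPS
print(f"overlapped: {overlapped_frame_interval() * 1000:.0f} ms/frame")  # 20 ms -> 50 FPS
```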
In terms of efficiency, Decart claims to have achieved over 30% better performance when running Lucy on Trainium2 compared to Nvidia GPUs, while generating high-fidelity video at 30 frames per second. With Trainium3, Decart believes it can reach 100 FPS by the time it has finished optimizing Lucy.
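For context, those frame rates translate directly into per-frame latency budgets. The following is back-of-envelope arithmetic only, not a benchmark:

```python
# Each frame must be generated within 1/FPS seconds to sustain a live stream
# without stalling playback.
for fps in (30, 100):
    print(f"{fps:>3} FPS -> {1000 / fps:.1f} ms per frame")
# 30 FPS -> 33.3 ms per frame (the Trainium2 figure cited above)
# 100 FPS -> 10.0 ms per frame (the Trainium3 target)
```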
In a statement, AWS Trainium Vice President Ron Diamant said the performance of Decart’s models shows the amazing possibilities that arise when specialized models are combined with custom-designed processors: “We’re excited to see how Decart is enabling entirely new video, media, and simulation experiences for customers on AWS.”
Wider Ecosystem Implications
Decart isn’t the only AI video startup benefiting from Amazon’s optimized AI accelerators. Pika AI is also said to be using AWS chips to power its most advanced Pika-2.5 model, which likewise boasts latency low enough to support real-time video generation.
The continued rise of Trainium highlights the growing opportunity for cloud infrastructure providers to erode Nvidia’s dominance of the AI market by supporting niche applications. While the vast majority of LLMs run on GPUs, which are suited for general purpose workloads, a growing number of developers now prefer more customizable AI accelerators due to the increased efficiency they provide for targeted workloads.
AWS Trainium isn’t the only option. Last month, Google Cloud debuted Ironwood, the most advanced version of its Tensor Processing Units, which are architecturally very similar to Amazon’s chips. Google’s TPUs are advantageous for running video models because of their focus on high-performance AI processing, with Ironwood reportedly delivering a four-times efficiency gain over earlier generations. Moreover, Google says Ironwood can scale up to 9,216 chips per cluster to handle the largest video datasets.
Google Cloud made a number of specific references to AI video processing when it launched Ironwood. Like Decart, it also sees an extremely bright future for AI models that can process video in real time, and it understands just as well as Amazon does that the underlying infrastructure will be critical in making it happen.