NVIDIA and Google Unleash Game-Changing AI Optimizations for Gemma

Optimizations for Gemma on AI Platforms

Gemma, Google’s new family of lightweight large language models available in 2 billion- and 7 billion-parameter versions, can be run anywhere, reducing costs and speeding up innovative work for domain-specific use cases. NVIDIA and Google have launched optimizations for Gemma across all NVIDIA AI platforms.


The companies’ teams worked closely together to accelerate Gemma’s performance using NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, on NVIDIA GPUs in the data center, in the cloud, and locally on workstations with NVIDIA RTX GPUs or PCs with GeForce RTX GPUs. Gemma is built from the same research and technology used to create the Gemini models. Thanks to these optimizations, developers can target the more than 100 million high-performance AI PCs worldwide that are equipped with NVIDIA RTX GPUs.
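
To make the local-inference path concrete, here is a minimal sketch of loading Gemma on a single NVIDIA GPU. The Hugging Face transformers library and the google/gemma-2b-it checkpoint are illustrative choices, not part of the announcement, and downloading the checkpoint requires accepting Gemma’s license on Hugging Face.

```python
# A minimal sketch of running Gemma locally on an NVIDIA RTX GPU.
# Assumes the Hugging Face `transformers` library and the published
# "google/gemma-2b-it" checkpoint; neither is prescribed by the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits consumer RTX VRAM
    device_map="cuda",
)

prompt = "Explain LLM inference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```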

On top of that, developers can run Gemma on NVIDIA GPUs in the cloud, such as Google Cloud’s A3 instances built on the H100 Tensor Core GPU and, soon, NVIDIA’s H200 Tensor Core GPUs, which offer 141 GB of HBM3e memory and 4.8 terabytes per second of memory bandwidth. To further fine-tune Gemma and deploy the optimized model in their production applications, enterprise developers can leverage NVIDIA’s extensive ecosystem of tools, including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM.
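
As an illustration of what the TensorRT-LLM path could look like, here is a sketch that assumes a recent TensorRT-LLM release shipping the high-level LLM Python API; earlier releases instead used separate checkpoint-conversion and engine-build scripts, so treat this as a sketch under that assumption rather than the announced workflow.

```python
# A minimal sketch of serving Gemma through TensorRT-LLM's high-level
# Python API. Assumes a recent tensorrt_llm release that includes the
# LLM/SamplingParams interface; the model ID and sampling values are
# illustrative, not taken from the announcement.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) an optimized TensorRT engine for the model.
llm = LLM(model="google/gemma-2b-it")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
for output in llm.generate(["What makes FP8 inference fast?"], params):
    print(output.outputs[0].text)
```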

Developers can find out more about how TensorRT-LLM is accelerating Gemma inference, along with additional resources. Multiple Gemma checkpoints, including an FP8-quantized version of the model, have been optimized with TensorRT-LLM.
