NVIDIA and Google Unleash Game-Changing AI Optimizations for Gemma
Optimizations for Gemma on AI Platforms
Gemma, Google’s new family of lightweight large language models with 2 billion and 7 billion parameters, can run anywhere, reducing costs and speeding development for domain-specific use cases. NVIDIA and Google have launched optimizations for Gemma across all NVIDIA AI platforms.
The two companies’ teams worked closely to accelerate Gemma using NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud, and locally on workstations with NVIDIA RTX GPUs or PCs with GeForce RTX GPUs. Gemma is built from the same research and technology as the Gemini models. With these optimizations, developers can target the more than 100 million high-performance AI PCs worldwide that are equipped with NVIDIA RTX GPUs.
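To make that workflow concrete, here is a minimal sketch of generating text from a Gemma checkpoint with TensorRT-LLM’s high-level Python LLM API. It is illustrative only: it assumes a recent TensorRT-LLM release that ships this API, a supported NVIDIA GPU with enough memory, and local access to the Gemma weights; the model name, prompt, and sampling settings are placeholders rather than values published by NVIDIA or Google.

# Sketch: run a Gemma checkpoint through TensorRT-LLM's high-level LLM API.
# Assumes a recent tensorrt_llm release and a supported NVIDIA GPU;
# the model name and sampling settings below are illustrative.
from tensorrt_llm import LLM, SamplingParams

# Point the LLM wrapper at a Gemma checkpoint; TensorRT-LLM compiles an
# optimized inference engine for the local GPU behind the scenes.
llm = LLM(model="google/gemma-2b-it")

# Basic sampling settings for a short completion.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

prompts = ["Summarize in one sentence why optimized inference lowers serving costs."]

# generate() runs batched inference on the built engine and returns one
# result object per prompt.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)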
NVIDIA and Google Supercharge AI Platforms for Gemma
Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud’s A3 instances, which are built on the H100 Tensor Core GPU, and soon on NVIDIA’s H200 Tensor Core GPUs, which offer 141GB of HBM3e memory and 4.8 terabytes per second of memory bandwidth. To further tune Gemma and deploy the optimized model in production applications, enterprise developers can leverage NVIDIA’s extensive ecosystem of tools, including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM.
Find out how TensorRT-LLM is speeding up inference for Gemma, along with additional details for developers. Several Gemma checkpoints, including an FP8-quantized version of the model, have all been optimized with TensorRT-LLM.
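For readers curious how an FP8 path might be requested programmatically, the sketch below extends the same high-level LLM API with a quantization config. The QuantConfig and QuantAlgo names follow recent TensorRT-LLM LLM API examples and may vary between releases, so treat the exact import path and arguments as assumptions; FP8 also requires a GPU generation that supports it, such as Hopper or Ada Lovelace.

# Sketch: ask TensorRT-LLM to quantize a Gemma checkpoint to FP8 when
# building the engine. QuantConfig/QuantAlgo follow recent LLM API
# examples and may differ across releases; FP8 needs a supporting GPU
# (e.g., Hopper or Ada Lovelace).
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# Request FP8 weights for the engine build.
quant_config = QuantConfig(quant_algo=QuantAlgo.FP8)

llm = LLM(model="google/gemma-7b-it", quant_config=quant_config)

sampling_params = SamplingParams(temperature=0.7, top_p=0.9)

for output in llm.generate(["What does FP8 quantization change at inference time?"],
                           sampling_params):
    print(output.outputs[0].text)

Since an FP8-quantized Gemma checkpoint has already been optimized with TensorRT-LLM, developers may be able to start from that artifact rather than quantizing locally.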