
PEAK:AIO Solves Long-Running AI Memory Bottleneck for LLM Inference and Model Innovation with Unified Token Memory Feature


PEAK:AIO, the data infrastructure pioneer redefining AI-first data acceleration, today unveiled the first dedicated solution to unify KVCache acceleration and GPU memory expansion for large-scale AI workloads, including inference, agentic systems, and model creation.

As AI workloads evolve beyond static prompts into dynamic context streams, model creation pipelines, and long-running agents, infrastructure must evolve, too.


“Whether you are deploying agents that think across sessions or scaling toward million-token context windows, where memory demands can exceed 500GB per model, this appliance makes it possible by treating token history as memory, not storage,” said Eyal Lemberger, Chief AI Strategist and Co-Founder of PEAK:AIO. “It is time for memory to scale like compute has.”

As transformer models grow in size and context, AI pipelines face two critical limitations: KVCache inefficiency and GPU memory saturation. Until now, vendors have retrofitted legacy storage stacks or overextended NVMe to delay the inevitable. PEAK:AIO’s new 1U Token Memory Feature changes that by building for memory, not files.

The First Token-Centric Architecture Built for Scalable AI


Powered by CXL memory and integrated with Gen5 NVMe and GPUDirect RDMA, PEAK:AIO’s feature delivers up to 150 GB/sec sustained throughput with sub-5 microsecond latency. It enables:

  • KVCache reuse across sessions, models, and nodes
  • Context-window expansion for longer LLM history
  • GPU memory offload via true CXL tiering
  • Ultra-low latency access using RDMA over NVMe-oF

This is the first feature that treats token memory as infrastructure rather than storage, allowing teams to cache token history, attention maps, and streaming data at memory-class latency.
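The KVCache-reuse idea above can be sketched in a few lines. The snippet below is purely illustrative (the class and method names are hypothetical, not PEAK:AIO's API): it shows why keeping token history addressable as memory pays off, since a new request that shares a token prefix with an earlier session can reuse the cached attention KV state for that prefix and only recompute the remaining tokens.

```python
from dataclasses import dataclass, field

@dataclass
class TokenMemory:
    """Hypothetical session-level KV cache keyed by token prefix.

    If a new request shares a token prefix with an earlier one, the
    cached KV state for that prefix is returned and only the tail
    tokens need a fresh forward pass. (Illustrative sketch only.)
    """
    store: dict = field(default_factory=dict)  # token prefix -> KV state

    def put(self, tokens, kv_state):
        self.store[tuple(tokens)] = kv_state

    def longest_prefix_hit(self, tokens):
        # Walk from the longest prefix down; return the cached KV state
        # and how many tokens it covers, or (None, 0) on a miss.
        for n in range(len(tokens), 0, -1):
            kv = self.store.get(tuple(tokens[:n]))
            if kv is not None:
                return kv, n
        return None, 0

mem = TokenMemory()
mem.put([1, 2, 3], kv_state="kv-for-123")
kv, covered = mem.longest_prefix_hit([1, 2, 3, 4, 5])
# Only the two uncovered tail tokens would need recomputation.
```

Shared across sessions, models, and nodes, this lookup is what turns repeated context into a memory hit rather than a recompute.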

Unlike passive NVMe-based storage, PEAK:AIO’s architecture aligns directly with NVIDIA’s KVCache reuse and memory reclaim models. This provides plug-in support for teams building on TensorRT-LLM or Triton, accelerating inference with minimal integration effort. By harnessing true CXL memory-class performance, it delivers what others cannot: token memory that behaves like RAM, not files.
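The tiering behavior described here, where GPU memory overflows into a larger CXL-attached tier instead of being discarded, can be sketched conceptually. The class below is an assumption-laden illustration of the general idea (LRU demotion from a small hot tier to a larger cold tier), not PEAK:AIO's implementation:

```python
from collections import OrderedDict

class TieredKVCache:
    """Illustrative two-tier cache: a small 'GPU' tier backed by a
    larger 'CXL' tier. When the hot tier fills, least-recently-used
    entries are demoted rather than dropped, so token history stays
    retrievable. (A sketch of the tiering concept only.)"""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()  # hot tier, tracked in LRU order
        self.cxl = {}             # cold tier, effectively unbounded here
        self.gpu_capacity = gpu_capacity

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_val = self.gpu.popitem(last=False)
            self.cxl[old_key] = old_val  # demote, do not discard

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.cxl:  # promote back to the hot tier on access
            self.put(key, self.cxl.pop(key))
            return self.gpu[key]
        return None

cache = TieredKVCache(gpu_capacity=2)
for k in ("a", "b", "c"):
    cache.put(k, f"kv-{k}")
# Entry "a" has been demoted to the cold tier but is still retrievable.
```

The practical difference from a passive store is the promotion path: demoted entries come back on access at memory-class latency instead of requiring a file read.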


“While others are bending file systems to act like memory, we built infrastructure that behaves like memory, because that is what modern AI needs,” continued Lemberger. “At scale, it is not about saving files; it is about keeping every token accessible in microseconds. That is a memory problem, and we solved it by embracing the latest silicon layer.”

The fully software-defined solution runs on off-the-shelf servers and is expected to enter production by Q3. PEAK:AIO invites interested teams to get in touch to discuss early access, technical consultation, or how it can support their AI infrastructure needs.

