Fix: Rolling KV cache and top-k logit trimming (fixes #675) by medmomoait · Pull Request #698 · google-deepmind/gemma

medmomoait · 2026-06-14T18:10:26Z

Fixes #675

Changes

1. ChatSampler — Rolling KV Cache (ring buffer)

Adds rolling_cache and rolling_cache_preserve_tokens options to
ChatSampler. When enabled, the oldest tokens are evicted from the KV
cache in ring-buffer fashion before each turn. This prevents
out-of-memory errors from context exhaustion in long multi-turn
conversations. A configurable prefix (e.g. the system prompt) can be
protected from eviction.
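The ring-buffer eviction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation. The class RollingKVCache and its methods are hypothetical. Each "entry" stands in for one token's key/value tensors. Eviction happens on every append rather than once per turn. A real KV cache would also need to keep positional information (e.g. rotary position embeddings) consistent after eviction, which this sketch ignores.

```python
from __future__ import annotations

from typing import Any


class RollingKVCache:
    """Hypothetical fixed-capacity cache: the first `preserve_tokens` slots are
    never evicted, and the remaining slots form a ring buffer."""

    def __init__(self, capacity: int, preserve_tokens: int = 0) -> None:
        if not 0 <= preserve_tokens < capacity:
            raise ValueError("need 0 <= preserve_tokens < capacity")
        self.capacity = capacity
        self.preserve_tokens = preserve_tokens
        self.slots: list[Any] = [None] * capacity
        self.num_written = 0  # total entries appended so far, including evicted ones

    def _slot_for(self, n: int) -> int:
        """Physical slot for the n-th appended entry (0-based)."""
        if n < self.preserve_tokens:
            return n  # protected prefix: written once, never overwritten
        ring_len = self.capacity - self.preserve_tokens
        return self.preserve_tokens + (n - self.preserve_tokens) % ring_len

    def append(self, entry: Any) -> None:
        """Store one entry, overwriting the oldest unprotected entry once full."""
        self.slots[self._slot_for(self.num_written)] = entry
        self.num_written += 1

    def ordered(self) -> list[Any]:
        """Return the live entries in chronological order."""
        if self.num_written <= self.capacity:
            return self.slots[: self.num_written]
        # The slot the next append would overwrite holds the oldest ring entry.
        start = self._slot_for(self.num_written)
        prefix = self.slots[: self.preserve_tokens]
        return prefix + self.slots[start:] + self.slots[self.preserve_tokens : start]


cache = RollingKVCache(capacity=5, preserve_tokens=2)
for tok in ["s0", "s1", "a", "b", "c", "d", "e"]:
    cache.append(tok)
print(cache.ordered())  # ['s0', 's1', 'c', 'd', 'e']
```

With capacity=5 and preserve_tokens=2, the protected prefix s0, s1 survives, while a and b are overwritten by d and e. Memory stays bounded at capacity entries no matter how long the conversation runs.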

2. SamplerLoop — Top-k logit trimming

Adds a top_k_logits option to SamplerLoop. When set, logits are
masked to the top k immediately after the forward pass, reducing the
transient VRAM footprint during sampling. The sampling distribution is
unchanged for greedy decoding, or when the sampler's own top-k is no
larger than top_k_logits.
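Here is a minimal sketch of top-k trimming in plain Python over a single row of logits. It is an illustration, not the SamplerLoop code. It gathers the k largest logits and their vocabulary indices rather than masking in place. In JAX, the analogous primitive is jax.lax.top_k. Note that a mask which leaves the full vocabulary-sized array in place would not, by itself, reduce memory. The helpers trim_top_k, top_k_probs and sample are hypothetical and not part of the gemma API. The example also checks when the distribution is preserved: a top-k sampler whose k does not exceed the trimmed width sees exactly the same candidates and probabilities.

```python
from __future__ import annotations

import heapq
import math
import random


def trim_top_k(logits: list[float], k: int) -> tuple[list[float], list[int]]:
    """Keep the k largest logits; return (values, vocab_ids) in descending order."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be in [1, len(logits)]")
    ids = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    return [logits[i] for i in ids], ids


def top_k_probs(values: list[float], ids: list[int], k_sample: int,
                temperature: float = 1.0) -> dict[int, float]:
    """Softmax over the first k_sample entries of an already-sorted top-k slice."""
    if not 1 <= k_sample <= len(values):
        # A sampler top-k wider than the trimmed slice would see a truncated
        # distribution, so refuse instead of silently changing it.
        raise ValueError("k_sample must be in [1, len(values)]")
    scaled = [v / temperature for v in values[:k_sample]]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(ids[:k_sample], exps)}


def sample(probs: dict[int, float], rng: random.Random) -> int:
    """Draw one vocabulary id from an {id: probability} mapping."""
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


logits = [0.1, 2.5, -1.0, 3.2, 0.7, 1.9]
values, ids = trim_top_k(logits, k=3)  # ids == [3, 1, 5]
full_values, full_ids = trim_top_k(logits, k=len(logits))
# A top-2 sampler gets identical probabilities from the trimmed and full rows.
assert top_k_probs(values, ids, 2) == top_k_probs(full_values, full_ids, 2)
token = sample(top_k_probs(values, ids, 2), random.Random(0))
```

The assertion holds only because the sampler's k (2) does not exceed the trimmed width (3). Plain full-vocabulary sampling over the trimmed row would drop the tail probability mass. Per decoding step, the trimmed row holds k values plus k integer ids instead of one value per vocabulary entry.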

Both options are disabled by default, which preserves the previous
behavior, so the change is backward compatible.


google-cla · 2026-06-14T18:10:35Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

fix: rolling KV cache and top-k logit trimming (fixes google-deepmind…#675)

fec4b63

medmomoait wants to merge 1 commit into google-deepmind:main from medmomoait:fix/rolling-kv-cache-vram-675
