Together AI's OSCAR slashes KV cache memory by 8x

By PulseAugur Editorial · [1 source] · 2026-05-27 05:43

Together AI has released OSCAR, an open-source 2-bit KV cache method that cuts KV cache memory by 8x. According to the source article, the earlier 2-bit methods its author tried collapsed past 32K tokens of context, whereas OSCAR holds up at 128K tokens. The result was demonstrated on the Qwen3-8B model.
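The 8x figure is consistent with going from 16 bits to 2 bits per cached value. The following is a back-of-the-envelope sketch, not OSCAR's implementation: the function name is hypothetical, the model shape (36 layers, 8 KV heads, head dimension 128) is an assumption about Qwen3-8B that is not stated in the source, and the 16-bit baseline is likewise assumed. It also ignores any per-group scale or zero-point metadata a real 2-bit scheme would store, so the practical saving would be somewhat below 8x.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bits_per_value):
    """Estimate KV cache size in bytes for one sequence (hypothetical helper)."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    total_values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return total_values * bits_per_value // 8


# Assumed model shape (not given in the source article).
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128
SEQ_LEN = 128 * 1024  # 128K tokens

fp16_bytes = kv_cache_bytes(SEQ_LEN, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=16)
two_bit_bytes = kv_cache_bytes(SEQ_LEN, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=2)

GIB = 2 ** 30
print(f"16-bit KV cache: {fp16_bytes / GIB:.2f} GiB")
print(f" 2-bit KV cache: {two_bit_bytes / GIB:.2f} GiB")
print(f"reduction: {fp16_bytes / two_bit_bytes:.0f}x")
```

Under these assumptions, a single 128K-token sequence needs about 18 GiB of KV cache at 16 bits and about 2.25 GiB at 2 bits.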

IMPACT Reduces the KV cache memory needed for large language model inference, potentially enabling longer context windows and more efficient deployment.

RANK_REASON The cluster describes a new open-source technical method for improving AI model efficiency, which falls under research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Towards AI TIER_1 English(EN) · Chew Loong Nian - AI ENGINEER · 2026-05-27 05:43

Together AI's OSCAR Killed KV Cache Memory 8x — The First 2-Bit That Doesn't Collapse at 128K

Every 2-bit KV cache method I tried in 2025 collapsed past 32K context. Together AI’s OSCAR, open-sourced on May 25, 2026, kept Qwen3-8B…
