Together AI releases GLM 5.1 with kernel optimizations

By PulseAugur Editorial · [2 sources] · 2026-06-15 23:59

Together AI has released GLM 5.1, an open-source model, for inference on its platform. According to Together, optimizing GLM 5.1 came down to three changes: rewriting the indexer top-k kernel, fusing the indexer kernel to reduce memory and launch overhead, and eliminating CPU overhead that was gating prefill throughput. The company said the bigger win came from the indexer work.
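The coverage does not describe how GLM 5.1's indexer works internally. As a rough illustration only, assume the indexer scores each cached token against a query and keeps the k highest-scoring tokens. Under that assumption, the sketch below contrasts an unfused two-step version (store every score, then select) with a fused single pass that only ever holds a size-k heap. The names `dot`, `unfused_topk`, and `fused_topk` are hypothetical, and this CPU-side Python stands in for what would be GPU kernels. It shows why fusing scoring and selection avoids a full-length intermediate buffer and a second pass, not how Together AI implemented its kernels.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy indexer score (an assumption): dot product of a query and one cached key."""
    return sum(x * y for x, y in zip(a, b))


def unfused_topk(query: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    # Step 1 (stands in for a separate kernel): score every key and
    # materialize the full score array.
    scores = [dot(query, key) for key in keys]
    # Step 2 (stands in for a second kernel): read the array back and pick the
    # k best. nlargest acts like a stable sort, so ties go to the lower index.
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def fused_topk(query: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    if k <= 0:
        return []
    # One pass: score each key and immediately fold it into a size-k min-heap,
    # so the full-length score array is never stored.
    # Entries are (score, -index): among equal scores the heap root is the
    # latest index, which is evicted first, so earlier indices win ties.
    heap: List[Tuple[float, int]] = []
    for idx, key in enumerate(keys):
        s = dot(query, key)
        if len(heap) < k:
            heapq.heappush(heap, (s, -idx))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, -idx))
    # Report indices by descending score, then ascending index.
    ordered = sorted(heap, key=lambda t: (-t[0], -t[1]))
    return [-neg_idx for _, neg_idx in ordered]


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[0.1, 5.0], [0.9, 0.0], [0.3, 1.0], [0.7, 2.0]]
    print(unfused_topk(query, keys, 2))  # [1, 3]
    print(fused_topk(query, keys, 2))    # [1, 3]
```

Both functions return the same indices. The fused version trades the full score buffer for O(k) working state, which is the same memory argument that motivates fusing GPU kernels.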

IMPACT Together AI's release of GLM 5.1 offers an open-source option for inference, potentially lowering costs and increasing accessibility for developers.



AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-06-15 23:59

Try GLM 5.1 today: https://t.co/jsWoVlNEQc

X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-06-15 23:59

Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput
The bigger win was in the indexer. Once we fixed that, the …
