Gemma 4-E2B runs in-browser at 255 tok/s with WebGPU kernels

By PulseAugur Editorial · [1 source] · 2026-06-17 17:06

A demo and a set of WebGPU kernels for Gemma 4-E2B have been released on Hugging Face, running the model in-browser at approximately 255 tokens per second. According to the source post, the kernels were written by Fable 5, reportedly before its shutdown. The model itself is also linked from the release.

IMPACT Enables in-browser execution of Gemma 4-E2B at roughly 255 tok/s, potentially improving accessibility for local LLM users.

RANK_REASON Release of optimized kernels and a demo for an existing model, not a new model release.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/xenovatech · 2026-06-17 17:06

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

https://www.reddit.com/r/LocalLLaMA/comments/1u8g3d0/gemma_4_e2b_running_inbrowser_at_255_toks_using/
