PulseAugur

230M LFM2.5 model runs in-browser at 1,400 tokens/sec

A 230-million-parameter model, LFM2.5, now runs in a web browser at 1,400 tokens per second. The speed comes from custom WebGPU kernels, developed by individuals previously associated with Fable 5 and Opus 4.8. The model is available on Hugging Face, along with a demo Space where users can try it directly in the browser.
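For a sense of scale, the sketch below converts the headline 1,400 tokens per second into per-token latency and into the time needed to stream a reply. It assumes a steady decoding rate, which real runs will not hold exactly because prefill, warm-up and hardware all vary. The function names and the 500-token reply length are illustrative assumptions, not taken from the post.

```typescript
// Convert a reported decoding throughput (tokens/second) into per-token
// latency in milliseconds. Assumes a constant rate, which is a simplification.
function msPerToken(tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

// Seconds needed to emit `tokenCount` tokens at a constant throughput.
function secondsForTokens(tokenCount: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokenCount / tokensPerSecond;
}

const reportedRate = 1400; // tok/s, the figure from the post title
const exampleReplyLength = 500; // hypothetical reply length in tokens

console.log(msPerToken(reportedRate).toFixed(3)); // "0.714"
console.log(secondsForTokens(exampleReplyLength, reportedRate).toFixed(3)); // "0.357"
```

At that rate each token takes well under a millisecond, so a 500-token reply would stream in roughly a third of a second under the constant-rate assumption.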

IMPACT Enables efficient, in-browser execution of smaller language models, potentially improving accessibility and reducing reliance on server-side processing.

RANK_REASON The cluster describes a specific model running on a specific platform with custom kernels, which is a technical implementation detail rather than a new model release or significant industry event.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/xenovatech

    LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

    https://www.reddit.com/r/LocalLLaMA/comments/1ufii9b/lfm25_230m_running_inbrowser_at_1400_toks_using/