A 230-million-parameter model, LFM2.5, can now run in a web browser at 1,400 tokens per second. This speed comes from custom WebGPU kernels written by people previously associated with Fable 5 and Opus 4.8. The model is available on Hugging Face, along with a demo Space where users can try it directly in the browser.
AI IMPACT Enables efficient in-browser execution of smaller language models, which could improve accessibility and reduce reliance on server-side processing.
RANK_REASON The cluster describes a specific model running on a specific platform with custom kernels, which is a technical implementation detail rather than a new model release or significant industry event.
AI-generated summary · Google Gemini · from 1 source.