Brief · PulseAugur

TOOL · Mastodon — fosstodon.org English(EN) · 4d

New week, more slides: Run LLMs Locally Now including wllama to run GGUF models inside your browser! wllama uses llama.cpp, WebAssembly and WebGPU, bringing a c

A new tool called wllama lets users run large language models in GGUF format directly in the web browser. Built on llama.cpp, WebAssembly, and WebGPU, wllama works around typical browser limitations such as the 4GB memory constraint and runs faster than existing JavaScript-based solutions. The project also supports translation using Tencent's HY-MT model.

IMPACT Makes LLMs more widely accessible by letting them run directly in the web browser, without the memory limits that usually apply there.
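The brief does not say how wllama gets around the 4GB limit. As background only, the sketch below shows where such a figure would come from, assuming it refers to the linear memory of 32-bit WebAssembly (wasm32), which is addressed in 64 KiB pages with 32-bit pointers. The helper `fits_in_wasm32` and the example model sizes are hypothetical and are not part of wllama.

```python
# Sketch: where a "4GB" ceiling comes from under 32-bit WebAssembly (wasm32).
# Assumption: the constraint in the brief is the wasm32 linear-memory limit.

WASM_PAGE_BYTES = 64 * 1024   # WebAssembly page size is 64 KiB
WASM32_MAX_PAGES = 65536      # 2**16 pages reachable with 32-bit addresses


def wasm32_max_memory_bytes() -> int:
    """Largest linear memory a wasm32 module can address, in bytes."""
    return WASM_PAGE_BYTES * WASM32_MAX_PAGES


def fits_in_wasm32(model_bytes: int, overhead_bytes: int = 0) -> bool:
    """Hypothetical helper: would a model plus runtime overhead fit?"""
    return model_bytes + overhead_bytes <= wasm32_max_memory_bytes()


if __name__ == "__main__":
    gib = 1024 ** 3
    print(wasm32_max_memory_bytes())   # 4294967296, i.e. 4 GiB
    for size_gib in (1, 3, 5):         # illustrative model sizes only
        print(size_gib, "GiB fits:", fits_in_wasm32(size_gib * gib))
```

Under this assumption, any model whose weights and working memory exceed 4 GiB cannot live in a single wasm32 address space, which is why a tool has to work around the limit rather than simply allocate more.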

Tencent
llama.cpp
GGUF
Transformers.js
WebGPU
WebAssembly
HY-MT
wllama