PulseAugur

Krasis LLM runtime rewritten in Rust, boosts speed

The Krasis LLM runtime has reached version 1.0 with a complete rewrite in Rust. The rewrite removes Python from the critical execution path, which speeds up both prefill and decode. Krasis now also supports Ampere (RTX 3000 series) GPUs and needs system RAM equal to only 1x the quantized model size plus overhead.
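The "1x the quantized model size plus overhead" figure can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the function name `estimated_ram_gib`, the flat 4 bits per weight, and the 4 GiB overhead are assumptions, not values published by Krasis, and real Q4 formats typically use somewhat more than 4 bits per weight.

```python
def estimated_ram_gib(params_billions: float, bits_per_weight: float, overhead_gib: float) -> float:
    """Rough system-RAM estimate: one copy of the quantized weights plus a fixed overhead."""
    # Size of the quantized weights in bytes.
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    # Convert to GiB and add the (assumed) runtime overhead.
    return weight_bytes / 2**30 + overhead_gib


# Hypothetical inputs: 35B parameters, 4 bits per weight, 4 GiB overhead (a guess).
estimate = estimated_ram_gib(35, 4.0, 4.0)
print(f"Estimated system RAM: {estimate:.1f} GiB")
```

Under these assumptions the estimate comes to roughly 20 GiB, which would fit in the 32GB of system RAM mentioned in the linked post.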

IMPACT Improved efficiency for running large LLMs locally, potentially lowering hardware barriers for advanced model usage.

RANK_REASON Software update for an LLM runtime, not a new model release or core research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/mrstoatey

    Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)

    Context: Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and treats prefill and decode as separate architectures, each optimised for its own use case. Late…