PulseAugur

Llama2 inference engine runs in under 1500 bytes of x86 assembly

A developer has created sectorllm, a Llama 2 inference engine that fits entirely within 1369 bytes of x86 assembly code. The engine boots directly from a disk's boot sector, loads a quantized model, and generates text before any operating system initializes. It currently supports the stories260K model, which was trained on children's stories. The code is optimized for minimal size: performance and precision are secondary to code golfing.
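
The summary mentions a quantized model without saying which scheme sectorllm uses. As a rough illustration only, the Python sketch below assumes symmetric int8 quantization with one scale per weight row, then runs a matrix-vector product over the quantized codes, the kind of operation that dominates transformer inference. The names `quantize` and `quantized_matvec` are hypothetical and are not taken from the project.

```python
def quantize(values):
    """Symmetric int8 quantization (assumed scheme): returns (codes, scale)."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak else 1.0
    # Clamp to the symmetric int8 range so every code fits in one byte.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def quantized_matvec(rows, x):
    """Multiply quantized weight rows by a float vector x."""
    out = []
    for codes, scale in rows:
        # Accumulate over the integer codes, then rescale once per row.
        acc = sum(c * xi for c, xi in zip(codes, x))
        out.append(acc * scale)
    return out


# Toy weights, not taken from any real model.
weights = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]
rows = [quantize(w) for w in weights]
x = [1.0, 2.0, 4.0]

approx = quantized_matvec(rows, x)
exact = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

Comparing `approx` with `exact` shows the small rounding error that quantization introduces, which matches the summary's remark that precision is secondary to size.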

IMPACT Demonstrates that a working inference engine for a small quantized model fits in under 1500 bytes of code, potentially inspiring new approaches for edge AI.

RANK_REASON This is a novel implementation of an existing model architecture in a highly constrained environment, akin to an academic research project.

Read on Mastodon — mastodon.social →

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] ·

    sectorllm: llama2 inference in < 1500 bytes of x86 assembly https://lobste.rs/s/5ond6x #ai #assembly https://github.com/rdmsr/sectorllm

  2. Lobsters — AI tag TIER_1 English(EN) · github.com by rdmsr ·

    sectorllm: llama2 inference in < 1500 bytes of x86 assembly

    Comments: https://lobste.rs/s/5ond6x/sectorllm_llama2_inference_1500_bytes

  3. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    sectorllm: llama2 inference in < 1500 bytes of x86 assembly https://github.com/rdmsr/sectorllm #Assembly #AI #Programming