Apple's SeedLM compresses LLM weights using pseudo-random generators

Researchers at Apple have developed SeedLM, a novel post-training compression technique for large language models that encodes model weights in the seeds of pseudo-random generators. The method cuts the runtime cost of LLM inference by regenerating weight matrices on the fly, trading extra compute for fewer memory accesses and thereby speeding up memory-bound workloads. SeedLM requires no calibration data, generalizes well across diverse tasks, and maintains accuracy comparable to FP16 baselines even at significant compression levels.

Summary written by gemini-2.5-flash-lite from 1 source.
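
The mechanics are easier to see in code. The sketch below is a loose illustration of the seed-based idea only, not the paper's actual method: names like candidate_basis, encode_block, and decode_block are hypothetical, numpy's default PRNG stands in for the LFSR generators SeedLM reportedly uses, and the paper's coefficient quantization and block-size choices are omitted.

```python
import numpy as np

def candidate_basis(seed: int, block: int, rank: int) -> np.ndarray:
    # Deterministically expand a seed into a pseudo-random basis matrix.
    # (Stand-in for SeedLM's LFSR-generated matrices.)
    rng = np.random.default_rng(seed)
    return rng.standard_normal((block, rank)).astype(np.float32)

def encode_block(w: np.ndarray, rank: int = 4, n_seeds: int = 256):
    # Search candidate seeds for the basis whose least-squares fit best
    # reconstructs the weight block; keep only (seed, coefficients).
    best = None
    for seed in range(n_seeds):
        U = candidate_basis(seed, w.shape[0], rank)
        coef, *_ = np.linalg.lstsq(U, w, rcond=None)
        err = np.linalg.norm(U @ coef - w)
        if best is None or err < best[0]:
            best = (err, seed, coef)
    _, seed, coef = best
    return seed, coef  # a seed plus a few coefficients per block

def decode_block(seed: int, coef: np.ndarray, block: int) -> np.ndarray:
    # At inference time, regenerate the basis from the seed and
    # reconstruct the block: compute is spent instead of memory traffic.
    U = candidate_basis(seed, block, coef.shape[0])
    return U @ coef

# Compress and reconstruct one 64-element block of weights.
w = np.random.default_rng(0).standard_normal(64).astype(np.float32)
seed, coef = encode_block(w)
w_hat = decode_block(seed, coef, w.size)
print("relative error:", np.linalg.norm(w_hat - w) / np.linalg.norm(w))
```

The trade the summary describes falls out of decode_block: only the seed and a few coefficients are ever read from memory, and the basis is recomputed on demand, which is cheap on hardware where inference is memory-bound rather than compute-bound.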

IMPACT This compression technique could significantly reduce the deployment costs and increase the inference speed of large language models.

RANK_REASON This is a research paper detailing a novel method for compressing large language models.

Read on HN — machine learning stories →

COVERAGE [1]

  1. HN — machine learning stories TIER_1

    SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators