Brief · PulseAugur

TOOL · HN machine learning stories · English (EN) · 14mo

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Researchers have developed SeedLM, a post-training compression technique for large language models that encodes model weights as seeds of pseudo-random generators. For each block of weights, SeedLM stores a seed for a linear feedback shift register (LFSR) together with a few compressed coefficients. At inference time, the LFSR regenerates a pseudo-random matrix from the seed, and a linear combination of that matrix with the coefficients reconstructs the block. Because weights are rebuilt on the fly instead of being read in full from memory, SeedLM trades extra compute for fewer memory accesses, which lowers runtime cost and speeds up memory-bound inference. The method requires no calibration data, generalizes well across diverse tasks, and keeps accuracy comparable to FP16 baselines even at high compression levels.
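The seed-plus-coefficients idea can be illustrated with a toy sketch. It assumes a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11), blocks of 8 weights, 2 basis vectors per block, and a brute-force search over seeds 1 to 1023; all of these, and every function name below, are hypothetical choices for illustration, not SeedLM's actual parameters or code. A real scheme would also quantize the coefficients, which this sketch keeps as floats.

```python
from typing import Iterator, List, Optional, Tuple


def lfsr16(seed: int) -> Iterator[int]:
    """Hypothetical 16-bit Fibonacci LFSR (taps 16,14,13,11); yields successive states."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state


def basis(seed: int, block: int, k: int) -> List[List[float]]:
    """Regenerate a block x k pseudo-random matrix with values in (-1, 1) from a seed."""
    gen = lfsr16(seed)
    vals = [(next(gen) - 32768) / 32768 for _ in range(block * k)]
    return [vals[i * k:(i + 1) * k] for i in range(block)]


def fit_block(w: List[float], seeds: range) -> Optional[Tuple[int, float, float, float]]:
    """Search seeds; for each, least-squares fit 2 coefficients. Return (seed, t0, t1, sq_error)."""
    best = None
    for s in seeds:
        U = basis(s, len(w), 2)
        # Normal equations (U^T U) t = U^T w for a 2-column U, solved by Cramer's rule.
        a = sum(u[0] * u[0] for u in U)
        b = sum(u[0] * u[1] for u in U)
        d = sum(u[1] * u[1] for u in U)
        p = sum(u[0] * x for u, x in zip(U, w))
        q = sum(u[1] * x for u, x in zip(U, w))
        det = a * d - b * b
        if abs(det) < 1e-12:
            continue  # columns (nearly) collinear; skip this seed
        t0 = (d * p - b * q) / det
        t1 = (a * q - b * p) / det
        err = sum((x - (u[0] * t0 + u[1] * t1)) ** 2 for u, x in zip(U, w))
        if best is None or err < best[3]:
            best = (s, t0, t1, err)
    return best


def reconstruct(seed: int, t0: float, t1: float, block: int) -> List[float]:
    """Decode a block at inference time: regenerate the matrix and combine with coefficients."""
    return [u[0] * t0 + u[1] * t1 for u in basis(seed, block, 2)]


def bits_per_weight(block: int, seed_bits: int, k: int, coeff_bits: int) -> float:
    """Storage cost per weight if each block keeps one seed plus k quantized coefficients."""
    return (seed_bits + k * coeff_bits) / block


if __name__ == "__main__":
    w = [0.12, -0.40, 0.33, 0.05, -0.21, 0.18, 0.07, -0.09]
    seed, t0, t1, err = fit_block(w, range(1, 1024))
    print("seed:", seed, "coeffs:", (t0, t1), "sq error:", err)
    print("approx:", reconstruct(seed, t0, t1, len(w)))
```

Only the seed and coefficients are stored; the matrix is recomputed from the seed each time a block is needed, which is the compute-for-memory trade described above. With the hypothetical sizes here (16-bit seed, two 4-bit coefficients, 8 weights), `bits_per_weight(8, 16, 2, 4)` works out to 3.0 bits per weight versus 16 for FP16.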

IMPACT This compression technique could significantly reduce the deployment costs and increase the inference speed of large language models.

Meta
LLMs
Llama 2
FP16
SeedLM
Llama 3 70B
IEEE Visualization