TOOL · arXiv cs.AI English(EN) · 6h

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

Researchers have developed dMX, a differentiable framework for assigning bit-widths to low-precision floating-point formats in large language models. Instead of quantizing every layer to one uniform precision, dMX makes the bit-width a learnable, per-layer quantity. Experiments on models such as Llama and Qwen3 indicate that dMX achieves better trade-offs between model quality and deployment efficiency than existing heuristics.
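To make the idea of a learnable, per-layer bit-width concrete, here is a minimal sketch of one common relaxation: each layer holds logits over a few candidate bit-widths, a softmax turns them into weights, and the expected bit-width becomes a differentiable cost. This is an illustrative assumption, not the dMX algorithm itself; the candidate list, function names, and layer names are all hypothetical.

```python
import math

# Hypothetical candidate bit-widths (illustrative only, not taken from the paper).
CANDIDATE_BITS = [4, 6, 8]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_bits(logits):
    """Differentiable proxy for a layer's cost: softmax-weighted mean bit-width."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, CANDIDATE_BITS))

def expected_bits_grad(logits):
    """Analytic gradient of expected_bits: dE/dz_i = p_i * (b_i - E)."""
    probs = softmax(logits)
    mean = sum(p * b for p, b in zip(probs, CANDIDATE_BITS))
    return [p * (b - mean) for p, b in zip(probs, CANDIDATE_BITS)]

def assign_bits(layer_logits):
    """Discretize after training: pick the highest-logit candidate per layer."""
    return {
        name: CANDIDATE_BITS[max(range(len(z)), key=z.__getitem__)]
        for name, z in layer_logits.items()
    }

# One gradient step on the bit cost alone nudges a layer toward fewer bits.
logits = [0.0, 0.0, 0.0]
step = [z - 0.1 * g for z, g in zip(logits, expected_bits_grad(logits))]
print(expected_bits(logits), expected_bits(step))

# Hypothetical layer names; each layer ends up with its own bit-width.
print(assign_bits({"attn.q_proj": [0.1, 2.0, -1.0], "mlp.up_proj": [1.5, 0.2, 0.0]}))
```

In a real training loop the bit cost would be combined with a task loss, so that layers sensitive to quantization keep more bits while the rest are pushed lower; the sketch shows only the cost term.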

IMPACT Enables more efficient deployment of large language models by tuning numeric precision layer by layer rather than uniformly.

Qwen3
Llama
SmolLM2
Open Compute Project
dMX