PulseAugur

llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings

The llama.cpp and ik_llama.cpp projects have both added support for FP4 (4-bit floating-point) inference. llama.cpp now includes NVFP4, Nvidia's FP4 format, while ik_llama.cpp supports MXFP4, the 4-bit format defined in the Open Compute Project (OCP) Microscaling (MX) specification. Because each weight is stored in roughly 4 bits rather than the 16 bits of FP16, these formats are expected to substantially reduce VRAM requirements, enabling larger models to run on consumer hardware once model support catches up.
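As a rough illustration of the VRAM claim, the sketch below estimates weight memory for a hypothetical 70-billion-parameter model. It assumes the block layouts from the published format descriptions (MXFP4: 32 elements sharing one 8-bit scale; NVFP4: 16 elements sharing one 8-bit scale, with the additional per-tensor scale ignored as negligible). The function names and the model size are illustrative only, and the figures cover weights alone, not the KV cache, activations, or any tensors an engine keeps at higher precision.

```python
# Back-of-the-envelope weight-memory estimate for block-scaled FP4 formats.
# Assumption: block sizes and scale widths follow the published format
# descriptions, not any particular llama.cpp or ik_llama.cpp file layout.

def bits_per_weight(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Element bits plus the shared scale amortized over one block."""
    return element_bits + scale_bits / block_size


def weight_gib(n_params: float, bpw: float) -> float:
    """Convert a parameter count and bits-per-weight into GiB."""
    return n_params * bpw / 8 / 2**30


FORMATS = {
    "FP16": bits_per_weight(16, 0, 1),    # no block scale
    "MXFP4": bits_per_weight(4, 8, 32),   # 4.25 bits per weight
    "NVFP4": bits_per_weight(4, 8, 16),   # 4.5 bits per weight
}


def estimate(n_params: float) -> dict:
    """Weight memory in GiB for each format at the given parameter count."""
    return {name: weight_gib(n_params, bpw) for name, bpw in FORMATS.items()}


if __name__ == "__main__":
    # Hypothetical 70B-parameter model, chosen only for illustration.
    for name, gib in estimate(70e9).items():
        print(f"{name}: {gib:.1f} GiB")
```

Under these assumptions both FP4 formats need a little over a quarter of the FP16 weight memory; the smaller NVFP4 block costs slightly more space per weight in exchange for finer-grained scaling.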

Impact: Enables running larger language models on consumer hardware by significantly reducing VRAM requirements.

Ranking rationale: Integration of new quantization formats (FP4) into popular open-source inference engines.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/Usual-Carrot6352

    FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally

    https://www.reddit.com/r/LocalLLaMA/comments/1svfjyv/fp4_inference_in_llamacpp_nvfp4_and_ik_llamacpp/