The llama.cpp and ik_llama.cpp projects have both added support for FP4 (4-bit floating-point) inference. llama.cpp now includes NVFP4, Nvidia's FP4 format, while ik_llama.cpp supports MXFP4, which follows the MX consortium standard. Both formats are expected to substantially reduce VRAM requirements, allowing larger models to run on consumer hardware once model support catches up.
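To put a rough number on the VRAM claim, the sketch below estimates weight-only memory from average bits per weight. It is illustrative only and is not taken from either project. It assumes the publicly described block layouts (MXFP4: 32 four-bit elements per 8-bit shared scale; NVFP4: 16 four-bit elements per 8-bit shared scale), ignores NVFP4's per-tensor scale, and uses a hypothetical 27B parameter count. The helper names `bits_per_weight` and `weight_gib` are invented for this example.

```python
# Illustrative sketch only: not code from llama.cpp or ik_llama.cpp.
# Assumed block layouts (from the public format descriptions):
#   MXFP4: 32 four-bit elements share one 8-bit scale
#   NVFP4: 16 four-bit elements share one 8-bit scale
# NVFP4's additional per-tensor scale is ignored as negligible.

def bits_per_weight(element_bits: int, block_size: int, scale_bits: int) -> float:
    """Average storage per weight, including the amortized block scale."""
    return element_bits + scale_bits / block_size


def weight_gib(n_params: float, bpw: float) -> float:
    """Weight-only footprint in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 2**30


FORMATS = {
    "FP16": 16.0,
    "MXFP4": bits_per_weight(4, 32, 8),
    "NVFP4": bits_per_weight(4, 16, 8),
}

if __name__ == "__main__":
    n_params = 27e9  # hypothetical 27B-parameter model
    for name, bpw in FORMATS.items():
        print(f"{name}: {bpw:.2f} bits/weight, {weight_gib(n_params, bpw):.1f} GiB")
```

Under these assumptions both FP4 formats come in at well under a third of the FP16 weight footprint. Real model files will differ somewhat, since some tensors are often kept at higher precision and the KV cache and activations need memory of their own.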
Impact: Lets larger language models run on consumer hardware by reducing VRAM requirements.
Ranking rationale: Integration of new FP4 quantization formats into popular open-source inference engines.
- Abiray-Qwen3.6-27B-NVFP4-GGUF
- AVX2
- CUDA
- GGML_TYPE_MXFP4
- GGML_TYPE_NVFP4
- Hugging Face
- ik_llama.cpp
- llama.cpp
- MX consortium
- MXFP4
- NEON
- NVFP4
- Nvidia
- Qwen3-1.7B-NVFP4A16
- Zen4
AI-generated summary · Google Gemini · from 1 source.