PulseAugur

NVFP4

PulseAugur coverage of NVFP4 — every cluster mentioning NVFP4 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 10 (90 days: 10)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 8 (90 days: 8)
Tier distribution · 90 days
Sentiment · 30 days: 4 days with sentiment data

Recent · page 1 of 1 · 10 items
  1. TOOL · CL_41845 ·

    Mix-Quant framework speeds up LLM agents with phase-aware quantization

    Researchers have introduced Mix-Quant, a novel quantization framework designed to accelerate the inference process for Large Language Model (LLM) agents. This method strategically applies quantization to the prefilling …

  2. TOOL · CL_38810 ·

    LongLive-2.0 infrastructure accelerates long video generation training and inference

    Researchers have developed LongLive-2.0, a parallel infrastructure designed to optimize the training and inference of long video generation models. This system utilizes NVFP4 precision and sequence-parallel autoregressi…

  3. RESEARCH · CL_36696 ·

    NVIDIA NVFP4 slashes AI training costs by 75% with 4-bit pretraining

    NVIDIA has introduced a new 4-bit pretraining method called NVFP4, designed to significantly reduce the costs and energy consumption associated with training large AI models. This technique, validated on a 12 billion pa…

  4. RESEARCH · CL_36662 ·

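    NVIDIA introduces NVFP4, a 4-bit pretraining method for LLMs

    NVIDIA has developed NVFP4, a new 4-bit pretraining method that aims to overcome the reduced dynamic range and increased quantization error of narrow floating-point formats. The method was validated by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented training run in 4-bit precision to date. On the MMLU-Pro benchmark, the resulting model performs nearly on par with an FP8 baseline, demonstrating the feasibility of NVFP4 for large-scale model training.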

  5. RESEARCH · CL_36932 ·

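    New ScaleSearch method improves generative model efficiency through optimized quantization

    Researchers have developed ScaleSearch, a new method that improves the efficiency of generative models through quantization. The technique optimizes the choice of scale factors in block floating-point (BFP) formats, reducing quantization error by up to 27%. The proposed ScaleSearchAttention algorithm, integrated with BFP, shows near-zero performance loss in causal language modeling and significant accuracy improvements on models such as Qwen3-8B and Llama 3.1 70B.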

  6. TOOL · CL_29454 ·

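    SOAR framework improves LLM accuracy with novel NVFP4 quantization

    Researchers have introduced SOAR, a new post-training quantization framework that aims to improve the accuracy of NVFP4 quantization on large language models. SOAR uses Closed-form Joint Scale Optimization (CJSO) to jointly optimize global and block-level scales by minimizing reconstruction error. It also uses Decoupled Scale Search (DSS) to separate the quantization and dequantization scales, improving precision. Experiments show that SOAR achieves better accuracy than existing NVFP4 methods without increasing memory footprint or requiring new hardware.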

  7. TOOL · CL_25619 ·

    New number formats boost AI direction preservation

    Researchers have developed a new geometric framework to analyze how well low-precision number formats in machine learning preserve vector direction. The study analytically quantifies the suboptimality of standard format…

  8. TOOL · CL_22142 ·

    New 4/6 quantization method boosts LLM accuracy with adaptive scaling

    Researchers have developed a new quantization method called Four Over Six (4/6) to improve the accuracy of low-precision numerical formats like NVFP4 for large language models. This technique adaptively scales blocks to…

  9. RESEARCH · CL_03567 ·

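    Qwen3.6-35B model quantization shows FP8 quality is worse than INT8, and NVFP4 is a lie

    A user in Reddit's LocalLLaMA community shared findings on the Qwen3.6-35B model, focusing on the Kullback-Leibler divergence (KLD) metric across quantization formats including INT8, FP8, and NVFP4. The analysis, run with a modified vLLM framework, suggests that FP8 and NVFP4, while potentially faster, may be lower in quality than INT8. The user stressed that the choice of quantization format should match the specific use case, balancing accuracy, speed, and GPU compatibility.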

  10. RESEARCH · CL_03577 ·

    llama.cpp and ik_llama.cpp add FP4 inference support for VRAM savings

    The llama.cpp and ik_llama.cpp projects have both integrated support for FP4 (4-bit floating-point) inference, a significant advancement for model quantization. llama.cpp now includes NVFP4, an Nvidia-specific format, w…
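
Several of the items above (ScaleSearch, SOAR, Four Over Six) concern how scale factors are chosen for block-scaled 4-bit formats. As a rough illustration only, the sketch below rounds values block by block to the 4-bit E2M1 grid and compares a plain absmax scale with a brute-force search over nearby scales. The block size of 16, the candidate range, and all function names are assumptions made for this example. The scale is kept in full precision rather than stored in FP8, and this is not the algorithm from any of the listed papers.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block, scale):
    """Round block/scale to the nearest E2M1 value, then multiply the scale back."""
    if scale == 0.0:
        return np.zeros_like(block)
    scaled = block / scale
    # Nearest grid magnitude per element; magnitudes above 6 saturate to 6.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx] * scale

def absmax_scale(block):
    """Baseline: map the largest magnitude in the block onto 6.0."""
    return float(np.abs(block).max()) / 6.0

def searched_scale(block, candidates=np.linspace(0.7, 1.3, 25)):
    """Hypothetical search: try multiples of the absmax scale, keep the lowest-MSE one."""
    base = absmax_scale(block)
    best_scale = base
    best_err = np.mean((block - quantize_block(block, base)) ** 2)
    for c in candidates:
        s = base * c
        err = np.mean((block - quantize_block(block, s)) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale

def quantize_tensor(x, block_size=16, search=False):
    """Quantize a 1-D array in independent blocks of block_size elements."""
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size].astype(np.float64)
        scale = searched_scale(block) if search else absmax_scale(block)
        out[start:start + block_size] = quantize_block(block, scale)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
mse_absmax = np.mean((x - quantize_tensor(x)) ** 2)
mse_search = np.mean((x - quantize_tensor(x, search=True)) ** 2)
print("absmax MSE:", mse_absmax, "searched MSE:", mse_search)
```

Because the search starts from the absmax scale and only accepts a candidate that lowers the block error, the searched result is never worse than the baseline on this per-block MSE measure. Whether that translates into better model accuracy is a separate question that the sketch does not address.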
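
The LocalLLaMA comparison above relies on Kullback-Leibler divergence (KLD) between a reference model and its quantized versions. A minimal sketch of that metric, assuming per-token logits from both models are already available as arrays, could look like the following. The function names and the synthetic noise standing in for quantization error are hypothetical, and this is not the poster's modified vLLM code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kld(ref_logits, quant_logits):
    """Mean KL(P_ref || P_quant) over token positions, in nats."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    kld_per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kld_per_token.mean())

# Synthetic stand-in: 8 token positions, vocabulary of 100 entries.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 100))
noisy_small = ref + 0.01 * rng.standard_normal(ref.shape)  # mild perturbation
noisy_large = ref + 0.5 * rng.standard_normal(ref.shape)   # strong perturbation

kld_same = mean_kld(ref, ref)
kld_small = mean_kld(ref, noisy_small)
kld_large = mean_kld(ref, noisy_large)
print(kld_same, kld_small, kld_large)
```

A lower mean KLD means the quantized model's next-token distribution stays closer to the reference, which is why the metric is used to rank formats such as INT8, FP8, and NVFP4 against an unquantized baseline.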