Mamba-Transformer
PulseAugur coverage of Mamba-Transformer — every cluster mentioning Mamba-Transformer across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 1 day
-
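NVIDIA introduces NVFP4 for 4-bit LLM pretraining
NVIDIA has developed a method for pretraining LLMs in NVFP4, a 4-bit floating-point format. The method is designed to overcome the reduced dynamic range and increased quantization error that come with narrow floating-point formats. NVIDIA validated it by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented 4-bit training run to date. On the MMLU-Pro benchmark, the resulting model scores nearly the same as an FP8 baseline, which shows that NVFP4 is viable for large-scale model training.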
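The summary does not explain how a 4-bit format copes with limited dynamic range. The following is a minimal fake-quantization sketch that illustrates the idea. It assumes the publicly described NVFP4 layout: FP4 (E2M1) elements, with one shared scale per block of 16 values. It also makes several simplifications. The block scale stays in full precision instead of being stored as FP8 (E4M3). The second-level per-tensor scale is omitted. Ties round toward the smaller magnitude. All function names are hypothetical and do not come from NVIDIA's code. The demo compares one scale for the whole tensor against one scale per 16-value block on data with a single large outlier.

```python
import math
import random

# Non-negative magnitudes representable by an FP4 E2M1 element.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # assumption: one shared scale per 16 elements, as in NVFP4


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(abs(x), E2M1_MAX)  # saturate instead of overflowing
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # first match wins ties
    return math.copysign(nearest, x)


def fake_quantize(values, block_size):
    """Quantize then dequantize, using one scale per block of `block_size` values."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value.
        # Simplification: the scale stays in full precision (NVFP4 stores it in FP8 E4M3).
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    # Mostly small values plus one large outlier, which stresses a narrow format's range.
    x = [random.gauss(0.0, 0.01) for _ in range(256)]
    x[3] = 50.0
    per_tensor = fake_quantize(x, block_size=len(x))
    per_block = fake_quantize(x, block_size=BLOCK_SIZE)
    print("per-tensor scale error:", mean_abs_error(x, per_tensor))
    print("per-16-block scale error:", mean_abs_error(x, per_block))
```

With a single per-tensor scale, the outlier sets the scale for every value. All the small values then round to zero. With per-block scaling, only the 16-value block that contains the outlier loses its small values, so the overall error drops. This is the motivation for fine-grained scaling in narrow formats.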
-
New nGPT architecture enables native 4-bit training for LLMs
Researchers have developed a new neural network architecture called nGPT that natively supports 4-bit precision training for large language models. This architecture constrains weights and hidden representations to a un…
-
Why Nvidia builds open models with Bryan Catanzaro
Nvidia is significantly expanding its open model program, releasing higher quality models and datasets. This strategy benefits Nvidia by capturing value from open language models, creating a sustainable advantage. The c…