PulseAugur / Brief

last 24h
[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

    NVIDIA has developed a 4-bit pretraining methodology built on the NVFP4 floating-point format, designed to overcome the reduced dynamic range and increased quantization error of narrower floating-point formats. The method was validated by pretraining a 12-billion-parameter hybrid Mamba-Transformer model on 10 trillion tokens, the longest publicly documented training run in 4-bit precision to date. The resulting model performed nearly identically to an FP8 baseline on the MMLU-Pro benchmark, demonstrating the viability of NVFP4 for large-scale model training.

    IMPACT Enables more efficient training of large language models by reducing precision requirements without significant performance loss.
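
    To illustrate the dynamic-range problem the summary mentions, here is a minimal sketch of block-scaled 4-bit quantization. It is not NVIDIA's implementation: the 16-element block size, the full-precision scale (a real format would store it in a narrower type), and the function names `quantize_block` and `fake_quantize` are assumptions for illustration. Only the E2M1 magnitude grid is standard for 4-bit floats.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed block length for this sketch


def quantize_block(block):
    """Fake-quantize one block: scale so the largest magnitude maps to 6.0,
    round every value to the nearest grid point, then scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # kept in full precision here (simplification)
    out = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Apply quantize_block independently to consecutive blocks."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block(values[start:start + block_size]))
    return out


if __name__ == "__main__":
    # One outlier among small values (hypothetical data).
    values = [0.01] * 16 + [100.0] + [0.01] * 15
    print(fake_quantize(values, block_size=32)[:3])  # one scale for everything
    print(fake_quantize(values, block_size=16)[:3])  # one scale per 16 values
```

    With a single scale for all 32 values, the outlier forces every small value to round to zero. With one scale per 16 values, the block without the outlier keeps its values, which is the motivation for fine-grained scaling in narrow formats.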

  2. LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

    Researchers have developed LongLive-2.0, a parallel infrastructure for training and serving long video generation models. For training, the system uses NVFP4 precision and sequence-parallel autoregressive training to reduce memory requirements and speed up computation. For inference, it combines W4A4 NVFP4 inference (4-bit weights and 4-bit activations) with asynchronous streaming VAE decoding to achieve high throughput. The reported speedups are up to 2.15x in training and 1.84x in inference.

    IMPACT Enables more efficient training and faster inference for long video generation models, potentially leading to wider adoption and new applications.

  3. Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

    Researchers have introduced Mix-Quant, a quantization framework designed to accelerate inference for Large Language Model (LLM) agents. The method applies quantization to the prefilling stage, which is computationally intensive in agentic workflows, while keeping higher precision for the decoding stage. By decoupling the two stages and using NVFP4 quantization for prefilling and BF16 for decoding, Mix-Quant aims to improve efficiency while limiting accuracy loss.

    IMPACT This phase-aware quantization technique could significantly reduce inference costs and latency for complex LLM agentic workflows.
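
    The phase split described above can be sketched as a simple lookup from inference phase to precision. This is an illustration only, not the Mix-Quant code: the names `Step`, `PHASE_PRECISION`, `plan_precision`, and `quantized_token_share`, and the token counts in the usage example, are hypothetical.

```python
from dataclasses import dataclass

# Phase-to-precision mapping as described in the summary:
# quantized prefilling, higher-precision decoding.
PHASE_PRECISION = {"prefill": "nvfp4", "decode": "bf16"}


@dataclass
class Step:
    phase: str       # "prefill" or "decode"
    num_tokens: int  # tokens processed in this step


def plan_precision(steps):
    """Return (phase, num_tokens, precision) for each step."""
    return [(s.phase, s.num_tokens, PHASE_PRECISION[s.phase]) for s in steps]


def quantized_token_share(steps):
    """Fraction of all processed tokens that run in the quantized phase."""
    total = sum(s.num_tokens for s in steps)
    low = sum(s.num_tokens for s in steps if PHASE_PRECISION[s.phase] == "nvfp4")
    return low / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical agent turn: a long tool-output context, then a short reply.
    turn = [Step("prefill", 4000), Step("decode", 100)]
    print(plan_precision(turn))
    print(round(quantized_token_share(turn), 3))
```

    In a turn shaped like this hypothetical one, nearly all processed tokens fall in the prefill phase, which is why quantizing only that phase can still cover most of the work.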