PulseAugur

Fp8

PulseAugur coverage of Fp8 — every cluster mentioning Fp8 across labs, papers, and developer communities, ranked by signal.

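Total · 30 days: 13 (last 90 days: 13)
Releases · 30 days: 0 (last 90 days: 0)
Papers · 30 days: 9 (last 90 days: 9)
Tier distribution · 90 days
Sentiment · 30 days

3 days with sentiment data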

Recent · Page 1 of 1 · 13 items
  1. RESEARCH · CL_44358 ·

    Together AI releases FlashAttention-3 and -4 for faster LLM processing

    Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to its GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75%…

  2. RESEARCH · CL_36662 ·

    NVIDIA unveils 4-bit pretraining method, NVFP4, for LLMs

    NVIDIA has developed a new 4-bit pretraining methodology called NVFP4, designed to overcome the challenges of reduced dynamic range and increased quantization error in narrower floating-point formats. This method was su…

  3. RESEARCH · CL_35775 ·

    llmcompressor tool enables LLM compression via FP8, GPTQ, SmoothQuant

    A new open-source tool named llmcompressor allows developers to compress and benchmark instruction-tuned large language models. The tool demonstrates how to apply post-training quantization techniques such as FP8, GPTQ,…

  4. TOOL · CL_28269 ·

    LoKA framework enables low-precision FP8 for large recommendation models

    Researchers have developed LoKA, a framework designed to make low-precision arithmetic, specifically FP8, practical for large recommendation models (LRMs). Unlike previous attempts that often degraded model quality, LoK…

  5. SIGNIFICANT · CL_23577 ·

    Superhuman and Databricks build 200K QPS AI inference platform

    Superhuman and Databricks engineers collaborated to build a high-throughput inference platform capable of handling over 200,000 queries per second. This joint effort modernized Superhuman's serving stack, migrating from…

  6. TOOL · CL_20689 ·

    LLM Study Diary #3: PyTorch tensors, float types, and training infrastructure

    This LLM study diary entry focuses on PyTorch fundamentals for training large language models. It details tensor basics, exploring various floating-point data types like FP32, BF16, and FP8 for efficiency and stability.…

  7. RESEARCH · CL_08634 ·

    SnapMLA paper details hardware-aware FP8 quantized pipelining for efficient long-context MLA decoding

    Researchers have developed SnapMLA, a new framework designed to enhance the efficiency of long-context decoding in Multi-head Latent Attention (MLA) architectures. This approach utilizes hardware-aware FP8 quantization …

  8. RESEARCH · CL_07014 ·

    TACO framework boosts LLM training throughput by 1.87X with tensor compression

    Researchers have introduced TACO, a novel framework designed to enhance the efficiency of training large-scale tensor-parallel Large Language Models (LLMs). TACO addresses communication overhead by employing an FP8-base…

  9. FRONTIER RELEASE · CL_07710 ·

    NVIDIA launches Nemotron 3 Nano Omni, unifying multimodal AI for efficiency

    NVIDIA has released Nemotron 3 Nano Omni, an open multimodal model capable of processing text, images, audio, and video. This model aims to unify these modalities into a single architecture, improving efficiency and ena…

  10. RESEARCH · CL_03567 ·

    Qwen3.6-35B model quantizations show FP8 quality worse than INT8, NVFP4 is a lie

    A user in Reddit's LocalLLaMA community shared findings on the Qwen3.6-35B model, focusing on Kullback-Leibler divergence (KLD) metrics for different quantization formats such as INT8, FP8, and NVFP4. The analysis, conduct…

  11. RESEARCH · CL_03804 ·

    AI safety research proposes formal framework for computational substrates

    This series of posts explores the concept of 'substrates' in AI, which refers to the computational context layers necessary for implementing AI systems. The authors argue that current AI safety research lacks a clear fr…

  12. FRONTIER RELEASE · CL_02784 ·

    DeepSeek V4 models offer high performance with reduced inference costs and NPU support

    DeepSeek has released its V4 family of open-weight large language models, featuring a 1.6 trillion parameter model and a smaller 284 billion parameter Flash MoE model. These new models claim to rival top proprietary LLM…

  13. RESEARCH · CL_05065 ·

    SpikingBrain2.0 model offers efficient long-context and cross-platform AI inference

    Researchers have introduced SpikingBrain2.0 (SpB2.0), a 5 billion parameter model designed for efficient long-context processing and cross-platform inference. The model features a novel Dual-Space Sparse Attention mecha…