PulseAugur

Qwen3.6-35B model quantizations show FP8 quality worse than INT8; NVFP4 is a lie

A user on Reddit's LocalLLaMA community shared findings on the Qwen3.6-35B model, comparing Kullback-Leibler divergence (KLD) metrics across quantization formats including INT8, FP8, and NVFP4. The analysis, conducted using a modified vLLM framework, suggests that FP8 and NVFP4, while potentially faster, deliver lower output quality than INT8. The user emphasizes that the choice of quantization should match the specific use case, balancing accuracy, speed, and GPU compatibility.
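The post's headline metric, KLD, measures how far the quantized model's next-token distribution drifts from the full-precision model's; lower is better, and 0 means the distributions match exactly. A minimal sketch of that computation (toy NumPy illustration; the function names and data here are my own, not taken from the post or from vLLM):

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean per-token KL(ref || quant) over a batch of next-token logits.

    ref_logits: logits from the full-precision reference model.
    quant_logits: logits from the quantized model on the same inputs.
    """
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # KL(p || q) = sum_i p_i * (log p_i - log q_i), per token position.
    kld = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kld.mean())


# Toy check: identical logits give ~0 KLD; a small perturbation
# (standing in for quantization error) gives a small positive KLD.
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32))  # 4 token positions, 32-entry toy vocab
assert mean_kld(ref, ref) < 1e-12
assert mean_kld(ref, ref + 0.05 * rng.normal(size=ref.shape)) > 0
```

In practice this is run over many tokens of held-out text, and the per-format mean (and tail) KLD is what lets the post rank INT8 above FP8 and NVFP4.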

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Provides insights into quantization trade-offs, guiding operators on selecting optimal formats for specific hardware and performance needs.

RANK_REASON The cluster discusses a technical analysis of model quantization formats and their performance implications, which falls under research.



COVERAGE [1]

  1. r/LocalLLaMA TIER_1 · /u/Phaelon74

    Qwen3.6-35B-A3B KLDs - INTs and NVFPs

    https://www.reddit.com/r/LocalLLaMA/comments/1svq8lm/qwen3635ba3b_klds_ints_and_nvfps/