FP8 quantization
PulseAugur coverage of FP8 quantization: every cluster mentioning FP8 quantization across labs, papers, and developer communities, ranked by signal.
-
Gemma 2 9B FP8 quantization shows prefill tax but faster generation
A benchmark evaluation of the self-hosted Gemma 2 9B model, particularly its FP8-quantized variant, revealed trade-offs compared with frontier APIs. While FP8 quantization significantly increases the time to first to…
-
club-3090 adds FP8 support for Qwen3.6-27B model
The club-3090 project has introduced experimental FP8 quantization support for the Qwen3.6-27B model. The feature is particularly relevant for users running dual RTX 3090 setups. The performance of …
-
DeepSeek V4 benchmarks show 85 tok/s at 524k context; Ollama guide for Ryzen APUs released
New benchmarks show DeepSeek V4 Flash achieving 85 tokens per second with a 524k context window, using MTP self-speculation and FP8 quantization on dual RTX PRO 6000 Max-Q GPUs. Additionally, a guide has been publ…