ENTITY Fp8

PulseAugur coverage of Fp8 — every cluster mentioning Fp8 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 32 (32 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 17 (17 over 90d)

TIER MIX · 90D

frontier release 1
significant 2
research 11
tool 18

SENTIMENT · 30D

10 days with sentiment data

RECENT · PAGE 1/2 · 32 TOTAL

TOOL · CL_111954 · Jun 26 · 06:14

Ornith 1.0 models explained: Dense vs MoE and format/precision details

A guide has been released to explain the terminology and concepts behind the new Ornith 1.0 models. The guide clarifies the difference between Dense and Mixture of Experts (MoE) architectures, noting that MoE models act…

TOOL · CL_111060 · Jun 25 · 20:28

ComfyUI adds native INT8 support for faster Stable Diffusion image generation

ComfyUI, a popular interface for Stable Diffusion, has officially integrated native support for INT8 quantization. This update allows users to load INT8 models and text encoders directly within ComfyUI, significantly im…

RESEARCH · CL_108307 · Jun 24 · 06:59

Krea2 Turbo FP8 model tested for character recognition and performance

Users are testing the Krea2 Turbo FP8 model, noting its performance and character recognition capabilities. One extensive test involved over 1000 prompts to evaluate how well the model identifies characters from various…

TOOL · CL_107964 · Jun 24 · 04:00

New FFT method leverages FP8 tensor cores for high-precision GPU computation

A new research paper proposes an efficient method for calculating Fast Fourier Transforms (FFTs) using NVIDIA's Blackwell Ultra (B300) GPUs. The Ozaki-Bailey FFT technique leverages FP8 tensor cores for dense matrix mul…

TOOL · CL_107495 · Jun 24 · 00:30

Krea2 models released for Stable Diffusion in GGUF and FP8 formats

New models and workflows for Krea2 have been released, including GGUF and FP8 formats. These resources are intended for use with Stable Diffusion and are available via Hugging Face. The release also includes additional f…

TOOL · CL_106864 · Jun 23 · 09:59

Krea 2 image model released in multiple quantized formats for broader GPU access

The Krea 2 image generation model has been released in quantized versions, including FP8, MXFP8, NVFP4, and INT8 formats, making it accessible for a wider range of GPUs. The model comes in two variants: Krea 2 Raw for t…

TOOL · CL_106207 · Jun 20 · 11:15

NVIDIA Blackwell platform dominates MLPerf Training 6.0 benchmarks

NVIDIA's Blackwell platform has set new records in the MLPerf Training 6.0 benchmarks, achieving the fastest times across all seven tests. The platform demonstrated strong scaling, with clusters of up to 8,192 GPUs show…

TOOL · CL_92176 · Jun 15 · 15:34

Ideogram 4.0 FP8 VRAM Needs: 16GB vs 24GB GPU Debate

A user is seeking advice on GPU VRAM requirements for running Ideogram 4.0 FP8 locally. They are debating between a 16GB RTX 4070 Ti Super and a 24GB RTX 3090, noting that Ideogram 4.0 with its text encoder can consume …

RESEARCH · CL_90898 · Jun 12 · 16:19

New INT8 Kernel Accelerates Diffusion Transformers on Consumer GPUs

Researchers have developed a fused INT8 GEMM kernel that significantly speeds up diffusion transformers on consumer Ampere GPUs. This new kernel allows the hardware's INT8 tensor cores to be utilized, overcoming a softw…

TOOL · CL_86852 · Jun 12 · 04:00

Apple M4 Max GPU's Tensor Compute Path Emulated, Not Accelerated

Researchers have reverse-engineered the Metal 4.1 tensor compute path on Apple's M4 Max GPU, revealing that the fp8 matmul2d operation is emulated rather than hardware-accelerated. This means the operation runs on the G…

RESEARCH · CL_84482 · Jun 10 · 16:19

New quantization methods enable Ideogram 4.0 on consumer GPUs

Researchers have developed new post-training quantization techniques for the Ideogram 4.0 text-to-image diffusion transformer. Their INT8 W8A8 method maintains FP8 quality on consumer GPUs lacking FP8 tensor cores, outp…

RESEARCH · CL_79487 · Jun 8 · 16:04

Paper catalogs 84 numeric formats for ML hardware consistency

A new paper introduces a comprehensive catalog of 84 numeric formats used in machine learning hardware, addressing the challenge of silent divergences when porting models across different accelerators. The catalog inclu…

TOOL · CL_77247 · Jun 8 · 04:00

FP8 attention precision issues analyzed, reverse iteration and S=256 scaling proposed

A new research paper analyzes precision challenges in FP8 attention computations, specifically focusing on the softmax probability matrix (P) when cast to FP8. The study identifies an issue called "P-collapse" that occu…

TOOL · CL_77245 · Jun 8 · 04:00

FP8 with reconstruction schemes matches FP64 accuracy in HPC

A new research paper challenges the long-held belief that double-precision (FP64) hardware is essential for high-performance computing (HPC). The authors propose that using FP8 tensor cores, combined with specific recon…

SIGNIFICANT · CL_66950 · Jun 2 · 14:13

Hcompany ships Holo3.1 agents for fast, local computer use

Hcompany has released Holo3.1, a new family of computer-use agents designed for robust performance across various environments and agent frameworks. This release emphasizes local inference capabilities, offering quantiz…

TOOL · CL_66775 · Jun 2 · 12:59

Fizgig Klein 9b Lora Studio updates for 16GB cards

Fizgig Klein 9b Lora Studio has released version 1.2.4, focusing on performance improvements and optimizations for users with 16GB graphics cards. This update enhances training speed through FP8 utilization and allows f…

TOOL · CL_57174 · May 28 · 13:23

RTX 3060 users: Disable low-VRAM flags for better Flux Klein performance

A user on Reddit discovered that for the Flux 2 Klein model on an RTX 3060 with 12GB VRAM, FP8 quantization performed similarly to GGUF quantization in terms of speed. The primary performance bottleneck was not the mode…

RESEARCH · CL_55741 · May 28 · 03:32

Trillion-parameter AI models challenge Kubernetes orchestration

Running trillion-parameter AI models within Kubernetes clusters presents significant challenges beyond standard container orchestration. These massive models require distributed systems approaches, where a single 'repli…

RESEARCH · CL_44358 · May 22 · 15:59

Together AI releases FlashAttention-3 and -4 for faster LLM processing

Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to their GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75%…

RESEARCH · CL_48868 · May 21 · 22:23

New methods enhance LLM quantization for efficiency and accuracy

Researchers have developed several new methods to improve the efficiency and accuracy of quantizing large language models (LLMs). These techniques aim to reduce the memory footprint and computational cost of LLMs, makin…