FeatherOps: Fast fp8 matmul on RDNA3 without native fp8, now supports more models
FeatherOps, a new integration for ComfyUI, speeds up matrix multiplication on RDNA3 GPUs by using FP8 precision even though the hardware has no native FP8 support. It has shown speedups of 30-50% for certain workloads, and compatibility has been tested with models such as Anima, LTX 2.3, and Qwen-Image. The project aims to improve inference performance for a range of image generation models.
AI IMPACT Improves inference speed for AI image generation models on RDNA3 GPUs.
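
As a rough illustration of what "FP8 without native FP8" can mean in general, the sketch below stores weights as 8-bit E4M3 codes and upcasts them to float16 right before the multiply. This is not FeatherOps's actual kernel, which this summary does not describe. The function names, the lookup-table approach, and the per-tensor `scale` parameter are all assumptions made for the example, and NumPy on the CPU stands in for GPU code.

```python
import numpy as np

def decode_e4m3fn(codes: np.ndarray) -> np.ndarray:
    """Decode FP8 E4M3 (finite-only variant) bit patterns into float32."""
    codes = codes.astype(np.uint8)
    sign = np.where(codes & 0x80, -1.0, 1.0).astype(np.float32)
    exp = ((codes >> 3) & 0x0F).astype(np.int32)   # 4 exponent bits, bias 7
    man = (codes & 0x07).astype(np.float32)        # 3 mantissa bits
    normal = np.ldexp(1.0 + man / 8.0, exp - 7)    # implicit leading 1
    subnormal = np.ldexp(man / 8.0, -6)            # exponent field == 0
    value = sign * np.where(exp == 0, subnormal, normal).astype(np.float32)
    value[(codes & 0x7F) == 0x7F] = np.nan         # the only NaN encodings
    return value

# 256-entry table: one float32 value per possible FP8 byte.
E4M3_LUT = decode_e4m3fn(np.arange(256, dtype=np.uint8))

def fp8_weight_matmul(a: np.ndarray, w_codes: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Multiply activations by FP8-stored weights using float16 arithmetic.

    `scale` is a hypothetical per-tensor dequantization factor.
    """
    w = (E4M3_LUT[w_codes] * scale).astype(np.float16)  # upcast on the fly
    return a.astype(np.float16) @ w

# Weights 1.0 (0x38) and 2.0 (0x40) stored as single bytes.
a = np.array([[1.0, 2.0]], dtype=np.float16)
w_codes = np.array([[0x38], [0x40]], dtype=np.uint8)
result = fp8_weight_matmul(a, w_codes)
```

The point of this layout is that weights occupy one byte each in memory, so less data has to be moved, while the arithmetic itself runs in a format the hardware can multiply directly.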