ONNX Runtime outperforms HF Transformers in CPU-only speech benchmark

By PulseAugur Editorial · [1 source] · 2026-06-05 13:01

A benchmark comparing ONNX Runtime, Hugging Face Transformers, and GGUF on the Parakeet TDT 0.6B model, run on CPU-only hardware, found that ONNX Runtime ran inference 37% faster than Hugging Face Transformers. The gain is attributed to ONNX Runtime's operator fusion and AVX2 optimizations, though it comes at the cost of higher memory usage. GGUF used less memory but doubled inference time, making it better suited to memory-constrained deployments.
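For readers who want to reproduce this kind of comparison, here is a minimal timing sketch in Python. It is not the harness used in the original post. The `benchmark` and `percent_faster` helpers are hypothetical names, the workload passed to `benchmark` is a placeholder for whatever transcription call each runtime exposes, and "percent faster" is assumed here to mean percent reduction in wall-clock time, which the summary does not specify. The 16.78 s audio length is the one stated in the source post.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], object], audio_seconds: float,
              warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time a zero-argument callable and report median latency and
    real-time factor (processing time divided by audio duration)."""
    # Warm-up runs are discarded so one-time setup cost is not measured.
    for _ in range(warmup):
        run()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    return {"median_s": median_s, "rtf": median_s / audio_seconds}


def percent_faster(baseline_s: float, candidate_s: float) -> float:
    """Percent reduction in latency of the candidate relative to the baseline
    (an assumed definition; the summary does not say how 37% was computed)."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Placeholder workload; swap in a real transcription call per runtime.
    result = benchmark(lambda: sum(range(100_000)), audio_seconds=16.78)
    print(result)
    # Illustrative numbers only, not taken from the benchmark.
    print(percent_faster(2.0, 1.0))
```

Reporting the median rather than the mean keeps a single slow run from skewing the result, which matters on a shared 2-vCPU machine like the one described in the post.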

IMPACT ONNX Runtime's performance advantage on CPU could enable more efficient on-device speech processing.

RANK_REASON Benchmark comparing inference runtimes for a specific model on CPU hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/MachineLearning TIER_1 English(EN) · /u/gvij · 2026-06-05 13:01

Benchmark: ONNX Runtime vs HF Transformers vs GGUF for Parakeet TDT 0.6B on CPU-only hardware [D]

Sharing a small CPU inference benchmark for nvidia/parakeet-tdt-0.6b-v3 that turned up a result I didn't expect going in. Setup: 2 x86-64 vCPUs (AVX2/FMA), 7.7GB RAM, no GPU. Test audio: 16.78s Harvard sentences at 16kHz m…
