Brief · PulseAugur

TOOL · r/LocalLLaMA · 5d

I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python

A developer has ported NVIDIA's Parakeet speech-to-text models to the ggml framework, so they run on CPUs and GPUs without Python or PyTorch. The port is reported to produce byte-for-byte identical output to NVIDIA's NeMo implementation, with speedups of up to 5x on GPUs and 1.86x on CPUs and lower memory usage. GGUF-quantized versions are available, and the project includes a C API for integration; it also powers a local OpenAI-compatible transcription endpoint via LocalAI.
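To make the "OpenAI-compatible transcription endpoint" concrete, here is a minimal sketch of a client that posts an audio file to such an endpoint using only the Python standard library. It assumes the server follows the OpenAI audio API shape (a multipart POST to /v1/audio/transcriptions with `model` and `file` fields, returning JSON with a `text` field). The base URL, port, and the model name "parakeet" used below are placeholders, not values confirmed by the post; check your LocalAI configuration for the real ones.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, model, audio_bytes, filename="audio.wav"):
    """Build a multipart/form-data POST for an OpenAI-style transcription endpoint."""
    boundary = uuid.uuid4().hex
    # Two form fields: the model name, then the raw audio file.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, model, path):
    """Send the file at `path` to the endpoint and return the transcribed text."""
    with open(path, "rb") as f:
        req = build_transcription_request(
            base_url, model, f.read(), filename=os.path.basename(path)
        )
    with urllib.request.urlopen(req) as resp:
        # Assumes the default OpenAI-style JSON response: {"text": "..."}
        return json.load(resp)["text"]
```

With a server running locally, a call might look like `transcribe("http://localhost:8080", "parakeet", "sample.wav")`, where both the address and the model name are assumptions to replace with your own.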

IMPACT Makes local speech-to-text deployment practical without a Python or PyTorch runtime, with lower latency and memory use.

NVIDIA
ggml
NeMo
LocalAI
Parakeet
mudler_it