NVIDIA Parakeet speech-to-text ported to ggml for faster CPU/GPU use

By PulseAugur Editorial · [1 source] · 2026-05-31 20:35

A developer has ported NVIDIA's Parakeet speech-to-text models to the ggml framework, so they run on CPUs and GPUs without Python or PyTorch. The port produces output byte-for-byte identical to that of NVIDIA's NeMo models, with speedups of up to 5x on GPUs and 1.86x on CPUs and lower memory usage. Quantized GGUF versions are available, and the project includes a C API for integration. It also powers a local OpenAI-compatible transcription endpoint via LocalAI.
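As a rough illustration of what "OpenAI-compatible transcription endpoint" means in practice, the sketch below builds a multipart request for the `/v1/audio/transcriptions` route using only the Python standard library. The base URL `http://localhost:8080`, the model name `parakeet`, and the helper name `build_transcription_request` are assumptions made for this example and are not taken from the post; check the project's own documentation for the real values.

```python
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename, model,
                                base_url="http://localhost:8080"):
    """Build (but do not send) a multipart POST for an OpenAI-style
    transcription endpoint. base_url and model are assumed placeholders."""
    boundary = uuid.uuid4().hex

    # Form field naming the model to use (hypothetical name supplied by caller).
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()

    # Form field carrying the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    file_part = file_header + audio_bytes + b"\r\n"

    # Closing boundary marks the end of the multipart body.
    closing = f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=model_part + file_part + closing,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real audio file.
    req = build_transcription_request(b"\x00\x01", "sample.wav", "parakeet")
    print(req.get_method(), req.full_url)
```

With a server actually running, the request could be sent with `urllib.request.urlopen(req)`. An OpenAI-style server normally replies with JSON containing a `text` field, though the exact response shape of this particular setup is not confirmed by the source.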

IMPACT Enables wider, more efficient local deployment of advanced speech-to-text capabilities.

RANK_REASON Porting an existing model to a new framework for improved performance and accessibility.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/mudler_it · 2026-05-31 20:35

I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python

https://www.reddit.com/r/LocalLLaMA/comments/1tt6oja/i_ported_nvidia_parakeet_speechtotext_to_ggml/

