OpenAI launches advanced audio models for API, enhancing voice agents

By PulseAugur Editorial · [3 sources] · 2022-07-28 00:00

OpenAI has released new audio models through its API, aimed at developers building voice agents. The updated speech-to-text models, including gpt-4o-transcribe and gpt-4o-mini-transcribe, offer improved accuracy and reliability, particularly in challenging audio conditions. A new text-to-speech model, gpt-4o-mini-tts, lets developers instruct the model on how the text should be spoken, enabling more expressive and tailored applications.
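To make the text-to-speech customization concrete, here is a minimal sketch of what a steerable speech request body might look like. It only builds the JSON payload and sends nothing. The endpoint URL, the field names (`input`, `voice`, `instructions`), and the `alloy` voice reflect our understanding of OpenAI's public speech API and should be checked against the official reference; the helper `build_speech_request` is a hypothetical name introduced here for illustration.

```python
import json

# Assumed endpoint for speech generation; verify against the official API reference.
SPEECH_ENDPOINT = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, style, voice="alloy"):
    """Build the JSON body for a text-to-speech request with a delivery style.

    Hypothetical helper: it only assembles the payload and performs no network call.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # model name taken from the announcement
        "input": text,               # the words to be spoken
        "voice": voice,              # assumed default voice
        "instructions": style,       # free-text guidance on how to speak
    }


body = build_speech_request(
    "Thanks for your patience. Your refund is on its way.",
    "Talk like a sympathetic customer service agent.",
)
print(json.dumps(body, indent=2))
```

In a real application this body would be sent as an authenticated POST to the endpoint, with the audio returned in the response. Keeping the style instruction separate from the spoken text means the same script can be re-rendered in different tones without editing the words.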

RANK_REASON OpenAI released new generation audio models with improved performance benchmarks and new steerability features.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

OpenAI News TIER_1 English(EN) · 2025-03-20 11:00

Introducing next-generation audio models in the API

For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents.

Hugging Face Blog TIER_1 Português(PT) · 2022-12-15 00:00

A Complete Guide to Audio Datasets

Hugging Face Blog TIER_1 English(EN) · 2022-07-28 00:00

Introducing new audio and vision documentation in 🤗 Datasets
