Brief

last 24h

[4/4] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
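
As a rough illustration of how a 0–100 score over those four factors could be assembled, here is a minimal sketch. The brief does not say how the factors are combined, so the weighted sum, the weights, the 24-hour half-life, and the names `score_story` and `time_decay` are all assumptions made for illustration only.

```python
# Hypothetical weights: the brief names the four factors but not their weighting.
WEIGHTS = {
    "authority": 0.35,
    "cluster_strength": 0.30,
    "headline_signal": 0.20,
    "recency": 0.15,
}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def time_decay(age_hours: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay in [0, 1]: 1.0 for a new story, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


def score_story(authority: float, cluster_strength: float,
                headline_signal: float, age_hours: float) -> float:
    """Combine three factors in [0, 1] plus a decayed recency term into a 0-100 score."""
    factors = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
        "recency": time_decay(age_hours),
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Weights sum to 1, so the weighted sum stays in [0, 1] before scaling.
    total = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return round(100.0 * total, 1)
```

Under these assumptions, an older story can still outrank a fresh one if it comes from a more authoritative source or belongs to a larger cluster, since recency is only one of the four weighted terms.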

TOOL · Together AI blog English (EN) · 2w

Introducing voice finder — a new tool to quickly find the right voice for your app from over 600+ voices

Together AI has launched Voice Finder, a new tool designed to help developers quickly select the most suitable voice for their applications from a catalog of over 600 options. The tool allows users to search for voices by describing their desired characteristics or by uploading an audio sample for comparison. Voice Finder categorizes each voice across more than 15 attributes, including pitch, accent, and emotion, to streamline the selection process for voice agents.

IMPACT Simplifies voice selection for developers building voice agents, potentially accelerating deployment.

TOOL · Together AI blog English (EN) · 3mo

How speech models fail where it matters the most and what to do about it

Researchers at Together AI have found that current state-of-the-art speech recognition models average a 39% error rate when transcribing street names, and that non-native English speakers are 18% more likely to be misunderstood. These errors can have substantial real-world consequences, such as increased travel time and costs for services like ride-sharing. The study suggests that a synthetic data generation technique called "cross-lingual style transfer" can improve transcription accuracy by up to 60% with minimal training data.

IMPACT Speech recognition systems need improvement for real-world applications, especially for diverse linguistic groups, to avoid costly errors.
- OpenAI
- Microsoft
- Google
- Together AI
- Whisper
- Deepgram
- Phi-4

SIGNIFICANT · Together AI blog English (EN) · 6mo · [4 sources]

Announcing the fastest inference for realtime voice AI agents

Together AI has launched a unified platform for building real-time voice agents, integrating speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) within a single cloud environment. This co-location aims to reduce latency to under 500ms and simplify deployment by eliminating inter-vendor network hops. The platform now natively hosts models like Deepgram for STT and Cartesia Sonic-3 for TTS, offering developers more choice and a streamlined experience for production-ready voice applications.

IMPACT Accelerates development of real-time conversational AI applications by simplifying infrastructure and reducing latency.

SIGNIFICANT · arXiv cs.AI English (EN) · 10mo · [3 sources]

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI

Together AI has launched new speech-to-text (STT) and text-to-speech (TTS) capabilities, integrating Deepgram's advanced voice models and its own high-performance Whisper V3 API. This move aims to streamline the development of real-time voice agents by providing a unified platform for transcription, LLM processing, and synthesis. The offerings emphasize speed, accuracy, and enterprise-grade features like zero data retention and large file handling, addressing key latency and quality issues in current voice AI applications.

IMPACT Streamlines voice AI development by unifying STT, LLM, and TTS, addressing critical latency and quality issues for real-time applications.
- OpenAI
- Together AI
- Flux
- Deepgram
- Whisper V3
- Aura-2
- Nova-3
