StepFun launches StepAudio 2.5 with real-time voice and persona consistency

By PulseAugur Editorial · [1 source] · 2026-05-24 22:51

StepFun has released StepAudio 2.5 Realtime, an end-to-end speech large language model for real-time conversation with customizable personas. The model integrates speech understanding and generation, and it uses million-scale persona data augmentation and roleplay-specific Reinforcement Learning from Human Feedback (RLHF) to maintain character consistency. A key differentiator is its paralinguistic comprehension: it perceives user mood and intent from vocal cues such as tone and speech rate, scoring 82.18 on a benchmark for this capability.

IMPACT Enhances real-time conversational AI with improved persona consistency and paralinguistic understanding.

RANK_REASON Release of a new speech LLM with novel architectural innovations and benchmark results.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

MarkTechPost TIER_1 English(EN) · Michal Sutter · 2026-05-24 22:51

StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension

StepFun, the Shanghai-based AI lab, released StepAudio 2.5 Realtime, an end-to-end real-time speech large language model with fully customizable persona capabilities, in May 2026. The model connects via a WebSocket API, supports Chinese and English, and ranked first across all…
