StepFun has released StepAudio 2.5 Realtime, an end-to-end speech large language model for real-time interaction with customizable personas. The model integrates speech understanding and generation, and uses million-scale persona data augmentation together with roleplay-specific Reinforcement Learning from Human Feedback (RLHF) to maintain character consistency. A key differentiator is its paralinguistic comprehension: it perceives user mood and intent from vocal cues such as tone and speech rate, and scores 82.18 on a relevant benchmark.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Enhances real-time conversational AI with improved persona consistency and paralinguistic understanding.
RANK_REASON Release of a new speech LLM with novel architectural innovations and benchmark results.