Thinking Machines has unveiled a new class of "interaction models" designed for real-time conversational AI. The models process audio, video, and text in 200-millisecond slices, which removes the need for a separate turn-detection component. Because input and output streams are continuous and interleaved, the model can speak while it is still listening and react to visual cues without being explicitly prompted. The system pairs two co-trained models: a lightweight interaction model that handles the live conversation, and a background model that takes on slower work such as planning and tool use, so the user-facing side stays low-latency.
IMPACT Enables more natural, responsive conversational AI by building interactivity directly into the model architecture.
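The announcement does not publish code or an API, so the following is only a toy simulation of the idea under stated assumptions. It steps through input in 200 ms slices, lets a fast "interaction" function react to every slice (backchanneling while the user talks, responding to a visual cue unprompted), and hands slower requests to a "background" function whose result is spliced back in later. Every name here (`Chunk`, `interaction_model`, `background_model`, `run`), the toy reply rules, and the 600 ms background latency are hypothetical stand-ins, not Thinking Machines' implementation; time is simulated by timestamps rather than wall-clock sleeps so the sketch runs deterministically.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TICK_MS = 200  # slice length reported in the announcement


@dataclass
class Chunk:
    """One 200 ms slice of multimodal input (toy stand-ins, not real media)."""
    t_ms: int
    audio: str = ""  # hypothetical: transcribed speech heard in this slice
    video: str = ""  # hypothetical: a label for a visual cue seen in this slice
    text: str = ""


@dataclass
class BackgroundJob:
    request: str
    ready_at_ms: int  # simulated completion time of the slow model


def interaction_model(chunk: Chunk) -> Tuple[str, Optional[str]]:
    """Hypothetical fast model: reply for this slice plus an optional task to delegate."""
    if chunk.video == "wave":
        return "(waves back)", None  # reacts to a visual cue with no explicit prompt
    if chunk.audio.startswith("plan "):
        return "on it, keep talking", chunk.audio[len("plan "):]  # hand off slow work
    if chunk.audio:
        return "mm-hm", None  # backchannel while the user is still speaking
    return "", None


def background_model(request: str) -> str:
    """Hypothetical slow model for planning or tool use."""
    return f"result for {request!r}"


def run(stream: List[Chunk], background_latency_ms: int = 3 * TICK_MS) -> List[Tuple[int, str]]:
    """Step through the stream one slice at a time; no turn detector decides when to reply."""
    outputs: List[Tuple[int, str]] = []
    pending: List[BackgroundJob] = []
    for chunk in stream:
        # Splice in background results whose simulated completion time has arrived.
        done = [job for job in pending if job.ready_at_ms <= chunk.t_ms]
        pending = [job for job in pending if job.ready_at_ms > chunk.t_ms]
        for job in done:
            outputs.append((chunk.t_ms, background_model(job.request)))
        # The fast model sees every slice and may emit output in that same slice.
        reply, task = interaction_model(chunk)
        if reply:
            outputs.append((chunk.t_ms, reply))
        if task is not None:
            pending.append(BackgroundJob(task, chunk.t_ms + background_latency_ms))
    return outputs


if __name__ == "__main__":
    heard = ["hi there", "plan a trip", "", "and also", "", ""]
    seen = ["", "", "wave", "", "", ""]
    stream = [Chunk(i * TICK_MS, audio=a, video=v) for i, (a, v) in enumerate(zip(heard, seen))]
    for t_ms, out in run(stream):
        print(f"{t_ms:>5} ms  {out}")
```

The point the sketch tries to capture is that the fast loop never blocks on the slow one: the planning request made at 200 ms only records a simulated completion time, and the interaction side keeps reacting (a wave at 400 ms, a backchannel at 600 ms) until the background result is spliced in at 800 ms.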
RANK_REASON Research preview announcement of a new class of models with a novel architectural approach.
- The Bitter Lesson
- FD-bench V1
- Gemini-3.1-flash-live
- GPT-realtime-2.0
- IFEval
- Thinking Machines
- TML-Interaction-Small
- Whisper
AI-generated summary · Google Gemini · from 1 source.