Moshi
PulseAugur coverage of Moshi — every cluster mentioning Moshi across labs, papers, and developer communities, ranked by signal.
-
New dialogue system integrates real-time facial generation with speech
Researchers have developed Moshi-Face, a novel full-duplex spoken dialogue system that integrates facial generation with audio processing. This system utilizes a VQ-VAE to encode facial data into discrete tokens and a F…
-
BayLing-Duplex enables native full-duplex speech dialogue with single LLM
Researchers have developed BayLing-Duplex, a novel full-duplex speech language model that enables simultaneous listening and speaking without relying on external turn-taking modules. This single autoregressive LLM can m…
-
Moshi dialogue models show synchronized internal states and predict turn-taking
Researchers have explored how full-duplex speech dialogue models coordinate their internal representations during interaction. By simulating dialogues between two instances of the Moshi model, they observed strong repre…
-
Thinking Machines previews interaction models for real-time AI collaboration
Thinking Machines has introduced a research preview of interaction models designed for native, real-time collaboration. These models process audio, video, and text simultaneously, allowing for continuous thought, respon…
-
New methods boost full-duplex speech models for better interaction
Researchers have developed new methods to enhance full-duplex speech models, enabling more natural and interactive conversations. One approach focuses on improving interactivity axes like pause handling and turn-taking …
-
Sakana AI's KAME architecture injects LLM knowledge into speech AI without latency
Sakana AI has developed KAME, a novel tandem architecture for speech-to-speech AI that aims to combine the speed of direct systems with the knowledge depth of LLM-based approaches. KAME operates with two asynchronous co…
-
Josh Talks launches first full-duplex Hindi conversational AI model
Researchers have developed the first open and reproducible full-duplex spoken dialogue system for the Hindi language. This system, named Human-1, adapts the Moshi architecture and was trained on over 26,000 hours of rea…