How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

LLM dialogue systems face a routing trade-off in full-duplex interaction

By PulseAugur Editorial Team · [1 source] · 2026-05-11 08:46

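Researchers examined how large language models (LLMs) can process incoming user input in full-duplex dialogue systems while they are generating a spoken response. They compared two approaches: channel fusion, which integrates the user input directly into the LLM's input stream, and cross-attention routing, which keeps the user input in an external memory that the model accesses through cross-attention. Channel fusion improves semantic grounding and question-answering accuracy, but it is prone to context corruption when the user interrupts. Cross-attention routing is more robust to interruptions because it preserves the generation context, although its question-answering performance is lower.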
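The difference between the two routing strategies can be illustrated with a toy sketch. This is not the paper's implementation: the function names (`fuse_channels`, `cross_attend`, `route_with_cross_attention`), the use of elementwise addition for fusion, and the single-head dot-product attention are all assumptions chosen only to show where the user stream ends up in each design.

```python
import math

def fuse_channels(assistant_frames, user_frames):
    """Toy channel fusion (assumed form): add each user frame to the
    assistant frame at the same time step, so the model's own input
    sequence carries the user signal."""
    return [[a + u for a, u in zip(af, uf)]
            for af, uf in zip(assistant_frames, user_frames)]

def cross_attend(query, memory):
    """Single-head scaled dot-product attention of one query over memory."""
    if not memory:
        return [0.0] * len(query)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, m)) / scale for m in memory]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * m[i] for w, m in zip(weights, memory))
            for i in range(len(query))]

def route_with_cross_attention(assistant_frames, user_frames):
    """Toy cross-attention routing (assumed form): user frames go into a
    separate memory; the assistant sequence is returned unmodified along
    with one attention readout per step."""
    memory, readouts = [], []
    for af, uf in zip(assistant_frames, user_frames):
        memory.append(uf)  # causal: only user frames seen so far
        readouts.append(cross_attend(af, memory))
    return assistant_frames, readouts

# Hypothetical two-step example: the user is silent, then interrupts loudly.
assistant = [[1.0, 0.0], [0.0, 1.0]]
user = [[0.0, 0.0], [5.0, 5.0]]

fused = fuse_channels(assistant, user)
context, readouts = route_with_cross_attention(assistant, user)
print(fused[1])    # the interruption overwrites most of the assistant frame
print(context[1])  # the generation context is unchanged
```

In this sketch the fused sequence is altered as soon as the user speaks, which mirrors the context-corruption risk described above, while the cross-attention variant leaves the assistant sequence intact and exposes the user stream only through the readouts.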

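Impact: The study examines architectural choices for LLMs in real-time spoken dialogue, with implications for future voice assistants and conversational AI development.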

Ranking rationale: An academic paper detailing research on the architecture of LLM dialogue systems.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Zhiyong Wu · 2026-05-11 08:46

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generatio…
