Qwen2.5 Omni
PulseAugur coverage of Qwen2.5 Omni — every cluster mentioning Qwen2.5 Omni across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
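Raon-Speech releases 9B-parameter model for speech understanding and generation
Researchers have introduced Raon-Speech, a 9-billion-parameter speech language model that can understand, answer, and generate speech in English and Korean. Trained on more than 1.38 million hours of curated speech and text data, the model outperforms audio foundation models of comparable size on speech-centric tasks while retaining strong text question-answering ability. An extension called Raon-SpeechChat, trained on additional conversational data, adds real-time full-duplex dialogue and performs well on turn-taking and interruption sensitivity.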
-
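SEATS cuts LLM compute by pruning audio-visual tokens
Researchers have developed a new method called SEATS to improve the efficiency of omni-modal large language models (om-LLMs). SEATS prunes redundant audio-visual tokens across the model's layers and adaptively adjusts token selection according to cross-modal fusion. The approach substantially reduces computational load and speeds up inference while maintaining high performance.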
-
AffectVerse model predicts future emotions using temporal imagination
Researchers have introduced AffectVerse, a new multimodal model designed for affective computing that integrates temporal prediction into its reasoning process. Unlike previous models that treated emotion recognition st…
-
Omni-Encoder unifies vision and audio processing for human-like motion perception
Researchers have developed Omni-Encoder, a novel Transformer backbone that unifies visual and audio signals for more holistic perception. Unlike previous models that process modalities separately and at different rates,…
-
New framework reveals audio hallucinations in egocentric video models
Researchers have developed a new framework to evaluate audio hallucinations in egocentric videos, where models infer sounds from visual cues that are not actually heard. Their study found that advanced audio-visual lang…