PulseAugur
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

MaineCoon: New 22-Billion-Parameter Audio-Visual Model Enables Real-Time Social Interaction

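Researchers have introduced MaineCoon, a 22-billion-parameter audio-visual autoregressive model designed for real-time social interaction. The model reaches up to 47.5 FPS on a single GPU and supports long-horizon generation through an agentic inference framework. MaineCoon uses novel training techniques, including self-resampling and reinforced on-policy distillation, and aims to set a new benchmark for low-latency, high-quality audio-visual generation for AI-native social platforms.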

Impact: Sets a new benchmark for real-time audio-visual generation and could power the next generation of AI-native social platforms.

Ranking rationale: This cluster covers a new research paper published on arXiv that describes a novel audio-visual model.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks.

  2. arXiv cs.CV · TIER_1 · English (EN) · Lichen Bai, Tianhao Zhang, Shitong Shao, Dingwei Tan, Qiyu Zhong, Zhengpeng Xie, Haopeng Li, Qinghao Huang, Dandan Shen, Tengjiao Ji, Wei Wang, Peicheng Wu, Yuxuan Zhao, Xiangyu Zhu, Welly Luo, Shurui Yang, Zeke Xie

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    arXiv:2606.17800v1. Abstract: As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this wor…

  3. arXiv cs.CV · TIER_1 · English (EN) · Zeke Xie

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this work, we define the position of social world models…