PulseAugur

MaineCoon: New 22B Parameter Audio-Visual Model Achieves Real-Time Social Interaction

Researchers have introduced MaineCoon, a 22-billion-parameter audio-visual autoregressive model designed for real-time social interaction. The model reaches up to 47.5 FPS on a single GPU and supports long-horizon generation through agentic inference frameworks. MaineCoon incorporates new training techniques, including self-resampling and reinforced online-policy distillation, with the aim of setting a new benchmark for low-latency, high-quality audio-visual generation for AI-native social platforms.
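To put the reported 47.5 FPS figure in context, the short sketch below converts a frame rate into the wall-clock time available per generated frame and checks it against a few playback rates. The helper names and the 24/30/60 FPS targets are illustrative assumptions, not taken from the paper; only the 47.5 FPS value comes from the summary.

```python
def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per generated frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_target(generation_fps: float, target_fps: float) -> bool:
    """True if generation throughput keeps pace with a target playback rate."""
    return generation_fps >= target_fps


reported_fps = 47.5  # peak single-GPU figure quoted in the summary
print(f"{frame_budget_ms(reported_fps):.1f} ms per frame")

# Hypothetical playback targets for comparison (assumptions, not from the paper).
for target in (24, 30, 60):
    print(target, meets_target(reported_fps, target))
```

At 47.5 FPS each frame must be produced in roughly 21 ms on average. Note that a frame rate describes throughput; it does not by itself bound the latency before the first frame appears.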

IMPACT: Could set a new benchmark for real-time audio-visual generation, potentially enabling next-generation AI-native social platforms.

RANK_REASON: The cluster describes a new research paper, released on arXiv, that details a novel audio-visual model.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks.

  2. arXiv cs.CV · TIER_1 · English (EN) · Lichen Bai, Tianhao Zhang, Shitong Shao, Dingwei Tan, Qiyu Zhong, Zhengpeng Xie, Haopeng Li, Qinghao Huang, Dandan Shen, Tengjiao Ji, Wei Wang, Peicheng Wu, Yuxuan Zhao, Xiangyu Zhu, Welly Luo, Shurui Yang, Zeke Xie

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    arXiv:2606.17800v1 Announce Type: new Abstract: As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this wor…

  3. arXiv cs.CV · TIER_1 · English (EN) · Zeke Xie

    MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

    As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this work, we define the position of social world models…