Meituan-Longcat has released LongCat-Video-Avatar 1.5, an open-source framework for audio-driven human video generation. The upgraded version features an improved Whisper-Large audio encoder for more natural lip-syncing, along with stability improvements for consistent identity and temporal coherence. The model supports tasks such as AT2V and ATI2V, generalizes to diverse styles including anime and animals, and offers efficient 8-step inference.
AI IMPACT Enables creation of diverse avatar videos from audio, potentially impacting content creation and virtual interactions.
AI-generated summary · Google Gemini · from 2 sources.