meituan-longcat/LongCat-Video-Avatar-1.5
Meituan-Longcat has released LongCat-Video-Avatar 1.5, an open-source framework for audio-driven human video generation. The upgraded version uses an improved Whisper-Large audio encoder for more natural lip-syncing, and improves stability so that identity and temporal coherence stay consistent across the video. The model supports tasks such as audio-text-to-video (AT2V) and audio-text-image-to-video (ATI2V), generalizes to diverse styles including anime and animals, and offers efficient 8-step inference.
AI IMPACT Enables the creation of diverse avatar videos from audio, with potential applications in content creation and virtual interactions.