CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning
Researchers have introduced CogPortrait, a two-stage framework for generating portrait animations with fine-grained control over the eye region. The system uses three chain-of-thought Multimodal Large Language Model (MLLM) agents to translate high-level labels into detailed facial keypoints. A DiT-based video generation backbone then synthesizes the animation, with additional techniques aimed at improving visual quality and identity consistency, particularly in challenging boundary cases.
IMPACT: The approach could improve the realism and expressiveness of AI-generated characters by offering more precise control over facial features such as the eyes.
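The flow described in the summary (labels refined by three agents into keypoints, then passed to a video backbone) can be pictured with a toy sketch. Everything here is an assumption made for illustration: the agent roles (`agent_interpret`, `agent_constrain`, `agent_emit`), the label table, the three keypoint names, and the mapping of agents to the first stage and the backbone to the second are hypothetical, and `synthesize_stub` only interpolates keypoints where the real system would run a DiT model. It is not the authors' implementation.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Keypoints = Dict[str, Point]  # hypothetical: landmark name -> normalized (x, y)


@dataclass(frozen=True)
class Plan:
    label: str                      # high-level input label
    gaze: Point = (0.0, 0.0)        # hypothetical intermediate: gaze offset
    openness: float = 1.0           # hypothetical intermediate: eyelid openness
    keypoints: Keypoints = field(default_factory=dict)


# Hypothetical lookup standing in for an MLLM's reasoning about a label.
LABELS = {
    "look_left": ((-0.3, 0.0), 1.0),
    "look_right": ((0.3, 0.0), 1.0),
    "blink": ((0.0, 0.0), 0.0),
}


def eye_keypoints(gaze: Point, openness: float) -> Keypoints:
    """Place a pupil and two eyelid points around a fixed eye center."""
    cx, cy = 0.5, 0.5
    half = 0.1 * openness
    return {
        "pupil": (cx + 0.1 * gaze[0], cy + 0.1 * gaze[1]),
        "upper_lid": (cx, cy - half),
        "lower_lid": (cx, cy + half),
    }


def agent_interpret(plan: Plan) -> Plan:
    """Agent 1 (assumed role): map the label to coarse intent."""
    gaze, openness = LABELS.get(plan.label, ((0.0, 0.0), 1.0))
    return replace(plan, gaze=gaze, openness=openness)


def agent_constrain(plan: Plan) -> Plan:
    """Agent 2 (assumed role): clamp intent to a plausible range."""
    gx, gy = (max(-1.0, min(1.0, v)) for v in plan.gaze)
    return replace(plan, gaze=(gx, gy),
                   openness=max(0.0, min(1.0, plan.openness)))


def agent_emit(plan: Plan) -> Plan:
    """Agent 3 (assumed role): emit concrete keypoints."""
    return replace(plan, keypoints=eye_keypoints(plan.gaze, plan.openness))


def plan_keypoints(label: str) -> Keypoints:
    """Stage one: run the three agents in sequence."""
    plan = Plan(label)
    for agent in (agent_interpret, agent_constrain, agent_emit):
        plan = agent(plan)
    return plan.keypoints


def synthesize_stub(target: Keypoints, num_frames: int) -> List[Keypoints]:
    """Stage two placeholder: interpolate from a neutral eye to the target."""
    neutral = eye_keypoints((0.0, 0.0), 1.0)
    frames = []
    for i in range(num_frames):
        t = 1.0 if num_frames == 1 else i / (num_frames - 1)
        frames.append({
            name: ((1 - t) * neutral[name][0] + t * target[name][0],
                   (1 - t) * neutral[name][1] + t * target[name][1])
            for name in target
        })
    return frames


def animate(label: str, num_frames: int = 4) -> List[Keypoints]:
    return synthesize_stub(plan_keypoints(label), num_frames)
```

Calling `animate("blink", 3)` returns three keypoint frames that move from an open eye to a closed one. Passing each agent's output to the next as an immutable `Plan` keeps the hierarchy easy to inspect at every step, which is the only design point this sketch is meant to convey.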