RESEARCH · arXiv cs.CV English(EN) · 2w · [2 sources]

CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning

Researchers have introduced CogPortrait, a two-stage framework for generating portrait animations with fine-grained control over the eye region. The system uses three chain-of-thought Multimodal Large Language Model (MLLM) agents to translate high-level labels into detailed facial keypoints. A DiT-based video generation backbone then synthesizes the animation, with additional techniques aimed at improving visual quality and identity consistency, particularly in challenging boundary cases.

IMPACT More precise control over facial features such as the eyes could make AI-generated characters more realistic and expressive.

Multimodal Large Language Models
EMH benchmark
CogPortrait