Researchers have introduced HuM-Eval, a new framework designed to better evaluate the quality of human motion in generated videos. Current metrics often miss fine-grained human details, leading to evaluations that don't align with human preferences. HuM-Eval employs a coarse-to-fine approach, first using a Vision Language Model for a general assessment and then analyzing 2D pose for anatomical correctness and 3D motion for stability. This method achieved a 58.2% correlation with human judgment, surpassing existing benchmarks.
IMPACT Introduces a more accurate method for evaluating human motion in generated videos, potentially guiding future improvements in text-to-video models.
RANK_REASON Academic paper introducing a new evaluation framework for video generation models.
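The coarse-to-fine idea in the summary can be illustrated with a small sketch. This is not HuM-Eval's actual implementation: the summary does not say how the stages are scored or combined, so the gate threshold, the limb-length consistency proxy, the jitter measure, and the plain averaging are all assumptions, and every name here (`ClipFeatures`, `anatomy_score`, `stability_score`, `coarse_to_fine_score`) is hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClipFeatures:
    # Hypothetical inputs; a real pipeline would get these from a VLM
    # and from 2D/3D pose estimators.
    vlm_score: float                 # coarse rating in [0, 1]
    limb_lengths: list[list[float]]  # per frame: 2D length of each limb
    joint_track: list[float]         # one 3D joint coordinate per frame


def anatomy_score(limb_lengths: list[list[float]]) -> float:
    """Crude 2D proxy (assumption): limbs should not stretch or shrink much."""
    deviations = []
    for limb in zip(*limb_lengths):  # iterate limb by limb across frames
        avg = mean(limb)
        deviations.append(max(abs(x - avg) for x in limb) / avg)
    return 1.0 - min(1.0, mean(deviations))


def stability_score(track: list[float]) -> float:
    """Crude 3D proxy (assumption): penalize frame-to-frame acceleration."""
    accel = [track[i + 1] - 2 * track[i] + track[i - 1]
             for i in range(1, len(track) - 1)]
    jitter = mean(abs(a) for a in accel)
    return 1.0 / (1.0 + jitter)


def coarse_to_fine_score(clip: ClipFeatures, gate: float = 0.5) -> float:
    # Coarse stage: clips the VLM rates poorly are rejected outright.
    if clip.vlm_score < gate:
        return 0.0
    # Fine stage: average the coarse score with the two fine checks.
    return mean([clip.vlm_score,
                 anatomy_score(clip.limb_lengths),
                 stability_score(clip.joint_track)])


smooth = ClipFeatures(0.9, [[1.0, 2.0]] * 3, [0.0, 1.0, 2.0, 3.0])
jittery = ClipFeatures(0.9, [[1.0, 2.0], [1.5, 2.0], [0.5, 2.0]],
                       [0.0, 1.0, 0.0, 1.0])
rejected = ClipFeatures(0.2, [[1.0, 2.0]] * 3, [0.0, 1.0, 2.0, 3.0])

smooth_score = coarse_to_fine_score(smooth)
jittery_score = coarse_to_fine_score(jittery)
rejected_score = coarse_to_fine_score(rejected)
```

In this toy setup the clip with steady limbs and smooth motion scores higher than the one with stretching limbs and oscillating motion, and the clip that fails the coarse gate never reaches the fine checks.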
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 1 source.