Universal Skeleton Understanding via Differentiable Rendering and MLLMs
Researchers have developed SkeletonLLM, an approach that lets multimodal large language models (MLLMs) understand structured, non-visual data such as human skeletons. The system relies on DrAction, a differentiable renderer that converts skeletal motion into image sequences, so that MLLMs can process the data directly as visual input. The method supports open-vocabulary action recognition, motion captioning, and question answering across diverse skeleton formats, and it suggests a path for MLLMs to engage with other non-native data types.
AI IMPACT: Enables MLLMs to process structured, non-visual data such as human skeletons, expanding their application scope.
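To make the rendering idea concrete, here is a minimal sketch of how skeletal motion might be turned into an image sequence with a smooth (and therefore differentiable almost everywhere) drawing rule. It is not DrAction's actual implementation, which the summary does not describe. The function name `render_skeleton`, the `bones` list, the 2D joint layout, and the `sigma` blur width are all assumptions made for illustration. NumPy does not compute gradients itself; the same operations written in an autodiff framework would let gradients flow from pixels back to joint coordinates.

```python
import numpy as np

def render_skeleton(joints, bones, size=64, sigma=1.5):
    """Render 2D skeleton motion as soft grayscale frames (illustrative only).

    joints: (T, J, 2) array of (x, y) joint positions in pixel coordinates.
    bones:  list of (joint_a, joint_b) index pairs (hypothetical topology).
    Returns a (T, size, size) array with values in [0, 1].
    """
    joints = np.asarray(joints, dtype=np.float64)
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2) as (x, y)
    frames = np.zeros((joints.shape[0], size, size))
    for t, pose in enumerate(joints):
        for a, b in bones:
            p, q = pose[a], pose[b]
            d = q - p
            # Project every pixel onto the bone segment, clamped to its endpoints.
            u = np.clip(((pix - p) @ d) / (d @ d + 1e-9), 0.0, 1.0)
            closest = p + u[..., None] * d
            dist2 = np.sum((pix - closest) ** 2, axis=-1)
            # Gaussian falloff instead of a hard line keeps intensity smooth
            # with respect to the joint coordinates.
            frames[t] = np.maximum(frames[t], np.exp(-dist2 / (2 * sigma ** 2)))
    return frames

# Toy motion: a three-joint "V" shape that shifts down by 4 pixels.
joints = np.array([
    [[16, 16], [32, 32], [48, 16]],
    [[16, 20], [32, 36], [48, 20]],
], dtype=float)
bones = [(0, 1), (1, 2)]
frames = render_skeleton(joints, bones)
print(frames.shape)
```

The resulting frames could then be passed to any image- or video-capable model. Because each skeleton format only needs its own `bones` list, a rendering step of this kind is independent of the number or ordering of joints, which is one plausible reason a renderer helps across diverse skeleton formats.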