Brief · PulseAugur

TOOL · arXiv cs.CV · 4d

Universal Skeleton Understanding via Differentiable Rendering and MLLMs

Researchers have developed SkeletonLLM, an approach that lets multimodal large language models (MLLMs) work with structured, non-visual data such as human skeletons. The system uses DrAction, a differentiable renderer that converts skeletal motion into image sequences that an MLLM can process directly. This supports open-vocabulary action recognition, motion captioning, and question answering across diverse skeleton formats, and suggests a path for MLLMs to handle other non-native data types.
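To make the idea of rendering skeletal motion into image sequences concrete, here is a minimal sketch in plain Python. It is not DrAction's actual method, which the brief does not describe. It assumes 2D joints given in pixel coordinates and draws each joint as a Gaussian blob (an illustrative choice). The names `render_frame`, `render_sequence`, and `pixel_grad_x` are hypothetical. Because each pixel is a smooth function of the joint positions, its gradient with respect to those positions is well defined, which is the property that makes a renderer differentiable.

```python
import math


def render_frame(joints, size=16, sigma=1.5):
    """Soft-rasterize 2D joints (x, y in pixel units) into a size x size image.

    Assumption for illustration: each joint contributes a Gaussian blob, so
    pixel intensity varies smoothly with joint position.
    """
    image = [[0.0] * size for _ in range(size)]
    for jx, jy in joints:
        for row in range(size):
            for col in range(size):
                d2 = (col - jx) ** 2 + (row - jy) ** 2
                image[row][col] += math.exp(-d2 / (2 * sigma ** 2))
    return image


def render_sequence(motion, size=16, sigma=1.5):
    """Render a motion clip (a list of per-frame joint lists) frame by frame."""
    return [render_frame(frame, size, sigma) for frame in motion]


def pixel_grad_x(joint, row, col, sigma=1.5):
    """Analytic derivative of one joint's pixel contribution w.r.t. its x coordinate."""
    jx, jy = joint
    d2 = (col - jx) ** 2 + (row - jy) ** 2
    return math.exp(-d2 / (2 * sigma ** 2)) * (col - jx) / sigma ** 2


# Hypothetical two-frame clip with two joints per frame.
motion = [[(5.0, 6.0), (9.0, 6.0)], [(5.5, 6.5), (9.5, 6.5)]]
frames = render_sequence(motion)
```

In a full pipeline the rendered frames would be passed to the MLLM's vision encoder, and gradients like the one from `pixel_grad_x` could in principle flow back into the rendering parameters. That wiring is omitted here.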

IMPACT Enables MLLMs to process structured, non-visual data such as human skeletons, broadening the range of tasks they can be applied to.

MLLMs
Ziyi Wang
SkeletonLLM
DrAction