ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
Researchers have developed ScriptHOI, a framework for open-vocabulary human-object interaction (HOI) detection. It decomposes interaction phrases into state slots such as body-role and contact, enabling an understanding that goes beyond simple co-occurrence. ScriptHOI uses a visual state tokenizer and slot-wise matching to assess script coverage and conflict, improving recognition of rare interactions and reducing false positives. The method also incorporates interval partial-label learning to better handle incomplete annotations.
AI IMPACT: Enhances the ability of AI systems to understand complex human actions in visual scenes, improving applications like robotics and surveillance.
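
To make the slot-wise matching idea concrete, here is a minimal sketch of how coverage and conflict could be scored for one candidate interaction. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the scoring formulas, so the cosine similarity, the averaging used for coverage, the maximum used for conflict, and the names `cosine`, `script_score`, and `conflict_weight` are all hypothetical. Only the slot names `body_role` and `contact` come from the summary.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def script_score(script, visual, conflict_weight=1.0):
    """Score one interaction script against visual state tokens.

    script: dict mapping slot name -> expected state embedding (from the phrase).
    visual: dict mapping slot name -> observed state embedding (from the image).
    Returns (coverage, conflict, score). All three definitions are assumptions.
    """
    # Compare only the slots that the visual side actually provides.
    sims = {slot: cosine(vec, visual[slot])
            for slot, vec in script.items() if slot in visual}
    if not sims:
        return 0.0, 0.0, 0.0
    # Coverage: average positive agreement over ALL script slots, so a slot
    # with no visual evidence lowers coverage instead of being ignored.
    coverage = sum(max(s, 0.0) for s in sims.values()) / len(script)
    # Conflict: the strongest contradiction found in any single slot.
    conflict = max(max(-s, 0.0) for s in sims.values())
    return coverage, conflict, coverage - conflict_weight * conflict


# Toy two-dimensional embeddings, made up for illustration.
script = {"body_role": [1.0, 0.0], "contact": [0.0, 1.0]}
agreeing = {"body_role": [1.0, 0.0], "contact": [0.0, 1.0]}
contradicting = {"body_role": [1.0, 0.0], "contact": [0.0, -1.0]}

good = script_score(script, agreeing)
bad = script_score(script, contradicting)
```

In this sketch a candidate whose contact slot contradicts the script is penalized even though its body-role slot matches, which is one plausible way a conflict term could suppress false positives that a co-occurrence score alone would accept.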