Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 7h

Auditing Demonstration Curation Metrics: Action-Only Scorers Fail on the Structural Defects That Degrade Imitation Policies

Researchers have developed a controlled testbed for evaluating imitation-learning curation metrics and found that action-only scoring methods fail to detect critical structural errors in demonstration data. These errors, in which a demonstration executes a wrong action at a key moment, are invisible to many metrics and can even lead to worse policy performance. Only metrics that analyze the state trajectory show promise in identifying these structural defects, though they still recover only a fraction of the performance loss.
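To make the failure mode concrete, here is a minimal toy sketch in Python. It is not the paper's testbed or any of its metrics: the grid dynamics, the goal position, and both scoring functions (`action_only_score`, `state_trajectory_score`) are hypothetical, chosen only to show how a single wrong action at a key step can leave an action-only statistic unchanged while shifting the visited states.

```python
import numpy as np

# Hypothetical unit actions on a 2D grid (illustrative assumption, not from the paper).
RIGHT = np.array([1.0, 0.0])
UP = np.array([0.0, 1.0])
DOWN = np.array([0.0, -1.0])

# Hypothetical goal position; assumes the scorer knows where the task ends.
GOAL = np.array([4.0, 1.0])


def rollout(actions, start=(0.0, 0.0)):
    """Integrate actions into visited states under toy dynamics s' = s + a."""
    return np.asarray(start) + np.cumsum(np.asarray(actions), axis=0)


def action_only_score(actions):
    """Smoothness proxy: negative mean squared change between consecutive actions."""
    diffs = np.diff(np.asarray(actions), axis=0)
    return -float(np.mean(np.sum(diffs ** 2, axis=1)))


def state_trajectory_score(actions):
    """Negative Euclidean distance from the final visited state to the goal."""
    return -float(np.linalg.norm(rollout(actions)[-1] - GOAL))


# The defective demonstration differs only at the key step (UP vs DOWN).
clean = [RIGHT, RIGHT, UP, RIGHT, RIGHT]
defective = [RIGHT, RIGHT, DOWN, RIGHT, RIGHT]

for name, demo in [("clean", clean), ("defective", defective)]:
    print(name, action_only_score(demo), state_trajectory_score(demo))
```

Under these assumptions the two demonstrations receive the same action-only score, because the turn is equally sharp in either direction, while the state-based score separates them. Real curation metrics are considerably more involved, and the brief notes that even state-trajectory metrics recover only part of the lost performance.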

IMPACT Highlights critical flaws in current AI training data validation, potentially impacting the reliability and safety of imitation learning models.

Imitation-learning
Demonstration curation metrics