Task Robustness via Re-Labelling Vision-Action Robot Data
Researchers have developed a framework called TREAD that improves robot learning by augmenting existing datasets. The method uses large Vision-Language Models (VLMs) to generate more diverse and linguistically rich instructions for robot tasks. By decomposing demonstrations into grounded language-action pairs and adding variations of the text goals, TREAD improves a robot's ability to understand and generalize to new instructions and scenarios.
AI IMPACT: Enhances robot instruction following and generalization by leveraging VLM capabilities for data augmentation.
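
The re-labelling idea described above can be illustrated with a minimal sketch. It is not TREAD's actual implementation: the `Sample` type, the `augment` function, the segment format, and the `toy_paraphrase` stand-in for a VLM call are all hypothetical names introduced here for illustration. The sketch assumes a demonstration has already been split into segments, each with a grounded instruction, and pairs every segment's actions with the original instruction plus its paraphrases.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, ...]
# (start index, end index, grounded instruction) -- hypothetical segment format
Segment = Tuple[int, int, str]


@dataclass
class Sample:
    """One language-action training pair."""
    instruction: str
    actions: List[Action]


def augment(
    actions: Sequence[Action],
    segments: Sequence[Segment],
    paraphrase: Callable[[str], List[str]],
) -> List[Sample]:
    """Pair each segment's actions with its instruction and every paraphrase."""
    samples: List[Sample] = []
    for start, end, instruction in segments:
        chunk = list(actions[start:end])
        # Keep the original instruction first, then the generated variations.
        for text in [instruction, *paraphrase(instruction)]:
            samples.append(Sample(text, chunk))
    return samples


def toy_paraphrase(instruction: str) -> List[str]:
    """Placeholder for a VLM call (assumption): returns fixed rewordings."""
    return [f"please {instruction}", f"{instruction} now"]


# A four-step demonstration split into two grounded sub-tasks.
actions = [(0.0,), (0.1,), (0.2,), (0.3,)]
segments = [(0, 2, "reach for the cup"), (2, 4, "lift the cup")]
dataset = augment(actions, segments, toy_paraphrase)
```

In this toy setup each segment yields three samples that share the same actions but differ in wording. In practice the placeholder would be replaced by a real model call, and the segmentation itself would also come from the model.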