DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs
Researchers have developed DEFLECT, a post-training framework that makes asynchronous Vision-Language-Action (VLA) policies for robotics more robust to inference delay. In asynchronous inference, the observation a policy acts on can already be stale by the time its action executes. DEFLECT converts this latency-induced mismatch into counterfactual preference supervision, training the policy to favor actions aligned with the execution-time state. It requires no human labels, online robot rollouts, or additional inference computation. In experiments across multiple tasks, DEFLECT improved success rates under delay by up to 6.4 percentage points.
AI IMPACT: Improves VLA policy performance under latency, which could enable more complex real-world robotic applications.
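
The summary does not spell out how DEFLECT builds its preference data or which loss it uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a logged trajectory of observations and actions, a fixed delay measured in steps, and a Bradley-Terry style preference loss. The names `PreferencePair`, `build_pairs`, and `preference_loss` are hypothetical and exist only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PreferencePair:
    """One counterfactual pair (hypothetical structure, not from the paper)."""
    stale_obs: Sequence[float]  # observation captured `delay` steps before execution
    preferred: Sequence[float]  # logged action at execution time t
    rejected: Sequence[float]   # logged action at the stale time t - delay


def build_pairs(
    observations: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    delay: int,
) -> List[PreferencePair]:
    """Pair each stale observation with an execution-time and a stale-time action.

    Assumption: the action logged at time t fits the execution-time state,
    while the action logged at t - delay fits only the stale observation.
    """
    if delay <= 0:
        raise ValueError("delay must be a positive number of steps")
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have the same length")
    pairs = []
    for t in range(delay, len(observations)):
        pairs.append(
            PreferencePair(
                stale_obs=observations[t - delay],
                preferred=actions[t],
                rejected=actions[t - delay],
            )
        )
    return pairs


def preference_loss(logp_preferred: float, logp_rejected: float, beta: float = 1.0) -> float:
    """Bradley-Terry style loss: -log(sigmoid(beta * (logp_preferred - logp_rejected))).

    Assumption: the policy exposes log-probabilities for both actions given
    the stale observation. The loss is written as a numerically stable softplus.
    """
    margin = beta * (logp_preferred - logp_rejected)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

Under these assumptions, the pairs come entirely from logged data, which is consistent with the claim that no human labels or online rollouts are needed. The loss shrinks as the policy assigns higher likelihood to the execution-time action than to the stale one.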