Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue
Researchers have developed a new framework, Ask-to-Clarify, to address ambiguous instructions given to embodied agents. Instead of acting immediately, the agent holds a multi-turn dialogue with the user, asking clarifying questions to resolve the ambiguity, and only then generates low-level actions. The framework pairs a Vision-Language Model (VLM), which handles the collaborative dialogue, with a diffusion model that generates actions; a connection module links the two by conditioning the diffusion process on the VLM's output. Evaluated on eight real-world tasks, Ask-to-Clarify outperformed existing state-of-the-art vision-language-action models (VLAs), a step toward more collaborative embodied agents.
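The ask-then-act pipeline can be illustrated with a toy sketch. Everything in it is hypothetical and is not the authors' implementation: `ToyVLM` stands in for the VLM using simple keyword matching against a made-up scene, `connection_module` just maps the grounded target to its 2D position as the condition, and `ddpm_sample` is a standard DDPM reverse (denoising) loop whose noise predictor is an oracle for a single target point, standing in for a trained, condition-aware network. The point is the control flow: keep asking while more than one object fits the instruction, then condition the action generator on the resolved target.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyVLM:
    """Hypothetical VLM stand-in: grounds the instruction in the scene and
    asks a clarifying question when zero or several objects fit."""
    scene: dict[str, tuple[float, float]]  # object name -> (x, y) position

    def candidates(self, instruction: str) -> list[str]:
        words = set(instruction.lower().replace(",", " ").split())
        # Objects whose noun (last word) appears in the instruction
        matches = [o for o in self.scene if o.split()[-1] in words]
        # Narrow down by any attribute (e.g. a colour) the user mentioned
        narrowed = [o for o in matches if set(o.split()[:-1]) & words]
        return narrowed or matches

    def respond(self, instruction: str) -> tuple[str, str]:
        cands = self.candidates(instruction)
        if len(cands) == 1:
            return "act", cands[0]
        if not cands:
            return "ask", "I can't find that object. Could you describe it?"
        return "ask", "Which one do you mean: " + ", ".join(cands) + "?"


def connection_module(vlm: ToyVLM, target: str) -> tuple[float, float]:
    """Hypothetical connection module: turns the VLM's resolved target into
    a condition vector (here simply the object's 2D position)."""
    return vlm.scene[target]


def ddpm_sample(cond, steps=50, seed=0):
    """DDPM reverse process for a 2D action. The noise predictor is the exact
    one for a point-mass target at `cond` (an oracle, not a trained model)."""
    rng = random.Random(seed)
    betas = [1e-4 + (0.02 - 1e-4) * t / (steps - 1) for t in range(steps)]
    alphas = [1 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0, 1) for _ in cond]  # start from pure noise
    for t in reversed(range(steps)):
        ab = alpha_bars[t]
        # Predicted noise given the condition
        eps = [(xi - math.sqrt(ab) * ci) / math.sqrt(1 - ab) for xi, ci in zip(x, cond)]
        mean = [(xi - betas[t] / math.sqrt(1 - ab) * ei) / math.sqrt(alphas[t])
                for xi, ei in zip(x, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on last step
        x = [m + sigma * rng.gauss(0, 1) for m in mean]
    return tuple(x)


def clarify_then_act(vlm, instruction, answer_fn, max_turns=3):
    """Multi-turn loop: ask until the target is unambiguous, then act."""
    dialogue = [("human", instruction)]
    for _ in range(max_turns):
        kind, content = vlm.respond(instruction)
        if kind == "act":
            action = ddpm_sample(connection_module(vlm, content))
            return dialogue, content, action
        dialogue.append(("agent", content))
        answer = answer_fn(content)
        dialogue.append(("human", answer))
        instruction = instruction + " " + answer  # fold the answer back in
    return dialogue, None, None


scene = {"red cup": (0.2, 0.5), "blue cup": (0.8, 0.1), "plate": (0.5, 0.9)}
vlm = ToyVLM(scene)
dialogue, target, action = clarify_then_act(vlm, "pick up the cup", lambda q: "the blue one")
for speaker, text in dialogue:
    print(f"{speaker}: {text}")
print(target, action)
```

With two cups in the scene, "pick up the cup" triggers one clarifying question; the answer "the blue one" resolves the target, and only then is an action sampled. An unambiguous request such as "grab the plate" skips the question entirely.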
IMPACT: Enhances embodied AI's ability to understand and execute complex, ambiguous instructions, moving toward more collaborative human-AI interaction.