GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
Researchers have developed GUI-CIDER, a mid-training method for strengthening the GUI-specific world knowledge of agents built on multimodal large language models. The approach explicitly internalizes GUI operational knowledge through two components, causal internalization and density-aware exemplar reselection, and is intended to address limitations of traditional post-training methods. GUI-CIDER first synthesizes training data, then refines it by prioritizing causal structure and reducing redundancy, and finally uses the refined data for mid-training. The reported experiments show significant improvements in GUI understanding and task success rates for agents trained with this method.
AI IMPACT This method could lead to more capable and reliable GUI agents, improving how users interact with software.
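The summary does not spell out how density-aware exemplar reselection works, so the following is only a minimal sketch of the general idea of reducing redundancy in a synthesized dataset, not GUI-CIDER's actual procedure. It assumes each synthesized sample already has an embedding vector, measures local density as the number of near-duplicate neighbors, and greedily keeps one exemplar per dense region. The names `cosine`, `local_density`, and `reselect`, the similarity threshold, and the optional budget are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_density(vectors, threshold):
    """For each vector, count the other vectors at least `threshold` similar to it."""
    n = len(vectors)
    return [
        sum(
            1
            for j in range(n)
            if j != i and cosine(vectors[i], vectors[j]) >= threshold
        )
        for i in range(n)
    ]


def reselect(vectors, threshold=0.9, budget=None):
    """Greedy, density-ordered selection (hypothetical illustration).

    Samples are visited from the densest region to the sparsest. A sample is
    kept only if it is not a near-duplicate of one already kept, so each
    crowded region contributes a single exemplar while isolated samples survive.
    Returns the kept indices in ascending order.
    """
    density = local_density(vectors, threshold)
    order = sorted(range(len(vectors)), key=lambda i: (-density[i], i))
    kept = []
    for i in order:
        if budget is not None and len(kept) >= budget:
            break
        if all(cosine(vectors[i], vectors[k]) < threshold for k in kept):
            kept.append(i)
    return sorted(kept)


# Toy embeddings: three near-duplicates and one distinct sample.
samples = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1], [0.0, 1.0]]
kept_indices = reselect(samples)
```

On this toy input the three near-duplicate vectors collapse to a single exemplar and the distinct fourth vector is retained. A real pipeline would likely use learned embeddings of screenshots or action traces and an approximate nearest-neighbor index instead of the quadratic comparison used here.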