Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English (EN) · 6d · [5 sources]

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

Researchers have developed GUI-CIDER, a mid-training method that strengthens the world knowledge of GUI agents built on multimodal large language models. To address limitations of traditional post-training methods, it explicitly internalizes GUI operational knowledge through two mechanisms: causal internalization and density-aware exemplar reselection. The pipeline synthesizes training data, refines it by prioritizing causal structure and reducing redundancy, and then uses the refined data for mid-training. Experiments show significant improvements in both GUI understanding and task success rates for agents trained with this method.
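The brief does not describe how the redundancy-reduction step works, so the following is only a generic illustration of what density-aware selection can look like, not the GUI-CIDER algorithm. It assumes each synthesized example already has an embedding vector. The function names `local_density` and `reselect`, and the `radius` and `budget` parameters, are hypothetical. The sketch counts neighbors within a radius as a density estimate, then greedily keeps examples from sparse regions first and skips any example that sits too close to one already kept.

```python
import math


def local_density(points, radius):
    """For each point, count how many other points lie within `radius`."""
    return [
        sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        for i, p in enumerate(points)
    ]


def reselect(points, radius, budget):
    """Hypothetical density-aware selection (not the paper's procedure).

    Visits points from the sparsest region to the densest, keeps a point
    only if no already-kept point lies within `radius`, and stops once
    `budget` points are kept. Returns the kept indices in ascending order.
    """
    density = local_density(points, radius)
    # Stable sort: ties keep their original order.
    order = sorted(range(len(points)), key=lambda i: density[i])
    kept = []
    for i in order:
        if len(kept) >= budget:
            break
        if all(math.dist(points[i], points[k]) > radius for k in kept):
            kept.append(i)
    return sorted(kept)


# Toy embeddings: three near-duplicates around the origin plus two outliers.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0)]
selected = reselect(embeddings, radius=0.5, budget=3)
print(selected)
```

In this toy run the two isolated examples are kept and the three near-duplicates collapse to a single representative, which is the general effect a redundancy-reducing refinement step aims for.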

IMPACT This method could lead to more capable and reliable GUI agents, improving user interaction with software.

multimodal large language models
GUI agents
Supervised Fine-Tuning (SFT)
GUI-CIDER