Brief · PulseAugur

TOOL · arXiv cs.AI · 5h

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

Researchers have introduced UXBench, a benchmark for evaluating how well multimodal large language models (MLLMs) reason about user experience (UX) from UI screenshots. It comprises 2,000 visual question answering (VQA) samples across 8 tasks, covering issues such as layout, visual hierarchy, and content consistency. Evaluations of existing MLLMs revealed significant limitations in UI-based reasoning, which prompted the development of UI-UX, an MLLM built on the Qwen3-VL-4B-Thinking foundation model and enhanced with reinforcement learning. UI-UX achieved state-of-the-art performance on UXBench, outperforming models such as Claude-4.5-Sonnet.

IMPACT Highlights the need for stronger multimodal reasoning in MLLMs before they can be relied on for practical UI/UX applications.

Claude-4.5-Sonnet
MLLMs
UXBench
UI-UX
Qwen3-VL-4B-Thinking