Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 9h

LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings

Researchers have introduced LongWebBench, a benchmark for evaluating long-webpage generation by vision-language models (VLMs). It assesses both structural coherence and functional interactivity, using real-world long webpages and goal-oriented interaction tasks. Experiments with current VLMs show that as webpage length increases, visual plausibility can be maintained, but structural fidelity declines and functional execution fails, which points to the need for evaluation methods that go beyond visual similarity.

IMPACT Highlights limitations in current VLM webpage generation and motivates evaluation metrics that cover structure and function, not just visual similarity.
