Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 10h · [2 sources]

Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages

Researchers have introduced two benchmarks that evaluate how well multimodal large language models (MLLMs) generate code for complex, interactive webpages. The first, WebIGBench, draws on real-world websites and assesses generated code for dynamic user interactions such as clicks and text inputs. The second, I-WebGenBench, targets the conversion of scientific research papers into executable interactive web systems and evaluates how well models handle dynamic mechanisms and state transitions.

IMPACT These benchmarks give researchers a way to measure, and potentially improve, how well MLLMs build functional, interactive web applications from inputs such as real-world websites and research papers.

multimodal large language models
WebIGBench
I-WebGenBench
PaperVoyager