Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 5h

WildIFEval: Instruction Following in the Wild

Researchers have introduced WildIFEval, a dataset of 7,000 real-world user instructions for testing how well large language models (LLMs) follow complex, multi-constraint commands. The instructions span a wide range of topics and constraint types, and the constraints are categorized into eight classes so that their real-world distribution can be analyzed. Experiments on WildIFEval show that larger models perform better, but all current LLMs still have significant room for improvement on such instructions, and performance varies with the number and type of constraints.

IMPACT This dataset will enable more rigorous evaluation of LLMs' ability to handle complex, real-world instructions, potentially driving improvements in their practical usability.
