Researchers have introduced Chat2Workflow, a new benchmark designed to evaluate the ability of large language models (LLMs) to generate executable visual workflows from natural language prompts. The benchmark is derived from real-world business workflows and targets the automation of workflow construction, which is currently a manual, costly, and error-prone process. Current LLMs can grasp high-level intent, but they struggle to generate accurate, deployable workflows: even an advanced agentic baseline improves the resolution rate by only 6.05%. The authors argue this gap shows that further advances are needed before industrial-grade workflow automation is practical.
IMPACT This benchmark could drive progress in automating complex task execution via LLMs, potentially streamlining business process development.
RANK_REASON The cluster describes a new academic benchmark for evaluating LLM capabilities in generating visual workflows.
AI-generated summary · Google Gemini · from 1 source.