A new framework, BEF4LLM, has been developed to systematically evaluate the quality of Business Process Model and Notation (BPMN) models generated by large language models (LLMs). The framework assesses models along four dimensions: syntactic quality, pragmatic quality, semantic quality, and validity. In a comprehensive analysis, LLMs performed strongly on syntactic and pragmatic quality, while human experts kept an edge in semantic quality, though the differences were not substantial. The findings highlight LLMs' potential for BPMN modeling and identify validity and semantic quality as the areas to improve for practical deployment.
AI IMPACT Provides a structured method for evaluating LLM performance in business process modeling, guiding future development.
RANK_REASON Academic paper introducing a new evaluation framework for LLMs.
AI-generated summary · Google Gemini · from 1 source.