New benchmark evaluates LLMs for generating Mermaid sequence diagrams

By PulseAugur Editorial Team · [1 source] · 2026-04-28 04:00

Researchers have introduced MermaidSeqBench, a benchmark for evaluating how well large language models generate Mermaid sequence diagrams from natural language prompts. The benchmark comprises 132 human-verified and LLM-augmented samples and assesses aspects such as syntax correctness and practical usability. Initial evaluations using LLM judges revealed significant capability gaps among current state-of-the-art models, highlighting the need for improved diagram generation standards in software engineering applications.
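To make the task concrete, the sketch below pairs an invented natural language prompt with a small Mermaid sequence diagram and runs a deliberately shallow structural check on it. This is not the benchmark's evaluation method, which the summary says relies on LLM judges. The prompt, the diagram, and the helper name `rough_check` are hypothetical illustrations and are not drawn from the 132 samples. The check covers only the `sequenceDiagram` header, `participant`/`actor` declarations, and a subset of message arrows.

```python
import re

# Hypothetical prompt and target diagram, written for illustration only.
PROMPT = "A client logs in: it posts credentials to the server, which replies 200 OK."
EXAMPLE = """
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: POST /login
    Server-->>Client: 200 OK
"""

# Participant or actor declaration, with an optional "as <alias>" suffix.
DECL = re.compile(r"^(participant|actor)\s+(?P<name>\w+)(\s+as\s+.+)?$")

# Message line: <source><arrow><target>: <text>. Only a subset of arrows is
# covered; activation markers (+/-), notes, loops and alt blocks are not.
MESSAGE = re.compile(
    r"^(?P<src>\w+)\s*(?P<arrow>-->>|->>|--x|-x|--\)|-\)|-->|->)\s*"
    r"(?P<dst>\w+)\s*:\s*(?P<text>.+)$"
)


def rough_check(diagram: str) -> list[str]:
    """Return the problems found by a shallow, line-by-line structural check."""
    lines = [ln.strip() for ln in diagram.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != "sequenceDiagram":
        return ["first line is not 'sequenceDiagram'"]
    problems = []
    # Positions count non-blank lines only, starting at 1 for the header.
    for position, line in enumerate(lines[1:], start=2):
        if DECL.match(line) or MESSAGE.match(line):
            continue
        problems.append(f"non-blank line {position}: unrecognized: {line!r}")
    return problems


if __name__ == "__main__":
    print(PROMPT)
    print(rough_check(EXAMPLE))
```

A check of this kind can only flag malformed lines. It says nothing about whether the diagram reflects the prompt or would be useful in practice, which are the qualities the summary attributes to the benchmark's usability assessment.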

Impact: Provides a standardized evaluation for LLM-generated diagrams, which is crucial for reliable deployment in software engineering.

Ranking rationale: Introduces a new evaluation benchmark for LLM capabilities in generating structured diagrams.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG TIER_1 English(EN) · Basel Shbita, Farhan Ahmed, Chad DeLuca · 2026-04-28 04:00

MermaidSeqBench: An Evaluation Benchmark for NL-to-Mermaid Sequence Diagram Generation

arXiv:2511.14967v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have shown great promise in generating structured diagrams from natural language descriptions, particularly Mermaid sequence diagrams for software engineering. However, the lack of existing ben…
