PulseAugur

New benchmark reveals LLMs struggle with structured engineering design tasks

A new benchmark called BIM-Edit evaluates how well large language models (LLMs) can edit Building Information Models (BIM) represented in the Industry Foundation Classes (IFC) format. The benchmark includes 324 editing tasks across 11 realistic and 36 synthetic building models, covering direct, spatial, and topological edits. Current LLMs show significant limitations: the best-performing model achieves only a 49.5% score across geometric accuracy, semantic validity, and topological consistency, and fully solves no more than 3.4% of tasks. This highlights a substantial gap between LLM abilities and the demands of structured engineering design workflows.

Impact: Highlights significant limitations in current LLMs for structured engineering design, indicating a need for further development in this domain.

Ranking rationale: The item is a research paper introducing a new benchmark for evaluating LLMs.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · From 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.AI TIER_1 English(EN) · Christian Bartelt

    BIM-Edit: A Language Model Benchmark for IFC Building Information Models

    Large language models (LLMs) are increasingly applied to computer-aided design (CAD) to generate design artifacts from textual instructions. In engineering practice, this requires more than creating new geometry: models must also understand existing scenes, edit them correctly, a…