PulseAugur

New benchmark reveals LLMs struggle with structured engineering design tasks

A new benchmark called BIM-Edit has been developed to evaluate how well large language models (LLMs) can edit Building Information Models (BIM) represented in the Industry Foundation Classes (IFC) format. The benchmark includes 324 editing tasks across 11 realistic and 36 synthetic building models, covering direct, spatial, and topological edits. Current LLMs show significant limitations: the best-performing model achieves only a 49.5% score across geometric accuracy, semantic validity, and topological consistency, and fully solves at most 3.4% of tasks. This highlights a substantial gap between LLM abilities and the demands of structured engineering design workflows.
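A moderate overall score can coexist with almost no fully solved tasks when the score gives partial credit. Below is a minimal sketch of that effect. It assumes each task is scored in [0, 1] on the three dimensions named above, that the overall score is their unweighted mean, and that a task counts as "fully solved" only when all three are perfect. The summary does not say how BIM-Edit actually aggregates its scores, so `TaskResult`, `summarize`, and the toy data are hypothetical illustrations, not benchmark results.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical per-task result. The dimension names follow the summary;
# the [0, 1] scale and the aggregation rules are assumptions.
@dataclass
class TaskResult:
    geometric: float
    semantic: float
    topological: float

    def score(self) -> float:
        # Assumed: unweighted mean of the three dimensions.
        return (self.geometric + self.semantic + self.topological) / 3

    def fully_solved(self) -> bool:
        # Assumed: solved only if every dimension is perfect.
        return min(self.geometric, self.semantic, self.topological) == 1.0


def summarize(results: list[TaskResult]) -> tuple[float, float]:
    """Return (mean score, fraction of fully solved tasks)."""
    avg = mean(r.score() for r in results)
    solved = sum(r.fully_solved() for r in results) / len(results)
    return avg, solved


# Illustrative toy data: partial credit on most tasks keeps the mean
# score moderate while only one task in four is fully solved.
toy = [
    TaskResult(1.0, 1.0, 1.0),
    TaskResult(0.5, 1.0, 0.0),
    TaskResult(0.5, 0.5, 0.5),
    TaskResult(0.0, 1.0, 0.5),
]
avg, solved = summarize(toy)
print(f"mean score {avg:.1%}, fully solved {solved:.1%}")
```

Under these assumptions the toy set averages 62.5% while fully solving only 25% of tasks; a stricter all-or-nothing criterion always reports a rate no higher than the partial-credit view suggests.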

IMPACT Highlights significant limitations in current LLMs for structured engineering design, indicating a need for further development in this domain.

RANK_REASON The item is a research paper introducing a new benchmark for evaluating LLMs.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Christian Bartelt

    BIM-Edit: Benchmarking Large Language Models for IFC-Based Building Information Modeling

    Large language models (LLMs) are increasingly applied to computer-aided design (CAD) to generate design artifacts from textual instructions. In engineering practice, this requires more than creating new geometry; models must also understand existing scenes, edit them correctly, a…