A new study assessed the reliability of code generated by Large Language Models (LLMs) for construction safety tasks, a practice termed "vibe coding." The research found that although LLMs can produce syntactically correct code, that code often fails silently because of flawed mathematical logic and a lack of defensive programming. Across the tested models, which included Claude 3.5 Haiku, GPT-4o-Mini, and Gemini 2.5 Flash, a significant portion of the generated code exhibited logic deficits; GPT-4o-Mini produced inaccurate outputs in over half of its functional code.
AI IMPACT Current LLMs lack the deterministic rigor required for standalone safety engineering in construction, making AI wrappers and governance necessary.
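The "silent failure" pattern described above can be illustrated with a minimal, hypothetical sketch; it is not code from the study. It assumes a simple capacity-to-load safety check, where the function names and the `MIN_FACTOR` threshold are illustrative placeholders rather than values from any standard. The naive version lets a missing (NaN) load reading pass unflagged, while the defensive version validates its inputs and fails loudly.

```python
import math

MIN_FACTOR = 2.0  # hypothetical threshold for illustration, not a code-mandated value


def is_unsafe_naive(capacity_kn: float, load_kn: float) -> bool:
    # Flags a problem only when the ratio falls below the threshold.
    # If load_kn is NaN (e.g. a missing sensor reading), the ratio is NaN,
    # "NaN < MIN_FACTOR" evaluates to False, and the check silently passes.
    return capacity_kn / load_kn < MIN_FACTOR


def is_unsafe_defensive(capacity_kn: float, load_kn: float) -> bool:
    # Validate inputs first, so bad data raises instead of producing a verdict.
    for name, value in (("capacity_kn", capacity_kn), ("load_kn", load_kn)):
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be a finite positive number, got {value!r}")
    # State the acceptance condition positively: anything not demonstrably
    # adequate is treated as unsafe.
    return not (capacity_kn / load_kn >= MIN_FACTOR)


if __name__ == "__main__":
    print(is_unsafe_naive(10.0, float("nan")))  # False: the bad input goes unnoticed
    print(is_unsafe_defensive(10.0, 8.0))       # True: ratio 1.25 is below 2.0
    try:
        is_unsafe_defensive(10.0, float("nan"))
    except ValueError as err:
        print(f"rejected: {err}")
```

Both functions compute the same ratio; the difference is that the defensive one refuses to return a verdict for inputs it cannot trust, which is the kind of guard the study reports was often missing.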
RANK_REASON Academic paper assessing LLM-generated code reliability.
AI-generated summary · Google Gemini · from 1 source.