New P4IR framework uses RL to boost LLM accuracy in code compliance systems

By PulseAugur Editorial · [1 source] · 2026-06-21 09:17

Researchers have developed P4IR, a two-stage framework designed to improve the accuracy of large language models (LLMs) that generate automated code compliance (ACC) systems for building regulations. The framework first uses supervised fine-tuning (SFT) to instill domain-specific knowledge in the LLM, then applies Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm, to refine the generated code skeletons. Compared with SFT-only baselines, the approach reduced tree edit distance by up to 23.8% and token-level Levenshtein distance by 38.6%, and it also produced fewer false positives.
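
The summary names two technical ingredients: GRPO, which scores a group of sampled outputs for the same prompt and normalizes each reward against that group's mean and standard deviation, and token-level Levenshtein distance, one of the reported evaluation metrics. The Python sketch below shows how the two could fit together. It is an illustration under stated assumptions, not the P4IR implementation. The summary does not say which reward the authors used, so `distance_reward` (1 minus the normalized token-level Levenshtein distance to a reference) is a hypothetical choice. Whitespace tokenization is assumed, and the rule strings are made-up examples.

```python
import statistics


def token_levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of token insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute tok_a with tok_b (free if equal)
            )
        prev = curr
    return prev[-1]


def distance_reward(candidate: str, reference: str) -> float:
    """HYPOTHETICAL reward: 1 minus token-level distance normalized by the longer sequence.

    Assumes whitespace tokenization; the result lies in [0, 1].
    """
    a, b = candidate.split(), reference.split()
    longest = max(len(a), len(b), 1)
    return 1.0 - token_levenshtein(a, b) / longest


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up rule strings for illustration; not taken from the paper.
    reference = "IF wall.fire_rating >= 60 THEN PASS"
    group = [  # several sampled outputs for the same prompt
        "IF wall.fire_rating >= 60 THEN PASS",
        "IF wall.fire_rating > 60 THEN PASS",
        "IF door.width >= 900 THEN PASS",
    ]
    rewards = [distance_reward(c, reference) for c in group]
    advantages = group_relative_advantages(rewards)
    for cand, r, adv in zip(group, rewards, advantages):
        print(f"{r:.3f}  {adv:+.3f}  {cand}")
```

Candidates closer to the reference receive positive advantages and the others negative ones, so a policy-gradient update would shift probability toward the better outputs. Because the baseline comes from the group itself, GRPO needs no separately learned value network, which is its main practical difference from PPO.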

IMPACT This research offers a method to improve the reliability and accuracy of LLM-generated code compliance systems, potentially reducing errors in automated regulatory checks.

RANK_REASON The cluster contains a research paper detailing a new framework for improving LLM performance on a specific task.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI · English · Justin K. W. Yeoh · 2026-06-21 09:17

Reinforcement learning to improve large language model-based automated code compliance systems

Large language model (LLM)-based approaches for automated code compliance (ACC) of building regulations are prone to generating incorrect and hallucinated computer-processable rules. This paper introduces P4IR, a two-stage framework that uses supervised fine-tuning (SFT) to insti…
