HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation
Researchers have developed HalluJudge, a system that detects hallucinations in AI-generated code review comments without requiring a ground-truth reference. HalluJudge employs four strategies, including structured multi-branch reasoning, to assess whether a review comment is aligned with the context it was given. Evaluations on Atlassian's software projects indicate that HalluJudge is both accurate and cost-effective, achieving an F1 score of 0.85 at an average cost of $0.009 per assessment. In real-world production use, its judgments agreed with developer preferences in 67% of cases, offering a practical safeguard against inaccurate AI-generated feedback.
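The F1 score quoted above is the harmonic mean of precision (how many flagged comments really were hallucinated) and recall (how many hallucinated comments were flagged). A minimal sketch of that computation follows, assuming a binary labeling where `True` means "comment judged hallucinated". The function name and the sample labels are hypothetical illustrations, not data or code from the HalluJudge evaluation.

```python
from typing import Sequence


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 for the positive class (True = comment flagged as hallucinated).

    Hypothetical helper for illustration; not part of HalluJudge.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        # No correct flags: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: judge verdicts vs. human labels for five review comments.
judge = [True, True, False, True, False]
human = [True, False, False, True, True]
print(round(f1_score(judge, human), 3))  # 2 TP, 1 FP, 1 FN -> 0.667
```

Because F1 ignores true negatives, a judge that rarely flags anything cannot score well just because most comments are fine, which is why it is a common headline metric for detectors like this.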
IMPACT: Introduces a practical method for improving trust and reducing errors in AI-assisted code review.