Brief · PulseAugur

TOOL · arXiv cs.AI · 6h

HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation

Researchers have developed HalluJudge, a novel system designed to detect hallucinations in AI-generated code review comments without requiring reference code. HalluJudge employs four strategies, including structured multi-branch reasoning, to assess the alignment of review comments with the provided context. Evaluations on Atlassian's software projects indicate that HalluJudge is cost-effective, achieving an F1 score of 0.85 with an average cost of $0.009 per assessment. The system's judgments align with developer preferences in 67% of real-world production scenarios, offering a practical safeguard against inaccurate AI-generated feedback.

IMPACT Introduces a practical method to improve trust and reduce errors in AI-assisted code reviews.

large language models
Atlassian
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
HalluJudge
Chakkrit Tantithamthavorn