Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 12h

FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification

Researchers have introduced FineDialFact, a benchmark for fine-grained fact verification in dialogue systems. Existing methods rely on coarse-grained labels; FineDialFact instead focuses on verifying individual atomic facts within dialogue responses. The dataset was constructed from publicly available dialogue data and evaluated with baseline methods, which showed that Chain-of-Thought reasoning can improve performance. Even so, the best F1-score achieved was 0.74, indicating that dialogue fact verification remains a challenging area for future research.
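
To make the idea of fine-grained scoring concrete, here is a minimal sketch of how per-fact predictions could be scored with F1. The label names, the example data, and the choice of macro-averaging are all assumptions made for illustration; the brief does not state which labels or which F1 variant the benchmark uses.

```python
# Hypothetical label set; the benchmark's actual labels may differ.
LABELS = ("supported", "refuted", "not_enough_info")


def per_label_f1(gold, pred, label):
    """F1 for one label, computed over individual atomic facts."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-label F1 scores."""
    return sum(per_label_f1(gold, pred, lab) for lab in labels) / len(labels)


# Made-up example: one dialogue response split into four atomic facts,
# each carrying its own gold label and its own predicted label.
gold = ["supported", "refuted", "supported", "not_enough_info"]
pred = ["supported", "supported", "supported", "not_enough_info"]
score = macro_f1(gold, pred)
```

Scoring each atomic fact separately means a response with one wrong claim among several correct ones is penalized only for that claim, which is the granularity a single response-level label cannot express.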

IMPACT This benchmark aims to improve the factual accuracy of dialogue systems by enabling more granular verification of generated content.

Large language models
Chain-of-Thought
FineDialFact
HybriDialogue
Xiangyan Chen