Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 5h

Cross-Model Disagreement as a Label-Free Correctness Signal

Researchers have introduced a method for detecting errors in language-model outputs without ground-truth labels. The approach, termed cross-model disagreement, uses a second (verifying) model to score the answer produced by the generating model. It yields two signals. Cross-Model Perplexity (CMP) measures how surprised the verifying model is by the generated answer tokens. Cross-Model Entropy (CME) measures how uncertain the verifying model's own predictions are at those token positions. On MMLU, TriviaQA, and GSM8K, these signals outperformed existing within-model uncertainty baselines. The authors present them as a practical way to monitor deployed language models and improve their safety.
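The following is a minimal sketch of the two scores. It assumes the verifying model is run on the prompt plus the generator's answer (teacher forcing) and exposes next-token logits at each answer position. CMP is computed as the exponentiated mean negative log-likelihood of the answer tokens, and CME as the mean predictive entropy over those positions. These are the standard definitions of perplexity and entropy. The paper's exact normalization, such as which tokens count as "answer tokens", may differ. The function names are hypothetical.

```python
import math
from typing import Sequence

def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_model_perplexity(verifier_logits: Sequence[Sequence[float]],
                           answer_ids: Sequence[int]) -> float:
    """Hypothetical CMP: exp of the verifier's mean negative log-likelihood
    of the generator's answer tokens (higher = verifier more surprised)."""
    if not answer_ids or len(verifier_logits) != len(answer_ids):
        raise ValueError("need one logit vector per answer token")
    nll = 0.0
    for logits, tok in zip(verifier_logits, answer_ids):
        nll -= log_softmax(logits)[tok]
    return math.exp(nll / len(answer_ids))

def cross_model_entropy(verifier_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical CME: mean entropy (in nats) of the verifier's next-token
    distribution at each answer position (higher = verifier more uncertain)."""
    if not verifier_logits:
        raise ValueError("need at least one answer position")
    total = 0.0
    for logits in verifier_logits:
        logp = log_softmax(logits)
        total -= sum(math.exp(lp) * lp for lp in logp)
    return total / len(verifier_logits)

# Toy usage: a 4-token vocabulary and a 2-token answer [0, 2].
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
toy_logits = [[math.log(p) for p in row] for row in probs]
cmp_score = cross_model_perplexity(toy_logits, [0, 2])
cme_score = cross_model_entropy(toy_logits)
print(f"CMP={cmp_score:.3f}  CME={cme_score:.3f}")
```

In a monitoring setting, an answer with high CMP or CME would be flagged for review. The decision threshold is an assumption, not something the brief specifies, and would need calibration on held-out data.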

IMPACT Offers a practical, label-free method for detecting AI errors, improving safety and oversight in deployed language models.

Hugging Face
arXiv
Massive Multitask Language Understanding
GSM8K
TriviaQA
Cross-Model Perplexity
Cross-Model Entropy
Matt Gorbett