TOOL · arXiv cs.CL · English (EN) · 6h

Hybrid Adversarial Defence for Natural Language Understanding Tasks

Researchers have proposed a hybrid defense framework that targets both hallucinations and adversarial manipulation in Large Language Models (LLMs). The approach combines entropy-based models, which are designed to reduce hallucinations, with uncertainty-based and geometry-based models, which improve adversarial robustness. In tests on several Natural Language Understanding datasets, the framework reportedly improved both clean-task accuracy and resistance to attacks, outperforming existing single-feature defense strategies.
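To make the idea of combining feature families concrete, here is a minimal Python sketch. It is not the authors' method: the brief gives no formulas, so the function names (`predictive_entropy`, `centroid_distance`, `hybrid_score`), the weighted-sum combination, and the weights are all assumptions chosen for illustration. It pairs one entropy-style signal (Shannon entropy of a model's output distribution) with one geometry-style signal (Euclidean distance of an input embedding from a reference centroid).

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def centroid_distance(embedding, centroid):
    """Euclidean distance between an embedding and a reference centroid."""
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(embedding, centroid)))


def hybrid_score(probs, embedding, centroid, w_entropy=0.5, w_geometry=0.5):
    """Hypothetical combined score; higher means more suspicious.

    The weighted sum and the default weights are illustrative assumptions,
    not values taken from the paper.
    """
    # Scale entropy to [0, 1] by dividing by its maximum, log(K) for K classes.
    max_entropy = math.log(len(probs))
    norm_entropy = predictive_entropy(probs) / max_entropy if max_entropy > 0 else 0.0
    return w_entropy * norm_entropy + w_geometry * centroid_distance(embedding, centroid)
```

In this toy setup, a confident prediction on an input whose embedding sits at the centroid scores near zero, while an uncertain prediction or an embedding far from the centroid raises the score. A single-feature defense would use only one of the two terms.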

IMPACT Enhances LLM reliability by combining defenses against hallucination and adversarial attacks, improving performance on diverse tasks.

Large Language Models
HotpotQA
FEVER
AdvBench
CSQA
SIQA
AeroEngQA
CPIQA