Hybrid Adversarial Defence for Natural Language Understanding Tasks
Researchers have developed a novel hybrid defense framework to combat both hallucinations and adversarial manipulation in Large Language Models (LLMs). This approach integrates entropy-based models, designed to reduce hallucinations, with uncertainty-based and geometric-based models that enhance adversarial robustness. Testing on various Natural Language Understanding datasets demonstrated significant improvements in both clean-task accuracy and resistance to attacks, outperforming existing single-feature defense strategies.
AI IMPACT: Enhances LLM reliability by combining defenses against hallucination and adversarial attacks, improving performance on diverse tasks.