New defense framework detects and unlearns data poisoning in text summarization models

By PulseAugur Editorial · 1 source · 2026-06-24 17:12

Researchers have developed a framework to defend text summarization models against data poisoning attacks during fine-tuning. The framework detects poisoned training data by analyzing training influence and semantic consistency, then remediates affected models. The defenses reportedly achieve high detection precision and restore model behavior with minimal loss in utility, even under adaptive attacks.

IMPACT This research offers a defense against fine-tuning data poisoning that could improve the reliability and trustworthiness of AI summarization tools.

RANK_REASON The cluster contains an academic paper detailing a new method for defending AI models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English (EN) · Shirin Nilizadeh · 2026-06-24 17:12

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. In this setting, adversaries manipulate f…
