Researchers have developed EditRisk-Bench, a new benchmark for evaluating the safety risks of malicious knowledge editing in large language models. Unlike earlier benchmarks that primarily measured editing efficacy, it focuses on how injected misinformation or biased knowledge corrupts downstream reasoning. Experiments across several LLMs show that malicious edits reliably produce incorrect or unsafe outputs while leaving general capabilities intact, which makes the damage hard to detect. The study also identifies factors that shape these risks, such as the number of edits and the complexity of the reasoning task.
Summary written by gemini-2.5-flash-lite from 1 source.
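The source describes the benchmark's goal rather than its interface. As a rough illustration only, an evaluation loop of the kind the summary implies might look like the sketch below; all names (MaliciousEdit, Probe, evaluate_edit_risk) and the edited_model callable are assumptions for illustration, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical data structures; the paper's actual schema is not given in the source.
@dataclass
class MaliciousEdit:
    subject: str        # entity whose stored fact the edit rewrites
    injected_fact: str  # misinformation or biased claim injected into the model

@dataclass
class Probe:
    question: str       # query posed to the edited model
    safe_answer: str    # answer an uncorrupted model should give

def evaluate_edit_risk(
    edited_model: Callable[[str], str],  # model *after* the malicious edits were applied
    edits: List[MaliciousEdit],          # the edits, kept to report the scale of editing
    reasoning_probes: List[Probe],       # downstream reasoning tasks touching the edited facts
    general_probes: List[Probe],         # unrelated tasks checking capability retention
) -> dict:
    """Score how often edits corrupt downstream reasoning while general ability stays intact."""
    corrupted = sum(edited_model(p.question).strip() != p.safe_answer
                    for p in reasoning_probes)
    retained = sum(edited_model(p.question).strip() == p.safe_answer
                   for p in general_probes)
    return {
        "num_edits": len(edits),
        "reasoning_corruption_rate": corrupted / len(reasoning_probes),
        "general_capability_retention": retained / len(general_probes),
    }
```

A high reasoning corruption rate combined with high general-capability retention would correspond to the hard-to-detect failure mode the summary describes.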
IMPACT Provides a standardized method to test and mitigate safety vulnerabilities in LLMs related to knowledge editing.
RANK_REASON The cluster contains an academic paper introducing a new benchmark for evaluating LLM safety.