Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 7h

RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories

Researchers have developed RippleBench-Maker, an automated pipeline for identifying and quantifying the ripple effects of targeted interventions on language models. The pipeline uses existing knowledge repositories, such as Wikipedia, to generate questions at varying semantic distances from a source concept. Applied to eight unlearning methods on models such as Llama3-8B-Instruct, it showed that accuracy drops are largest near the target concept and shrink as semantic distance increases. The propagation profiles of these ripple effects were consistent across base models, which suggests they are a property of the unlearning method itself.
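To make the idea of a propagation profile concrete, here is a minimal sketch of how accuracy drop could be tabulated against semantic distance. It is an illustration under stated assumptions, not the RippleBench-Maker implementation: the function name `accuracy_drop_by_distance`, the integer distance buckets, and the toy records are all hypothetical, and the brief does not say how the paper measures distance or scores answers.

```python
from collections import defaultdict


def accuracy_drop_by_distance(records):
    """Return {distance: base accuracy minus post-intervention accuracy}.

    `records` is an iterable of (distance, base_correct, edited_correct)
    tuples, one per question. The integer `distance` bucket is an assumed
    stand-in for whatever semantic-distance measure the benchmark uses.
    """
    # Per bucket: [question count, correct before, correct after]
    totals = defaultdict(lambda: [0, 0, 0])
    for distance, base_correct, edited_correct in records:
        bucket = totals[distance]
        bucket[0] += 1
        bucket[1] += int(base_correct)
        bucket[2] += int(edited_correct)
    return {
        distance: (base - edited) / count
        for distance, (count, base, edited) in sorted(totals.items())
    }


# Hypothetical toy data: distance 0 is the unlearned concept itself.
records = [
    (0, True, False), (0, True, False), (0, True, True), (0, True, False),
    (1, True, False), (1, True, True), (1, True, True), (1, False, False),
    (2, True, True), (2, True, True), (2, False, False), (2, True, True),
]

profile = accuracy_drop_by_distance(records)
print(profile)  # {0: 0.75, 1: 0.25, 2: 0.0}
```

In this toy data the drop shrinks from 0.75 at the target concept to 0.0 two steps away, which is the shape the summary describes. Comparing such profiles across base models for one unlearning method is one way to check whether the profile belongs to the method rather than the model.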

IMPACT Provides a standardized method to measure and compare the unintended consequences of AI model modifications, crucial for safety and reliability.

Hugging Face
Wikipedia
Mistral-7B
Amazon Mechanical Turk
Yi-34B
Llama3-8B-Instruct
Roy Rinberg
RippleBench
RippleBench-Maker
WikiRAG
RippleBench-WMDP-Bio
Zephyr-7B