RogueMerge: Robust and Unified Attacks against LLM Model Merging
Researchers have developed RogueMerge, a new attack framework that exploits vulnerabilities in large language model (LLM) merging. The method tackles three challenges: autoregressive decoding, unknown merging configurations, and the need to generalize across diverse attack prompts. The researchers report that RogueMerge consistently outperforms existing attacks, remains stable across different merging settings, and resists standard defenses.
IMPACT: This research highlights significant security risks in LLM merging, with potential consequences for the safe deployment of composite AI systems.