Researchers have developed a new attack strategy, Augmented Model Manipulation (AugMP), that targets federated fine-tuning (FFT) of large language models (LLMs). The method uses graph representation learning to identify correlations among legitimate LLM updates, and those correlations then guide the creation of malicious updates. An iterative algorithm optimizes the malicious updates so that they embed adversarial objectives while remaining similar to benign updates, which makes them difficult to detect. Experiments show that AugMP can significantly degrade global LLM accuracy and local agent performance while evading standard defense mechanisms.
AI impact: Introduces a novel attack vector that could compromise the integrity of LLMs trained via federated learning.
Ranking rationale: Academic paper detailing a novel attack method on LLMs.
AI-generated summary · Google Gemini · from 1 source.