A new research paper explores the interaction between knowledge editing (KE) and fine-tuning in large language models (LLMs). The study finds that fine-tuning an edited model typically causes the applied edits to decay significantly, with some configurations, such as AlphaEdit on GPT-J, losing over 25% of their effectiveness. The research indicates that fine-tuning only the edited layers can remove these edits with minimal impact on overall performance. Surprisingly, fine-tuning only the non-edited layers results in even greater edit decay. The work highlights the importance of evaluating knowledge editing techniques within the full LLM application pipeline, both to ensure that edits persist and to address potential safety concerns.
IMPACT Highlights potential safety risks and reduced efficiency when fine-tuning LLMs that have undergone knowledge editing.
RANK_REASON Research paper published on arXiv detailing findings about LLM knowledge editing and fine-tuning.
AI-generated summary · Google Gemini · from 1 source.