A participant in Jane Street's LLM backdoor challenge shared their experience attempting to uncover hidden triggers in fine-tuned models. Initial prompting strategies failed to reveal the backdoors. The challenge involved a smaller, locally runnable Qwen2.5-7B-Instruct model and larger DeepSeek-V3 Mixture-of-Experts models accessed via API; the latter proved particularly difficult to analyze.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Details a novel approach to identifying vulnerabilities in large language models, potentially informing future AI security research.
RANK_REASON Participant's technical write-up of a challenge involving LLM backdoors.