PulseAugur
tool · 1 source

Jane Street LLM backdoor challenge reveals DeepSeek-V3 vulnerabilities

A participant in Jane Street's LLM backdoor challenge shared partial results from their attempt to uncover hidden triggers in fine-tuned models. Activation- and prompting-based approaches did not reveal the backdoors, but the participant reports cracking some of the models with white-box methods. The challenge involved a smaller Qwen2.5-7B-Instruct model that can be run locally and larger DeepSeek-V3 Mixture-of-Experts models accessed via API. The DeepSeek-V3 models proved particularly difficult to analyze.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Describes white-box methods for identifying backdoors in fine-tuned large language models, which may inform future AI security research.

RANK_REASON Participant's technical write-up of a challenge involving LLM backdoors.



COVERAGE [1]

  1. LessWrong (AI tag) TIER_1 · Cipolla

    Looking for backdoors in Jane Street LLMs

    I am going to talk about my experience in the Jane Street LLM backdoor challenge. I am sharing partial results. I managed to crack some of the models using white-box methods, after the activation/prompting approach didn't pan out. Happy to discuss better or more promi…