PulseAugur

Jane Street LLM Backdoor Challenge Reveals Model Vulnerabilities

A challenge hosted by Jane Street asked participants to find hidden backdoors in large language models. The author, who shares partial results, identified backdoors in some of the models using white-box methods after initial activation-based and prompting approaches proved unsuccessful. The challenge involved four models: a fine-tuned Qwen2.5-7B-Instruct and three large DeepSeek-V3 Mixture-of-Experts models, with the larger models accessible via an API.

IMPACT Highlights potential security risks in LLMs and the ongoing research into detecting and mitigating such vulnerabilities.

RANK_REASON The item details a challenge focused on identifying vulnerabilities (backdoors) in LLMs, which falls under AI safety research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Alignment Forum TIER_1 English(EN) · Cipolla

    Looking for backdoors in Jane Street LLMs

    I am going to talk about my experience in the Jane Street LLM backdoor challenge. I am sharing partial results. I managed to crack some of the models using white-box methods, after the activation/prompting approach didn't pan out. Happy to discuss better or more promi…