Researchers have explored the 'junking problem': finding naturally occurring token sequences within LLMs that can trigger harmful outputs without explicit adversarial prompts. The study formalizes the problem and uses a greedy random-search method to discover these 'natural backdoors.' Although the problem is harder than traditional jailbreaking, the proposed strategy achieved a high success rate, indicating that such backdoors exist and are easily recoverable.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Identifies a new class of LLM vulnerabilities relevant to safety and alignment research.
RANK_REASON Academic paper detailing a new method for identifying vulnerabilities in LLMs.