Fanfiction subgenres used to jailbreak aligned LLMs

By PulseAugur Editorial · [2 sources] · 2026-06-03 06:01

Researchers have developed a novel jailbreaking technique for aligned large language models that leverages fanfiction subgenres. This method uses passages from twelve different Archive of Our Own (AO3) subgenres to embed harmful behaviors, bypassing traditional defenses. The attack significantly increases the attack success rate (ASR) from 0.278 to 0.731 on eight LLMs, demonstrating that the effectiveness stems from the writing style rather than prompt structure. Proposed defenses were found to be ineffective, suggesting a shift towards register-based attacks. AI

IMPACT This research highlights a new vulnerability in LLM safety training, potentially requiring novel defense mechanisms beyond simple prompt filtering.

RANK_REASON The cluster contains a research paper detailing a new method for jailbreaking LLMs.

Read on arXiv cs.CL →

paper
safety

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Fanfiction subgenres used to jailbreak aligned LLMs

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Zhongze Luo, Ruihe Shi, Zhenshuai Yin, Haoyue Liu, Weixuan Wan, Xiaoying Tang · 2026-06-04 04:00

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

arXiv:2606.04483v1 Announce Type: new Abstract: Existing jailbreaks against aligned LLMs are discrete artifacts whose surface forms are easy to fingerprint and patch. We argue that the real failure mode is not any specific prompt, but an entire register of natural human writing t…
arXiv cs.CL TIER_1 English(EN) · Xiaoying Tang · 2026-06-03 06:01

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

Existing jailbreaks against aligned LLMs are discrete artifacts whose surface forms are easy to fingerprint and patch. We argue that the real failure mode is not any specific prompt, but an entire register of natural human writing that safety training has under-covered. Building …

COVERAGE [2]

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

Off-Distribution Voices: Fanfiction Subgenres as Universal Vernacular Jailbreaks for Aligned LLMs

RELATED ENTITIES

RELATED TOPICS