New LLM jailbreak attacks and defenses emerge in research papers

By PulseAugur Editorial · [3 sources] · 2026-06-02 04:00

Three new papers examine the security of large language models (LLMs) from both the attack and the defense side. The first, "Persona Attack," is a jailbreak that incrementally injects content into a model's conversational memory to bypass safety protocols, achieving a 95% success rate in some configurations. The second, THRD, is a training-free defense framework that detects and mitigates multi-turn jailbreak attacks by analyzing temporal risk accumulation across turns, reducing attack success rates to as low as 0.2% with minimal impact on model utility. A third study benchmarks LLMs on cryptanalysis, revealing their potential and limitations in security contexts and raising concerns about their susceptibility to certain attacks.

IMPACT The papers document evolving LLM vulnerabilities alongside a new defense mechanism, both of which bear directly on AI safety and security.

RANK_REASON Multiple research papers detailing new LLM vulnerabilities and defenses.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.AI TIER_1 English(EN) · Junyoung Park, Seongyong Ju, Sunghwan Park, Jaewoo Lee · 2026-06-02 04:00

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

arXiv:2606.00150v1 Announce Type: cross Abstract: As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt inje…

arXiv cs.AI TIER_1 English(EN) · Zhiqing Ma, Zhonghao Xu, Dong Yu, Chen Kang, Changliang Li, Pengyuan Liu · 2026-06-02 04:00

THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models

arXiv:2606.01738v1 Announce Type: cross Abstract: Multi-turn jailbreak attacks pose a growing threat to LLMs by exploiting conversational dynamics such as gradual escalation and cross-turn coordination. Existing defenses either rely on costly retraining -- often degrading model u…

arXiv cs.CL TIER_1 English(EN) · Utsav Maskey, Chencheng Zhu, Usman Naseem · 2026-06-02 04:00

Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities

arXiv:2505.24621v3 Announce Type: replace Abstract: Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis - a critical area for data securi…
