New LLM Watermarking Techniques Face Evasion and Robustness Challenges

By PulseAugur Editorial · [8 sources] · 2026-05-22 02:51

Recent arXiv papers present both new attacks on large language model (LLM) watermarks and new watermarking schemes built to withstand them. One attack, SeedHijack, targets the pseudo-random number generator (PRNG) to manipulate watermarks without knowledge of the key or model logits; another, the Bias-Inversion Rewriting Attack (BIRA), uses a negative logit bias to evade detection. On the defensive side, new watermarking algorithms such as PASA and SAFESEAL aim for robustness against semantic-invariant attacks and minimal distortion, with SAFESEAL preserving named entities and using context-aware synonyms. ArcMark embeds multiple bytes of information without distorting the LLM's next-token distribution, and TextSeal offers localized detection and robustness to distillation.
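For readers unfamiliar with the schemes under attack, the following is a minimal sketch of a KGW-style "green list" watermark and its z-score detector. It is a toy illustration, not the method of any paper covered here: the vocabulary size, key, bias value, uniform "model", and all function names are assumptions made for the example. It shows only why detection depends on the PRNG seeding step, which is the component the summary says SeedHijack targets.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000          # hypothetical toy vocabulary of integer token ids
GAMMA = 0.5                # fraction of the vocabulary marked "green" per step
SECRET_KEY = b"demo-key"   # hypothetical key, for illustration only


def green_list(prev_token: int) -> set[int]:
    """Seed a PRNG from (key, previous token) and draw the green token set."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def generate(length: int, bias: float, seed: int = 0) -> list[int]:
    """Toy stand-in for an LLM: uniform logits, with `bias` added to green tokens."""
    rng = random.Random(seed)
    tokens = [0]  # fixed start token
    for _ in range(length):
        greens = green_list(tokens[-1])
        weights = [math.exp(bias) if t in greens else 1.0 for t in range(VOCAB_SIZE)]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return tokens


def z_score(tokens: list[int]) -> float:
    """Count green hits and compare against the GAMMA * n expected by chance."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z-score:", z_score(generate(200, bias=2.0)))
    print("unmarked z-score:   ", z_score(generate(200, bias=0.0)))
```

In this sketch the detector recomputes the same green sets from the key and the preceding token, so any party able to influence the PRNG state or shift probability mass away from green tokens changes the hit count the detector sees. How the covered attacks achieve that in practice is described in the papers themselves, not here.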

IMPACT New research highlights vulnerabilities in LLM watermarking and introduces more robust detection and evasion techniques, impacting IP protection and content attribution.

RANK_REASON Multiple research papers published on arXiv detailing new methods for LLM watermarking and attacks against existing methods.

paper
safety

AI-generated summary · Google Gemini · from 8 sources.

COVERAGE [8]

arXiv cs.AI TIER_1 English(EN) · Ziyang You, Huilong He, Xiaoke Yang, Xuxing Lu · 2026-05-28 04:00

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

arXiv:2605.28632v1 Announce Type: cross Abstract: Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the u…
arXiv cs.AI TIER_1 English(EN) · Jeongyeon Hwang, Sangdon Park, Jungseul Ok · 2026-05-28 04:00

LLM Watermark Evasion via Bias Inversion

arXiv:2509.23019v5 Announce Type: replace-cross Abstract: Watermarking offers a promising solution for detecting LLM-generated content, yet its robustness under realistic query-free (black-box) evasion remains an open challenge. Existing query-free attacks often achieve limited s…
arXiv cs.AI TIER_1 English(EN) · Xuxing Lu · 2026-05-27 15:39

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the underlying pseudo-random number generator (PRNG) is…
arXiv cs.AI TIER_1 English(EN) · Zhenxin Ai, Haiyun He · 2026-05-26 04:00

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

arXiv:2605.10977v2 Announce Type: replace-cross Abstract: Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment. However, existing watermarking methods are often vulnerable to semantic-invariant …
arXiv cs.CL TIER_1 English(EN) · Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin · 2026-05-25 04:00

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

arXiv:2605.23175v1 Announce Type: cross Abstract: Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks o…
arXiv cs.AI TIER_1 English(EN) · Atefeh Gilani, Sajani Vithana, Carol Xuan Long, Oliver Kosut, Lalitha Sankar, Flavio P. Calmon · 2026-05-25 04:00

ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport

arXiv:2602.07235v2 Announce Type: replace-cross Abstract: Watermarking is an important tool for promoting the responsible use of large language models (LLMs). Existing watermarks insert a signal into generated tokens that either flags LLM-generated text (zero-bit watermarking) or…
arXiv cs.CL TIER_1 English(EN) · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez · 2026-05-22 04:00

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

arXiv:2605.12456v2 Announce Type: replace-cross Abstract: We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and m…
arXiv cs.CL TIER_1 English(EN) · Ruoming Jin · 2026-05-22 02:51

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but …
