
New LLM Watermarking Techniques Face Evasion and Robustness Challenges

By PulseAugur Editorial Team · [8 sources] · 2026-05-22 02:51

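Researchers have developed several new methods that expose and address vulnerabilities in watermarking for large language models (LLMs). One attack, SeedHijack, targets the pseudo-random number generator (PRNG) and manipulates the watermark without knowing the secret key or the model's logits. Another, the Bias-Inversion Rewriting Attack (BIRA), applies a negative logit bias to evade detection. New watermarking algorithms such as PASA and SAFESEAL aim to resist semantic-invariant attacks and to keep distortion minimal; SAFESEAL preserves named entities and uses context-aware synonyms. ArcMark focuses on embedding multi-bit messages without distorting the LLM's next-token distribution, while TextSeal offers localized detection and robustness to distillation.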
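To make the terms "PRNG", "logit bias", and "detection" concrete, here is a minimal toy sketch of a KGW-style green-list watermark, the family of schemes the first source abstract names. It is not the method of any paper listed here. The vocabulary size, the key, the parameters `GAMMA` and `DELTA`, the random stand-in for model logits, and all function names are assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50            # toy vocabulary size (assumption)
GAMMA = 0.5                # fraction of the vocabulary marked "green"
DELTA = 4.0                # logit bias added to green tokens
SECRET_KEY = b"demo-key"   # hypothetical watermark key


def green_list(prev_token):
    """Seed a PRNG from the key and the previous token, then pick green tokens."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_token(logits, prev_token, rng, bias):
    """Sample from softmax(logits), with `bias` added to green-token logits."""
    green = green_list(prev_token)
    shifted = [l + (bias if i in green else 0.0) for i, l in enumerate(logits)]
    peak = max(shifted)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in shifted]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(n_tokens, bias, seed=0):
    """Generate tokens; random Gaussian logits stand in for a real model."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_token(logits, tokens[-1], rng, bias))
    return tokens


def z_score(tokens):
    """Count green tokens and compare with the GAMMA * n expected by chance."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked  z =", round(z_score(generate(200, DELTA)), 2))
    print("unwatermarked z =", round(z_score(generate(200, 0.0)), 2))
    print("negative bias z =", round(z_score(generate(200, -DELTA)), 2))
```

A positive bias pushes the z-score far above what chance would produce, which is what a detector thresholds on; flipping the sign suppresses green tokens and drives the score down. Note that this toy reads the key directly when it applies the negative bias, whereas the attacks summarized above are described as working without the key or the logits, so it only shows the direction of the effect.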

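Impact: New research highlights vulnerabilities in LLM watermarking and introduces more robust detection and evasion techniques, with consequences for intellectual property protection and content attribution.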

Ranking rationale: Multiple research papers published on arXiv detail new LLM watermarking methods and attacks on existing ones.


AI-generated summary · Google Gemini · from 8 sources.

Sources [8]

arXiv cs.AI TIER_1 English(EN) · Ziyang You, Huilong He, Xiaoke Yang, Xuxing Lu · 2026-05-28 04:00

Blind Pseudo-Random Number Generator Hijacking: An Undetectable, Integrity-Preserving Attack on LLM Watermarking

arXiv:2605.28632v1 Announce Type: cross Abstract: Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the u…
arXiv cs.AI TIER_1 English(EN) · Jeongyeon Hwang, Sangdon Park, Jungseul Ok · 2026-05-28 04:00

LLM Watermark Evasion via Bias Inversion

arXiv:2509.23019v5 Announce Type: replace-cross Abstract: Watermarking offers a promising solution for detecting LLM-generated content, yet its robustness under realistic query-free (black-box) evasion remains an open challenge. Existing query-free attacks often achieve limited s…
arXiv cs.AI TIER_1 English(EN) · Xuxing Lu · 2026-05-27 15:39

Blind Pseudo-Random Number Generator Hijacking: An Undetectable, Integrity-Preserving Attack on LLM Watermarking

Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the underlying pseudo-random number generator (PRNG) is…
arXiv cs.AI TIER_1 English(EN) · Zhenxin Ai, Haiyun He · 2026-05-26 04:00

PASA: A Principled Embedding-Space Watermarking Method for LLM-Generated Text Against Semantic-Invariant Attacks

arXiv:2605.10977v2 Announce Type: replace-cross Abstract: Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment. However, existing watermarking methods are often vulnerable to semantic-invariant …
arXiv cs.CL TIER_1 English(EN) · Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin · 2026-05-25 04:00

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

arXiv:2605.23175v1 Announce Type: cross Abstract: Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks o…
arXiv cs.AI TIER_1 English(EN) · Atefeh Gilani, Sajani Vithana, Carol Xuan Long, Oliver Kosut, Lalitha Sankar, Flavio P. Calmon · 2026-05-25 04:00

ArcMark: Distortion-Free Multi-Bit LLM Watermarking via Optimal Transport

arXiv:2602.07235v2 Announce Type: replace-cross Abstract: Watermarking is an important tool for promoting the responsible use of large language models (LLMs). Existing watermarks insert a signal into generated tokens that either flags LLM-generated text (zero-bit watermarking) or…
arXiv cs.CL TIER_1 English(EN) · Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran, Valeriu Lacatusu, Sylvestre-Alvise Rebuffi, Alexandre Mourachko, Surya Parimi, Christophe Ropers, Rashel Moritz, Vanessa Stark, Hady Elsahar, Pierre Fernandez · 2026-05-22 04:00

TextSeal: A Localized LLM Watermark for Provenance and Distillation Protection

arXiv:2605.12456v2 Announce Type: replace-cross Abstract: We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and m…
arXiv cs.CL TIER_1 English(EN) · Ruoming Jin · 2026-05-22 02:51

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but …
