Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN)

Signature filtering: a lightweight enhancement for statistical watermark detection in large language models

Researchers have proposed signature filtering, a method for improving the detection of statistical watermarks in text generated by large language models (LLMs). The technique works only at detection time and leaves the watermark embedding and text generation processes unchanged. It identifies and removes specific "signature" tokens that can interfere with detection. The authors report that this substantially improves detection accuracy, particularly when the watermark signal is weak or the text is repetitive. They also report high detection rates across multiple LLMs and datasets, including under adversarial conditions such as sentence scrambling and token perturbations.
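The brief does not say how signature tokens are identified or how the paper's detector scores text. The sketch below is only a rough illustration of where a filtering step can sit in detection. It uses a generic hash-based green-list detector, a common statistical watermark scheme that is not necessarily the one used in the paper. It also treats the signature set as a caller-supplied set of token IDs that are skipped before the z-statistic is computed. All names (`is_green`, `z_score`, `toy_generate`, `GAMMA`, `KEY`) are hypothetical.

```python
import hashlib
import math
import random
from typing import AbstractSet, List, Sequence

GAMMA = 0.25      # assumption: fraction of the vocabulary on the green list
KEY = "demo-key"  # hypothetical secret watermark key


def is_green(prev_token: int, token: int, gamma: float = GAMMA, key: str = KEY) -> bool:
    """Pseudo-random green-list membership seeded by (key, previous token)."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(
    tokens: Sequence[int],
    signature: AbstractSet[int] = frozenset(),
    gamma: float = GAMMA,
) -> float:
    """One-proportion z-test over scored positions.

    Assumption: "signature filtering" is modeled here as skipping every
    position whose token is in `signature`; the paper's criterion may differ.
    """
    scored = green = 0
    for prev, tok in zip(tokens, tokens[1:]):
        if tok in signature:
            continue  # filtered position: counts toward neither total
        scored += 1
        green += is_green(prev, tok, gamma)
    if scored == 0:
        return 0.0  # nothing left to test
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def toy_generate(length: int, forced: int, every: int, vocab: int = 1000, seed: int = 0) -> List[int]:
    """Toy watermarked generator: picks a random green token at each step,
    except every `every`-th step, which emits the fixed token `forced`
    (standing in for unwatermarked, repetitive boilerplate)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab)]
    for step in range(1, length):
        if step % every == 0:
            tokens.append(forced)
            continue
        candidates = [t for t in range(vocab) if t != forced and is_green(tokens[-1], t)]
        tokens.append(rng.choice(candidates))
    return tokens


if __name__ == "__main__":
    toks = toy_generate(length=200, forced=0, every=3)
    print("unfiltered z:", round(z_score(toks), 2))
    print("filtered z:  ", round(z_score(toks, signature={0}), 2))
```

In this toy setup every unforced token is green by construction. With `signature={0}`, the score therefore equals sqrt(T' (1 - γ) / γ), where T' is the number of kept positions. The forced tokens, whose green/red status is left to chance, no longer dilute the statistic. How much this helps on real text depends on how the signature set is chosen, which this sketch does not attempt to model.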

IMPACT Enhances LLM text provenance and attribution capabilities, crucial for combating misinformation and ensuring accountability.

MBPP
HumanEval
large language models
Llama3.1-8b
Opt-1.3b
Opt-6.7b
Qwen2.5-14b
Phi-3-medium-14b
Llama2-13b
Code-Search-Net
Signature filtering