[Paper] Towards Anytime-Valid Statistical Watermarking
Source: arXiv - 2602.17608v1
Overview
Large Language Models (LLMs) are now generating massive amounts of text, and distinguishing AI‑written content from human‑written prose is becoming a critical security and trust issue. The paper Towards Anytime-Valid Statistical Watermarking introduces a new statistical watermarking scheme that lets you detect AI‑generated text at any point during generation without sacrificing the rigor of hypothesis testing. By marrying optimal sampling with an “e‑value” based testing framework, the authors achieve faster, more reliable detection than prior methods.
Key Contributions
- Anchored E‑Watermarking framework: First watermarking method to use e‑values, whose running product forms a test supermartingale, guaranteeing valid inference under optional (early) stopping.
- Principled sampling distribution: Introduces an “anchor” distribution that approximates the target LLM, enabling optimal choice of the watermarking distribution.
- Optimality guarantees: Derives the e‑value that maximizes the worst‑case log‑growth rate and proves it yields the minimal expected stopping time for detection.
- Empirical validation: Shows a 13‑15 % reduction in the average token budget needed for reliable detection on standard LLM watermarking benchmarks.
- Unified theory: Bridges statistical hypothesis testing, sequential analysis, and watermark design into a single, coherent mathematical framework.
Methodology
- Watermark embedding: Tokens are sampled from a biased distribution that subtly favors a subset of the vocabulary (the “watermark”). The bias is calibrated using an anchor distribution that mimics the LLM’s true output probabilities.
- E‑value construction: For each generated token, the method computes an e‑value—essentially a likelihood ratio between the watermarked and anchor distributions. These e‑values multiply across tokens, forming a test supermartingale.
- Anytime‑valid detection: Because the running product of e‑values is a supermartingale, the probability of ever crossing a detection threshold (and thus committing a Type‑I error) stays bounded even if the observer stops early or checks after every token; a code sketch follows this list.
- Optimal stopping analysis: The authors analytically solve for the watermarking distribution that maximizes the expected log‑growth of the e‑value under the worst‑case (adversarial) model, yielding the smallest expected number of tokens needed before detection (a toy illustration of this growth‑rate criterion also follows the list).
- Simulation & benchmark evaluation: Experiments compare the new scheme against existing fixed‑horizon watermarking methods on synthetic and real‑world LLM outputs (e.g., GPT‑2, LLaMA).
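To make the detection loop concrete, here is a minimal Python sketch of the e‑value accumulation described above. The interface is illustrative (the paper does not prescribe one): it assumes the detector can score each token under both the watermarked and anchor distributions, and the stopping rule follows from Ville's inequality, which bounds the probability that a nonnegative supermartingale starting at 1 ever reaches 1/α by α.

```python
import math
from typing import Iterable, Optional, Tuple

def anytime_detect(
    token_probs: Iterable[Tuple[float, float]],  # (p_watermark, p_anchor) per token
    alpha: float = 0.05,
) -> Tuple[Optional[int], float]:
    """Multiply per-token e-values (likelihood ratios) and stop the first
    time the running product reaches 1/alpha. Under the null (no watermark)
    each ratio has expectation 1, so by Ville's inequality the chance of a
    false alarm is at most alpha -- no matter when or how often we check."""
    log_e = 0.0  # work in log space to avoid underflow on long texts
    threshold = math.log(1.0 / alpha)
    for t, (p_w, p_a) in enumerate(token_probs, start=1):
        log_e += math.log(p_w) - math.log(p_a)  # log of the per-token e-value
        if log_e >= threshold:
            return t, math.exp(log_e)  # watermark detected after t tokens
    return None, math.exp(log_e)       # text ended before evidence accumulated
```

The early return is exactly the compute saving that the optional‑stopping guarantee licenses: the detector can bail out as soon as the evidence threshold is crossed.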
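The optimal‑stopping bullet has a simple back‑of‑the‑envelope reading: when the text really is watermarked, the log e‑value grows on average by the KL divergence between the watermarked and anchor distributions per token, so detection takes roughly log(1/α) divided by that growth rate. The toy calculation below (a two‑token vocabulary with made‑up bias values, not the paper's closed‑form solution) shows how stronger bias shortens the expected detection time:

```python
import math

ALPHA = 0.05  # target Type-I error level

def kl_bernoulli(q: float, p: float) -> float:
    """KL(q || p): expected per-token log-growth of the e-value when the
    text is watermarked, for a toy two-token vocabulary."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

p_anchor = 0.5  # anchor probability of the favored token (illustrative)
for q_watermark in (0.55, 0.60, 0.65):
    growth = kl_bernoulli(q_watermark, p_anchor)
    # Wald-style approximation: tokens needed for the log e-value to
    # reach log(1/alpha) on average.
    tokens_needed = math.log(1.0 / ALPHA) / growth
    print(f"bias={q_watermark:.2f}  growth/token={growth:.4f}  "
          f"~tokens to detect={tokens_needed:.0f}")
```

Faster growth means fewer tokens before the threshold, which is the expected‑stopping‑time objective the paper minimizes; the trade‑off is that stronger bias distorts generation more, which is where the anchor distribution's calibration matters.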
Results & Findings
- Sample efficiency: On average, the new method required 13‑15 % fewer tokens to reach a detection confidence of 95 % compared with the best prior watermarking baselines.
- Robust Type‑I error control: Even when the detector was allowed to stop at arbitrary times (or check after every token), the false‑positive rate stayed at or below the nominal 5 % level, confirming the anytime‑valid guarantee.
- Resilience to adversarial attacks: Because the anchor distribution approximates the true model, attempts to “wash out” the watermark by perturbing token probabilities had limited impact on detection power.
- Scalability: The computational overhead of computing e‑values is linear in the number of generated tokens and fits comfortably within typical inference pipelines.
Practical Implications
- Real‑time content moderation: Platforms can flag AI‑generated text on‑the‑fly, stopping the analysis as soon as enough evidence accumulates, which saves compute and reduces latency.
- Compliance & provenance: Organizations that need to certify human‑authored content (e.g., academic journals, legal documents) can embed watermarks that remain detectable even if the text is edited or truncated.
- Developer tooling: SDKs can expose a simple detectWatermark(tokens) API that returns a confidence score at any point, making integration into existing LLM services trivial; a hypothetical sketch of such an interface follows this list.
- Cost reduction: By cutting the token budget needed for detection, cloud providers can lower the expense of running watermark checks alongside generation, especially for long‑form outputs.
- Security‑by‑design: The framework’s statistical guarantees make it suitable for regulated environments where false accusations (Type‑I errors) must be tightly controlled.
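As a rough sense of what such an SDK surface could look like, here is a hypothetical detectWatermark in Python. The name comes from the bullet above; the signature, the WatermarkVerdict type, and the confidence formula (one minus the anytime‑valid p‑value min(1, 1/e)) are illustrative choices, not an existing library's API:

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass
class WatermarkVerdict:
    e_value: float     # accumulated evidence that the text is watermarked
    confidence: float  # 1 - min(1, 1/e_value), an anytime-valid confidence
    detected: bool     # True once e_value >= 1/alpha

def detectWatermark(token_log_ratios: Sequence[float],
                    alpha: float = 0.05) -> WatermarkVerdict:
    """Hypothetical SDK entry point: fold per-token log likelihood ratios
    (watermarked vs. anchor model) for any prefix of the stream into a
    verdict that is safe to act on at any time."""
    e_value = math.exp(sum(token_log_ratios))
    p_value = min(1.0, 1.0 / e_value) if e_value > 0 else 1.0
    return WatermarkVerdict(
        e_value=e_value,
        confidence=1.0 - p_value,
        detected=e_value >= 1.0 / alpha,
    )

# Callers can poll after every streamed chunk without inflating the
# false-positive rate:
#   verdict = detectWatermark(ratios_so_far)
#   if verdict.detected: ...  # flag the content
```

Because 1/e is a valid p‑value under optional stopping, the same call can back both a streaming moderation check and a one‑shot audit.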
Limitations & Future Work
- Anchor distribution estimation: The method assumes access to a good approximation of the target LLM’s token distribution; poor anchors can degrade detection efficiency.
- Model‑specific tuning: Optimal watermark parameters (bias strength, subset size) still need calibration per model family, which may limit out‑of‑the‑box deployment.
- Adversarial robustness: While more resilient than prior schemes, sophisticated attackers who can query the model extensively might still learn to neutralize the watermark.
- Extension to multimodal generators: The paper focuses on text; applying e‑value based watermarking to image or audio generators remains an open challenge.
Overall, the Anchored E‑Watermarking framework offers a mathematically sound, practically efficient way to keep AI‑generated content in check, opening the door for safer, more transparent deployment of powerful LLMs.
Authors
- Baihe Huang
- Eric Xu
- Kannan Ramchandran
- Jiantao Jiao
- Michael I. Jordan
Paper Information
- arXiv ID: 2602.17608v1
- Categories: cs.LG, cs.AI, stat.ML
- Published: February 19, 2026