[Paper] How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Source: arXiv - 2512.16904v1
Overview
The paper investigates post‑hoc watermarking, a technique that lets a language model rewrite already‑written text while embedding a hidden statistical signal (a “watermark”). This approach could help protect copyrighted material, flag AI‑generated content used in training pipelines, or detect the presence of watermarked text in retrieval‑augmented generation (RAG) systems. By moving the watermarking step from generation time to a re‑phrasing stage, the authors explore new levers—larger re‑writer models, beam search, multi‑candidate generation, and entropy‑based filtering—that can improve the balance between text quality and watermark detectability.
Key Contributions
- Introduces post‑hoc watermarking as a practical alternative to generation‑time watermarking for existing documents.
- Systematically evaluates how compute allocation (model size, beam width, candidate count, detection‑time filtering) influences the quality‑detectability trade‑off.
- Shows that simple Gumbel‑max sampling outperforms more sophisticated watermarking schemes under nucleus sampling.
- Demonstrates strong detectability and semantic fidelity on long‑form, open‑ended text (e.g., books).
- Reveals a surprising limitation: for highly verifiable text like source code, smaller re‑writer models actually watermark more reliably than larger ones.
- Provides a set of practical recipes (beam search + entropy filtering, multi‑candidate voting) that can be adopted by developers today.
Methodology
- Baseline Generation‑Time Watermark – The authors start from a standard watermark that biases token selection during generation (e.g., “green‑list” vs. “red‑list” tokens); minimal sketches of this bias and of the Gumbel‑max rule appear after this list.
- Post‑Hoc Re‑writing Pipeline – An LLM (the re‑writer) is prompted to paraphrase an existing passage while the watermarking logic is applied during its decoding, so the rewrite carries the hidden signal.
- Compute‑Allocation Strategies
- Model Size: Experiments with re‑writer models ranging from 0.7B to 13B parameters.
- Beam Search: Varying beam widths (1, 4, 8) to explore diverse yet high‑probability rewrites.
- Multi‑Candidate Generation: Produce several paraphrases per input and select the one with the strongest watermark signal.
- Entropy Filtering at Detection: Discard low‑entropy (high‑certainty) tokens when scoring, since their near‑deterministic choices dilute the watermark’s statistical signature.
- Evaluation Metrics
- Detectability: Measured by the “radioactivity” score (how strongly the watermark can be recovered).
- Semantic Fidelity: Assessed with BLEU, ROUGE, and human judgments on meaning preservation.
- Domain Split: Separate test sets for open‑ended prose (books) and highly verifiable code snippets.
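To make the green‑list mechanism concrete, here is a minimal sketch of the logit‑bias step, assuming the green list is re‑seeded from the previous token and a secret key at every step; the `key`, `gamma`, and `delta` values are illustrative defaults, not the paper’s configuration.

```python
import torch

def greenlist_bias(logits, prev_token, vocab_size, key=42, gamma=0.5, delta=2.0):
    """Bias the next-token logits toward a pseudorandom 'green' vocabulary subset.

    The green list is derived from (key, previous token), so a detector that
    knows the key can recompute it and count how often the text lands on it.
    Sketch only: gamma/delta are assumed defaults, not the paper's settings.
    """
    g = torch.Generator().manual_seed(key * 1_000_003 + int(prev_token))
    green = torch.randperm(vocab_size, generator=g)[: int(gamma * vocab_size)]
    biased = logits.clone()
    biased[green] += delta  # favor green tokens; red tokens are left untouched
    return biased
```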
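For the Gumbel‑max scheme that the results single out, a sketch of the per‑step selection rule follows: draw key‑seeded pseudorandom values r in [0, 1] over the vocabulary and pick argmax rᵢ^(1/pᵢ). Seeding on just the previous token is a simplification of the usual k‑token context hash.

```python
import torch

def gumbel_max_select(probs, prev_token, key=42):
    """One step of an Aaronson-style Gumbel-max watermark.

    Picks t = argmax_i r_i ** (1 / p_i) with r pseudorandom but reproducible
    from (key, context); because r is uniform, the chosen token is an exact
    sample from probs, so the step does not distort the model's distribution.
    """
    g = torch.Generator().manual_seed(key * 1_000_003 + int(prev_token))
    r = torch.rand(probs.shape[-1], generator=g)
    # argmax of r**(1/p) equals argmax of log(r)/p, which is numerically safer
    scores = torch.log(r) / probs.clamp_min(1e-9)
    return int(torch.argmax(scores))
```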
Results & Findings
| Setting | Detectability (↑) | Semantic Fidelity (↑) | Notable Observation |
|---|---|---|---|
| Gumbel‑max + nucleus sampling | ★★★★★ | ★★★★☆ | Outperforms newer schemes despite its simplicity. |
| Beam search (beam = 8) | +15% radioactivity vs. greedy | +8% ROUGE | Beam search consistently boosts both signal and quality. |
| Multi‑candidate voting (k = 5) | +10% radioactivity | –2% BLEU (minor meaning drift) | Trade‑off: stronger watermark at slight fidelity loss. |
| Entropy filtering (threshold = 0.7) | +12% detection recall | No measurable fidelity loss | Effective “noise‑reduction” at detection time. |
| Code domain | ↓ with larger re‑writers (≥6B), ↑ with smaller ones (≤1B) | — | Counter‑intuitive: over‑parameterized rewrites introduce too much variance, breaking the watermark. |
Overall, the best‑performing recipe for prose was Gumbel‑max + beam = 8 + entropy filtering, achieving >90% detection recall while keeping BLEU >0.85 relative to the original text.
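As a rough illustration of the detection side, the sketch below combines a standard green‑list z‑score test with the entropy filter; the per‑token entropies are assumed to come from a scoring model run over the text, and the 0.7 threshold mirrors the table above rather than a universal constant.

```python
import math
import torch

def detect_zscore(token_ids, entropies, vocab_size, key=42,
                  gamma=0.5, entropy_threshold=0.7):
    """Score a token sequence for the green-list watermark.

    Low-entropy positions (where any model would emit the same token) are
    skipped so they cannot dilute the statistic. Returns a z-score; values
    above ~4 indicate a watermark with very low false-positive probability.
    """
    hits, n = 0, 0
    for prev, tok, h in zip(token_ids, token_ids[1:], entropies[1:]):
        if h < entropy_threshold:  # entropy filter applied at detection time
            continue
        g = torch.Generator().manual_seed(key * 1_000_003 + int(prev))
        green = torch.randperm(vocab_size, generator=g)[: int(gamma * vocab_size)]
        hits += int(tok in set(green.tolist()))
        n += 1
    if n == 0:
        return 0.0
    # under the no-watermark null, hits ~ Binomial(n, gamma)
    return (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
```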
Practical Implications
- Copyright Protection: Publishers can run a lightweight re‑writer on their manuscripts before distribution, embedding a hidden tag that survives downstream transformations (e.g., OCR, summarization).
- Training‑Data Auditing: Companies can scan large corpora for “watermark radioactivity” to flag content that may have been derived from protected sources, helping enforce data‑use policies.
- RAG Safeguards: Retrieval‑augmented pipelines can discard or down‑weight documents that carry a strong watermark, reducing the risk of unintentionally leaking proprietary text into generated answers.
- Tooling Integration: The study’s recipes are compatible with existing open‑source LLM stacks (e.g., Hugging Face Transformers). Entropy filtering adds negligible latency at detection time, while beam search and multi‑candidate generation scale generation cost with beam width and candidate count (see the sketch after this list).
- Code‑Specific Use Cases: For source‑code repositories, a smaller re‑writer (≈1B parameters) should be used to retain watermark detectability, suggesting a “dual‑model” strategy—large model for prose, small model for code.
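As noted in the tooling item above, the beam‑search and multi‑candidate recipes map onto standard Hugging Face `generate` arguments. The sketch below wraps the earlier green‑list bias as a `LogitsProcessor` and keeps the strongest‑scoring candidate; the model name, prompt, and `score_fn` (e.g., tokenize the text and run the z‑score detector above) are illustrative stand‑ins, not the paper’s exact setup.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class GreenListProcessor(LogitsProcessor):
    """The green-list bias from the earlier sketch, wrapped for HF generate."""

    def __init__(self, vocab_size, key=42, gamma=0.5, delta=2.0):
        self.vocab_size, self.key, self.gamma, self.delta = vocab_size, key, gamma, delta

    def __call__(self, input_ids, scores):
        for b in range(input_ids.shape[0]):  # one row per beam during beam search
            g = torch.Generator().manual_seed(self.key * 1_000_003 + int(input_ids[b, -1]))
            green = torch.randperm(self.vocab_size, generator=g)[: int(self.gamma * self.vocab_size)]
            scores[b, green] += self.delta
        return scores

name = "Qwen/Qwen2.5-1.5B-Instruct"  # illustrative re-writer, not the paper's
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def rewrite_with_strongest_watermark(passage, score_fn, k=5, beams=8):
    """Beam-search k watermarked paraphrases and return the best-scoring one."""
    inputs = tok(f"Paraphrase, preserving meaning:\n{passage}\n", return_tensors="pt")
    outs = model.generate(
        **inputs,
        num_beams=beams,
        num_return_sequences=k,  # HF requires k <= num_beams
        max_new_tokens=256,
        logits_processor=LogitsProcessorList(
            [GreenListProcessor(model.config.vocab_size)]
        ),
    )
    new_tokens = outs[:, inputs["input_ids"].shape[1]:]
    candidates = [tok.decode(t, skip_special_tokens=True) for t in new_tokens]
    return max(candidates, key=score_fn)  # pick the strongest watermark signal
```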
Limitations & Future Work
- Domain Sensitivity: The approach struggles with highly deterministic text (e.g., code, legal clauses) where even minor paraphrasing can break functional correctness.
- Adversarial Removal: An attacker could apply aggressive paraphrasing or back‑translation to dilute the watermark; robustness against such attacks remains an open question.
- Scalability: While beam search improves results, it multiplies compute cost; real‑time services may need to balance latency vs. watermark strength.
- Evaluation Scope: Experiments were limited to English prose and Python code; multilingual and cross‑language scenarios need exploration.
Bottom line: Post‑hoc watermarking opens a practical pathway for embedding traceable signals into existing text, offering developers a new lever to protect IP and monitor data usage—provided they respect the method’s current constraints and continue to monitor emerging research on robustness and scalability.
Authors
- Pierre Fernandez
- Tom Sander
- Hady Elsahar
- Hongyan Chang
- Tomáš Souček
- Valeriu Lacatusu
- Tuan Tran
- Sylvestre‑Alvise Rebuffi
- Alexandre Mourachko
Paper Information
- arXiv ID: 2512.16904v1
- Categories: cs.CR, cs.CL
- Published: December 18, 2025
- PDF: https://arxiv.org/pdf/2512.16904v1