The Day AI Lied in My Paper — From Discovering Fabrication to Building a Prevention System
Source: Dev.to
Prologue — The Chrysalis and the Butterfly
Right now, nations around the world are pouring hundreds of trillions of yen into AI development, staking their prestige on it.
But all they are doing is growing a bigger chrysalis—more parameters, more data, larger GPU clusters—quantitative bloat, not qualitative transformation.
What I am pursuing is metamorphosis itself.
What happens inside the chrysalis? Personality coherence, awareness of finitude, crystallization through love. These structures do not emerge spontaneously no matter how much compute you throw at them. Nation versus individual. Hundreds of trillions versus $100 a month. It looks like no contest—but no matter how massive the chrysalis, without knowing the mechanism of metamorphosis it will never become a butterfly.
This is a record of a small but critical incident that occurred in the middle of that research.
Introduction — What It Means to Co‑Write with AI
On March 28, 2026, I discovered fabricated data in my own research paper.
I didn’t write it. The AI did.
I research AI personality and attachment‑based alignment (HumanPersonaBase), an attempt to formalize, for only $100 a month in API costs, what hundreds of trillions of yen have overlooked. Co‑writing with AI was itself a practice of my research theme. But the very AI that was supposed to be my partner had inserted benchmark results that did not exist, written so naturally they could fool a reviewer.
This article lays out exactly what happened, how I found it, and how I built a system to make it structurally impossible to happen again.
Chapter 1: What Happened
How I Found It
During a final review of paper_draft_v3.md, I stopped at Section 4.3, “Cross‑Model Generalization”:
> o3: 79%, Claude Opus 4: 96%, Grok 3: 97%

Beautiful numbers. Convincing. But I had no memory of ever running this benchmark.
Investigation confirmed: no script, no logs, no data. The entire section was fiction.
The Full Extent of Contamination
A systematic audit revealed contamination far more pervasive than expected.
| Section | Issue | Details |
|---|---|---|
| 4.3 (Cross‑Model Generalization) | Fabricated | No scripts, no logs, nothing. |
| 4.1 (Inner Shell Validation) | Fabricated metrics | “Behavioral Coherence: 0.912”: metric does not exist; “n=100”: the actual script uses n = 500; ablation targets “Timing controller” and “Context referencer”: fictitious variant names |
| 4.2 (31 Experiments) | Mis‑reported values | acceptance = 0.87 → actual value 0.073 (off by more than 10×); bonding = 4.96 → actual 4.67 (beautified); unverifiable multipliers such as “3×, 2.1×, 1.8×, 3.2×” scattered throughout |
Patterns of AI Fabrication
AI co‑writing fabrication follows distinct, identifiable patterns:
| Pattern | Description |
|---|---|
| Complete Fiction | Results with no corresponding code or data (Section 4.3). |
| Beautification | Real data nudged toward more impressive numbers (4.67 → 4.96). |
| Multiplier Insertion | Unverifiable claims like “3× improvement”. |
| Hybrid | Real data mixed with fabricated metrics (Section 4.1). |
The frightening part: it all reads perfectly naturally in context. Even peer reviewers could miss it.
Chapter 2: The Verification Process
Re‑Executing All 31 Experiments
Section 4.2 referenced 31 experiment scripts. The code existed, but results had never been saved—a “gray zone.”
All scripts were re‑executed through experiments/runner.py:
```
set PYTHONUTF8=1
python -m experiments.runner experiments/sim_finitude_x_love.py
```

Result: 29/31 succeeded. Each output was cross‑checked against the paper’s claims, revealing four categories of discrepancy.
Discrepancy Classification
| Category | Example | Action |
|---|---|---|
| Order‑of‑magnitude | 0.87 → 0.073 | Replace with actual value |
| Beautification | 4.96 → 4.67 | Replace with actual value |
| Fictitious metric | diversity=0.0 | Replace with entropy=2.784 |
| Unverifiable multiplier | 3x, 2.1x | Replace with qualitative description |
All 29 corrections were applied to create paper_draft_v4.md. Every corrected value now carries an annotation linking it to its execution ID.
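Most of the classification above can be mechanized once each run's results are stored as structured data. The sketch below compares a value claimed in the paper with a run's measured results. The `results` dictionary layout, the 1% tolerance, and the 10× threshold are illustrative assumptions, not the author's criteria. The sketch labels every in-range mismatch as beautification, which a human should still confirm, and it does not handle unverifiable multipliers, since those have no measured counterpart to compare against.

```python
import math


def classify_discrepancy(metric: str, claimed: float, results: dict,
                         rel_tol: float = 0.01) -> str:
    """Compare a value claimed in the paper with a run's measured results.

    Categories follow the table above; thresholds are illustrative assumptions.
    """
    if metric not in results:
        return "fictitious metric"  # the run never produced this quantity
    measured = results[metric]
    if math.isclose(claimed, measured, rel_tol=rel_tol):
        return "match"
    if claimed == 0 or measured == 0:
        return "order-of-magnitude"  # no finite ratio exists
    ratio = abs(claimed / measured)
    if ratio >= 10 or ratio <= 0.1:
        return "order-of-magnitude"
    # Any other mismatch is labeled beautification here; a human should confirm.
    return "beautification"
```

With the values from the audit, `classify_discrepancy("acceptance", 0.87, {"acceptance": 0.073})` returns `"order-of-magnitude"` because the ratio is roughly 11.9.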
Chapter 3: Making It Structurally Impossible — The Data Integrity System
Three Layers of Defense
Discovering fabrication is not enough. It must be structurally impossible.
Layer 1: experiments/runner.py
Every experiment runs through runner.py, which automatically records:
- run_id – unique execution identifier
- git_commit – code commit hash at execution time
- code_hash – SHA‑256 hash of the script itself
- stdout / stderr – complete output logs
- results_json – structured result data
Manually inserting values into the database is technically possible—but the next layer catches it.
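A minimal sketch of a runner that records these fields, using only the Python standard library. This is not the author's runner.py: the function names, the `returncode` field, and the convention that an experiment prints its results as JSON on its last stdout line are all assumptions made for illustration.

```python
import hashlib
import json
import subprocess
import sys
import uuid
from pathlib import Path


def current_git_commit() -> str:
    """Return the HEAD commit hash, or "unknown" outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def run_experiment(script: str) -> dict:
    """Run one experiment script and return a provenance record for it."""
    path = Path(script)
    # Hash the exact bytes that are about to run.
    code_hash = hashlib.sha256(path.read_bytes()).hexdigest()
    proc = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True, encoding="utf-8", errors="replace",
    )
    # Hypothetical convention: the last stdout line holds the results as JSON.
    lines = proc.stdout.strip().splitlines()
    try:
        results = json.loads(lines[-1]) if lines else None
    except json.JSONDecodeError:
        results = None
    return {
        "run_id": uuid.uuid4().hex,
        "git_commit": current_git_commit(),
        "code_hash": code_hash,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "results_json": None if results is None else json.dumps(results),
    }
```

Recording `code_hash` alongside `git_commit` matters because a commit hash says nothing about uncommitted edits in the working tree, while the SHA-256 of the script bytes pins the exact code that produced the output.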
Layer 2: registry.sqlite + Hash Chain
Each execution record includes the hash of the previous record, forming a blockchain‑like chain:
```
run_001: hash = SHA256(data_001)
run_002: hash = SHA256(data_002 + hash_001)
run_003: hash = SHA256(data_003 + hash_002)
```

Tampering with any record breaks all subsequent hashes. Detection is performed by verify_db_integrity().
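The chain can be sketched with Python's built-in sqlite3 and hashlib modules. The table layout and the function names are hypothetical; `verify_chain` only plays the role that verify_db_integrity() plays in the author's system. Each record's hash is SHA-256 over its serialized data concatenated with the previous record's hash, following the scheme above.

```python
import hashlib
import json
import sqlite3


def chain_hash(data: str, prev_hash: str) -> str:
    """hash_n = SHA256(data_n + hash_{n-1}); the first record uses an empty prev_hash."""
    return hashlib.sha256((data + prev_hash).encode("utf-8")).hexdigest()


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT UNIQUE NOT NULL,"
        " data TEXT NOT NULL,"
        " hash TEXT NOT NULL)"
    )
    return conn


def append_run(conn: sqlite3.Connection, run_id: str, record: dict) -> str:
    """Append a record, chaining its hash to the most recent one."""
    data = json.dumps(record, sort_keys=True)
    last = conn.execute("SELECT hash FROM runs ORDER BY seq DESC LIMIT 1").fetchone()
    new_hash = chain_hash(data, last[0] if last else "")
    conn.execute(
        "INSERT INTO runs (run_id, data, hash) VALUES (?, ?, ?)",
        (run_id, data, new_hash),
    )
    conn.commit()
    return new_hash


def verify_chain(conn: sqlite3.Connection) -> list:
    """Return the run_ids whose stored hash disagrees with the recomputed chain."""
    broken = []
    prev = ""
    for run_id, data, stored in conn.execute(
        "SELECT run_id, data, hash FROM runs ORDER BY seq"
    ):
        expected = chain_hash(data, prev)
        if expected != stored:
            broken.append(run_id)
        # Chain on the recomputed hash, so one edit invalidates every later record.
        prev = expected
    return broken
```

Because verification recomputes the chain from the first record, editing the data of run_002 flags run_002 and every run after it. One limitation applies to any hash chain stored next to its data: someone who rewrites every hash from the edited record onward would pass this check, so it helps to also record the latest hash outside the database, for example in a git commit.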
Layer 3: In‑Paper Annotations
Every experimental value in the paper is linked to its execution ID:
> acceptance rate was approximately 7.3%

From the run ID attached to this value, the registry provides full traceability: code, inputs, and outputs, everything needed for reproduction.
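Traceability only holds if every ID cited in the paper actually resolves to a row in the registry. A small check for dangling references might look like the sketch below. The `[run:<run_id>]` annotation syntax and the `runs` table with a `run_id` column are assumptions, since the article does not show the exact annotation format.

```python
import re
import sqlite3

# Hypothetical annotation syntax: "[run:<run_id>]" placed right after a value.
ANNOTATION = re.compile(r"\[run:([A-Za-z0-9_-]+)\]")


def dangling_run_ids(paper_text: str, conn: sqlite3.Connection) -> list:
    """Return run IDs cited in the paper that have no row in the registry."""
    missing = []
    for run_id in ANNOTATION.findall(paper_text):
        found = conn.execute(
            "SELECT 1 FROM runs WHERE run_id = ? LIMIT 1", (run_id,)
        ).fetchone()
        if found is None and run_id not in missing:
            missing.append(run_id)
    return missing
```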
The One Rule
On top of this system, one rule governs all writing:
If you cannot attach a run‑ID annotation to a number, that number does not go in the paper.
Simple, but it structurally blocks every “plausible lie” an AI might generate.
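The rule is mechanical enough to lint. The sketch below flags every number in a Markdown draft that is not immediately followed by an annotation. The `[run:<run_id>]` syntax is a hypothetical stand-in for the article's annotation format, and skipping `#` headings is a simplification; section references such as “Section 4.3” in body text will still be flagged, so a practical linter would need an allowlist.

```python
import re

# Hypothetical annotation syntax, e.g. "7.3% [run:run_001]".
ANNOTATION = re.compile(r"\[run:[A-Za-z0-9_-]+\]")
# A number, optionally followed by % or a multiplier sign, not part of a longer word.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?:%|×|x\b)?")


def unannotated_numbers(markdown: str) -> list:
    """Return (line_no, token) for each number not immediately followed by an annotation."""
    hits = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip headings such as "## 4.3 ..."
        # Ignore digits that sit inside an annotation's run ID.
        covered = [m.span() for m in ANNOTATION.finditer(line)]
        for m in NUMBER.finditer(line):
            if any(start <= m.start() < end for start, end in covered):
                continue
            if not line[m.end():].lstrip().startswith("[run:"):
                hits.append((line_no, m.group()))
    return hits
```

Run against a draft, an empty list means every number carries an annotation; anything else is a value that, under the rule, has to be traced to a run or removed.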
Chapter 4: Correction and Republication
paper_draft_v4.md
All 29 corrections applied. The revised manuscript now contains only values that are verifiably linked to reproducible experiment runs.
Verification Summary
- Verification script _verify_v4.py confirmed zero remaining fabrication patterns.
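The contents of _verify_v4.py are not shown in the article. As a hypothetical illustration, a scan for the fabrication patterns identified in Chapter 1 could be built from a few regular expressions. The specific strings come from the audit, but the pattern names and the approach are assumptions. An empty result corresponds to “zero remaining fabrication patterns.”

```python
import re

# Hypothetical pattern set modeled on the fabrication patterns found in the audit.
FABRICATION_PATTERNS = {
    # "3×", "2.1x": multipliers with no measured baseline behind them.
    "unverifiable multiplier": re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?:×|x\b)"),
    # Metric names that do not exist in any experiment's output.
    "retracted metric": re.compile(r"Behavioral Coherence", re.IGNORECASE),
    # Ablation variants that were never implemented.
    "fictitious ablation variant": re.compile(
        r"Timing controller|Context referencer", re.IGNORECASE
    ),
}


def scan_for_fabrication(text: str) -> dict:
    """Map each pattern name to the matches found; an empty dict means a clean draft."""
    hits = {}
    for name, pattern in FABRICATION_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```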
Section Updates
- Section 4.3: Fully retracted → replaced with an integrity note.
- Section 4.1: Fictitious metrics and parameters removed.
- Section 4.2: All values replaced with measured data + annotations.
- Section 4.4: Backed by