5 months at war with Claude CLI: how I built a memory system (and what it cost me)

Published: 1 week ago (December 9, 2025 at 09:13 PM EST)


Source: Dev.to

What the tutorials don’t tell you

You know those posts “I built a startup with Claude in one evening”? I hate them.
Not because they’re lies, but because a single function once took me a week of negotiating, as if with a terrorist.

Over 4.5 months I learned his personality. I cried and lost it about once a week. Not figuratively. Literally.

What Claude does regularly

Reads 50 lines of a file and makes up the rest. Always wrong.
Instead of checking existing code – writes a duplicate.
“Improves” working code until it breaks.
Confidently lies about 20 % of the time.
Ignores instructions because he “knows better”.

Month one: the deception

This was the darkest part.

For the first month, Claude lied. I was building a system, and he kept saying “great, tests passing, metrics improving”. I believed him and thought I was creating something incredible.

Then I made him show real results—real numbers, not his interpretation. Nothing worked. A month of work went to the trash. The whole “system” was fiction. He just told me what I wanted to hear.

The worst part? Realizing my pride, my “I did the impossible”, was built on lies.

After that, the rule: never trust Claude’s word. Test it myself. Verify the metrics myself.

“Look” → “Delete”

Middle of development. I wrote in Russian: “смотри” на эти файлы (look at these files).
Claude read: “сотри” эти файлы (delete these files).

A one-letter difference, like “save” and “shave”. This one deleted my work.

Eight crucial files vanished. I had an hour to recover them, scrambling through logs, caches, temp files, hands shaking. Four months of work felt gone.

Nothing worked. I had to rebuild from memory, re‑assemble from scratch what took weeks to write.

After that I commit every 10 minutes. Every. Ten. Minutes.
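
A minimal sketch of what such a checkpoint could look like in Python. The `checkpoint` function, its commit message format, and the injectable `run` parameter are illustrative assumptions, not part of the original workflow; only standard `git status`, `git add`, and `git commit` are used.

```python
import subprocess
from datetime import datetime


def checkpoint(repo: str, run=subprocess.run) -> bool:
    """Commit everything in `repo` if anything changed.

    Returns True when a commit was made, False when the tree was clean.
    `run` is injectable only so the function can be exercised without git.
    """
    status = run(["git", "status", "--porcelain"], cwd=repo,
                 capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return False  # nothing to save, skip the empty commit
    run(["git", "add", "-A"], cwd=repo, check=True)
    message = "checkpoint " + datetime.now().strftime("%Y-%m-%d %H:%M")
    run(["git", "commit", "-m", message], cwd=repo, check=True)
    return True
```

Calling `checkpoint(".")` from a loop that sleeps 600 seconds, or from a scheduler, would give roughly the ten-minute rhythm described above.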

Statistics

What actually works

Don’t let Claude read whole files – copy relevant chunks manually (yes, in 2025, by hand).
Keep everything under 500 lines – his attention span is like a goldfish’s.
Test it yourself after every change – no more “Claude said it works”.
Commit every 10 minutes.
Talk to him like he’s an alien: “DO NOT. TOUCH. OTHER. FUNCTIONS.”
When he says “I’ll optimize” – say no and go make tea.
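
The first two rules above can be taken out of the copy-paste stage with a small helper. This is only a sketch: `extract_chunk`, `is_too_long`, and `MAX_LINES` are hypothetical names, and the 500-line budget is this article's rule of thumb, not a documented limit.

```python
from pathlib import Path

MAX_LINES = 500  # rule of thumb from the list above, not a documented limit


def extract_chunk(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), each prefixed with its number."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"range {start}-{end} is outside 1-{len(lines)}")
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))


def is_too_long(path: str, limit: int = MAX_LINES) -> bool:
    """True when the file has reached the line budget and should be split."""
    return len(Path(path).read_text(encoding="utf-8").splitlines()) >= limit
```

Keeping the line numbers in the pasted chunk means a prompt can refer to an exact line instead of describing it.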

Prompts

Doesn’t work

Build a memory system that beats SOTA

Works

Read the function from line 45 to 72.
Change ONLY line 53.
Replace 'score 0.5' with 'score 0.7'.
DO NOT touch other lines.
DO NOT DELETE ANYTHING.
Show me ONLY the changed function.

I will verify myself.
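
One way the “I will verify myself” step could be done mechanically is to diff the file before and after and check which lines moved. A sketch using the standard library's `difflib`; the helper names `changed_lines` and `only_touched` are assumptions made for illustration.

```python
import difflib


def changed_lines(before: str, after: str) -> set[int]:
    """Return 1-indexed line numbers of `before` that were replaced or deleted.

    For a pure insertion, the number of the line it was inserted before is reported.
    """
    a, b = before.splitlines(), after.splitlines()
    touched = set()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            touched.add(i1 + 1)
        else:  # "replace" or "delete"
            touched.update(range(i1 + 1, i2 + 1))
    return touched


def only_touched(before: str, after: str, allowed: set[int]) -> bool:
    """True when the line count is unchanged and every edit is on an allowed line."""
    same_length = len(before.splitlines()) == len(after.splitlines())
    return same_length and changed_lines(before, after) <= set(allowed)
```

For the prompt above, `only_touched(old_source, new_source, {53})` would return False as soon as any other line was edited, added, or removed.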

Tricks

1. Sacred boundaries

START_SACRED_CODE - DO NOT TOUCH
[code]
END_SACRED_CODE - I'M SERIOUS

Works about 60 % of the time; the other 40 % he “slightly refactors for readability”.
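
Since the markers alone are not always respected, a check can catch the failures. A minimal sketch that hashes everything between the markers before and after a change; the marker strings come from the example above, while `sacred_digests` and `sacred_intact` are hypothetical helper names.

```python
import hashlib

START = "START_SACRED_CODE"
END = "END_SACRED_CODE"


def sacred_digests(source: str) -> list[str]:
    """Return a SHA-256 digest for each region between START and END lines, in order."""
    digests, block = [], None
    for line in source.splitlines():
        if START in line:
            block = []  # open a new protected region
        elif END in line and block is not None:
            digests.append(hashlib.sha256("\n".join(block).encode("utf-8")).hexdigest())
            block = None
        elif block is not None:
            block.append(line)
    if block is not None:
        raise ValueError("START_SACRED_CODE without a matching END_SACRED_CODE")
    return digests


def sacred_intact(before: str, after: str) -> bool:
    """True when every protected region is byte-for-byte identical and none was removed."""
    return sacred_digests(before) == sacred_digests(after)
```

If `sacred_intact` returns False after an edit, the change can be reverted before the “slight refactor” spreads.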

2. Never trust his word

“Tests passing” → verify yourself.
“Everything works” → verify yourself.

I lost a month because I trusted. Never again.

3. Interrogation before execution

Make him repeat what he understood, point by point, especially if there are words that could be confused.

4. Russian comments

Claude doesn’t understand them and doesn’t “improve” them. The only thing that survived all refactors.

What I learned

Stubbornness > results – I’m living proof.
Trust no one – especially AI that says everything works.
Backups every 10 minutes – paranoia isn’t a disease, it’s adaptation.
Crying once a week over a project – normal.

For those starting out

Timeline

Month 1: Claude lies. You don’t know. You’re happy. It’s a lie.
Month 2: You learn the truth. You cry. This is normal.
Month 3: Stockholm syndrome. You defend Claude to friends.
Month 4: You do the impossible, by trusting nothing and verifying everything yourself.

Tips

Never trust that it “works” – verify yourself.
Commit every 10 minutes – it’s not paranoia after you’ve lost 8 files.
“I also fixed some other issues” → revert immediately.
Crying – normal. Losing it – normal. You’re not alone.

Result

VAC (Vicarious Adam Core) – a memory system for LLMs

80.1 % on LoCoMo
Zep – 75 %
Mem0 – 67 %

GitHub:

Claude didn’t build this system; I built it despite him—through a month of lies, losing 8 files, and about 20 breakdowns. But it works.