Kimi Killed 4 of Claude's Best Ideas — An AI Peer Review in Practice
Source: Dev.to
Content Strategy Review: Claude vs. Kimi
I had Claude (Opus 4.6) build a content strategy: six title rewrites and five new article themes, all data‑backed and logically airtight. I then handed the proposals to Kimi K2.5. Four of the six titles were flagged, and one was outright rejected with the comment “this would backfire, don’t do it.”
When one AI critiques another AI’s proposals, the scope of consideration widens. Perspectives that Claude alone would never have surfaced appeared, and assumptions I had unconsciously agreed with became visible.
My Portfolio Snapshot
- Total articles: 21 (18 published, 3 drafts)
- Published‑title pattern: 11 / 18 (≈ 61 %) follow a descriptive format – “The story of how I did X” or “A record of doing Y”.
Zenn’s weekly trending articles, however, are dominated by “comprehensive guide” and “checklist” formats. My portfolio had zero articles in either pattern. This wasn’t a deliberate choice – I had simply fallen into the “story of how I …” habit without realizing it.
Claude’s Trend Analysis
Claude collected trend data via the Zenn API and web searches, then classified buzz‑worthy titles into nine patterns based on all‑time top‑10 and weekly trending articles.
| # | Pattern | Example |
|---|---|---|
| 1 | Provocative / Declarative | “The real value of X isn’t Y” |
| 2 | Comprehensive | “Complete guide to X”, “Top N picks” |
| 3 | Checklist | “Things to check before doing X” |
| 4 | Numeric | “It got 9× slower”, “In 0 lines” |
| 5 | Hypothetical / Result | “I tried X and Y happened” |
| 6 | Behind‑the‑scenes | “The inside story of X”, “The full picture” |
| 7 | Flow‑tracking | “A month‑long record of doing X” |
| 8 | OSS Release | “I built X and open‑sourced it” |
| 9 | Tacit Knowledge | “What senior engineers do unconsciously” |
Claude mapped this taxonomy against my existing articles, identified the gaps, and produced six title‑rewrite proposals plus a set of title‑design rules. At this point Claude’s output was internally consistent and well‑supported by data; I felt no discomfort with the proposals.
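The gap analysis Claude performed can be sketched as a simple keyword classifier over titles. Everything here is illustrative: the cue keywords, sample titles, and pattern subset are my assumptions, not Claude's actual taxonomy rules.

```python
# Hypothetical sketch of the pattern-gap analysis: classify each title by
# keyword cues, then report which patterns the portfolio never uses.
# Cue lists and sample titles are made up for illustration.
PATTERN_CUES = {
    "comprehensive": ["complete guide", "top", "picks"],
    "checklist": ["things to check", "before"],
    "numeric": ["x slower", "in 0 lines"],
    "flow-tracking": ["record of", "story of how"],
}

def classify(title: str) -> set[str]:
    """Return every pattern whose cue appears in the title."""
    t = title.lower()
    return {p for p, cues in PATTERN_CUES.items() if any(c in t for c in cues)}

def find_gaps(titles: list[str]) -> set[str]:
    """Patterns that no existing title matches."""
    covered = set().union(*(classify(t) for t in titles)) if titles else set()
    return set(PATTERN_CUES) - covered

titles = [
    "The story of how I built a writing pipeline",
    "A month-long record of doing daily reviews",
]
print(sorted(find_gaps(titles)))  # → ['checklist', 'comprehensive', 'numeric']
```

Run against my real 18 published titles, a classifier like this would surface the same finding: heavy "flow-tracking" coverage, zero "comprehensive" or "checklist" articles.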
The Limitation of a Solo Model
Claude tends to favor data that supports its own analysis and does not automatically seek out perspectives that would undermine its hypotheses. Because the proposals were backed by solid data, they felt “plausible” to me as well.
Introducing Kimi K2.5
To obtain a different viewpoint, I brought in Kimi K2.5 (Mixture‑of‑Experts architecture, 1 trillion parameters). I already had Kimi set up as a CLI tool (setup details are in a previous article).
Use case: peer review (not implementation delegation).
Prompt Structure
- Input 1: full text of 7 existing articles by the author (A‑rank quality)
- Input 2: full text of Claude's analysis results and proposals
- Instruction: review from 4 perspectives – strategist, editor, reader advocate, and marketer
Kimi’s Agent Swarm architecture decomposes tasks and distributes them to up to 100 sub‑agents. I explicitly demanded critique from four perspectives. The output was ~350 lines (≈ 17 KB), with each perspective returning specific criticisms and alternative suggestions.
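The prompt structure above can be sketched as a small assembly function. The wording, file handling, and section markers are my assumptions; only the three-part structure (articles, proposals, four-perspective instruction) mirrors what I actually sent.

```python
# Hypothetical sketch of how the peer-review prompt was assembled.
# Only the overall structure reflects the article; phrasing is illustrative.
PERSPECTIVES = ["strategist", "editor", "reader advocate", "marketer"]

def build_review_prompt(article_texts: list[str], proposal: str) -> str:
    """Combine existing articles and Claude's proposal into one review prompt."""
    articles = "\n\n---\n\n".join(article_texts)
    roles = ", ".join(PERSPECTIVES)
    return (
        "You are reviewing a content strategy, not implementing it.\n\n"
        f"# Input 1: existing articles (A-rank quality)\n{articles}\n\n"
        f"# Input 2: Claude's analysis and proposals\n{proposal}\n\n"
        f"# Instruction\nCritique from 4 perspectives: {roles}. "
        "For each perspective, give specific criticisms and alternative suggestions."
    )

prompt = build_review_prompt(["article one...", "article two..."], "proposal...")
print(prompt.splitlines()[0])  # → You are reviewing a content strategy, not implementing it.
```

The key design choice is giving Kimi the full article texts rather than summaries, so each persona critiques against the real writing, not a compressed proxy.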
Kimi’s Verdict on Claude’s Title Proposals
| Claude’s Proposal | Kimi’s Verdict | Kimi’s Reasoning (Summary) |
|---|---|---|
| “最強モデルで司令塔を組んだら9倍遅くなった” (Built an orchestrator with the strongest model; it got 9× slower) | ⚠️ Revise | The lesson of “rejection” is lost. The article’s real value lies in the criteria for rejecting an approach. |
| “Claude Codeに397問の試験問題を自作し始めた” (Started creating 397 exam questions with Claude Code) | ❌ Reject | The number dominates too much. The core insight — “AI doesn’t propose leveraging its own capabilities” — gets buried. |
| Strategy of “targeting the optimal zone by character count” | ⚠️ Revise | “Information density” is the right metric, not character count. |
| “Claude Code で技術記事を20本書いて育てた Zenn 執筆環境の全貌” (The full picture of a Zenn writing environment built by writing 20 tech articles with Claude Code) | ⚠️ Reconsider | Should downplay the “even a non‑engineer can do it” angle and let achievement numbers speak instead. |
Key Takeaways
- Invisible agreement became visible. When I read Claude’s proposals, the data backing made them feel “plausible”; Kimi’s critique exposed my unconscious alignment with Claude’s biases.
- The number‑optimization trap. “397 questions” and “9× slower” are striking numbers, but foregrounding them sacrifices the article’s actual lessons (AI’s blind spots, criteria for rejection decisions).
- Value of peer review. It isn’t just about producing the “right answer”; it’s about surfacing hidden assumptions and ensuring the core insight shines through.
Final Thoughts
Having two distinct LLMs evaluate each other turned a seemingly solid, data‑driven strategy into a richer, more nuanced one. Claude gave me a well‑structured, data‑backed baseline, while Kimi forced me to question the underlying premises and refine the messaging. The process highlighted how easy it is to let impressive numbers or familiar patterns dominate our thinking, and how essential it is to surface the real value behind every title.
Reflections on Using Kimi as a Peer‑Review Partner
Kimi’s verdicts echo common content‑marketing principles, but seeing them applied directly to my own articles – being told exactly which number was erasing which lesson – gave me a level of resolution I could only get through this experience.
Same Tool, Different Value
- Previous article: Kimi K2.5 was used as a code‑writing worker.
- Current article: Kimi K2.5 is used as a reviewer critiquing proposals.
For implementation delegation, Kimi’s swarm intelligence (parallel execution power) shines.
For peer review, the multi‑perspective nature of that swarm intelligence shines instead.
Even with the same model, handing it a `spec.md` versus handing it full article‑text extracts yields completely different kinds of value.
Limitations of This Approach
- Kimi’s critique isn’t necessarily “correct.”
- The model has its own biases.
- When two AIs agree, that doesn’t guarantee the answer is right.
- A human makes the final call, so human bias remains.
What peer review expands is the “scope of consideration,” not the “accuracy rate.”
What I Executed
- Title rewrites – Out of the original six proposals, I finalized five with Kimi’s revisions incorporated.
- Title‑design rules – Added seven rules to the `zenn-writer` skill (e.g., “lead with numbers and pair them with emotional words,” “preserve the learning element”).
- New article themes – Listed five ideas, such as:
  - A comprehensive “Top 10 Settings” piece.
  - A checklist‑style “Before You Trust LLM Output” piece.
  - Others to fill identified gaps.
- Branding transition – Shifted direction from “even a non‑engineer can do it” to “an explorer pushing the limits of Claude Code.”
- Effectiveness tracking – The impact of these changes hasn’t been verified yet; I’ll monitor page views (PV) and like counts after retitling and report back once the data is in.
This was a shift in perspective from “using AI as a tool” to “using AI as a sparring partner.”
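The effectiveness tracking I plan to run amounts to a simple before/after comparison. A minimal sketch, with entirely made-up numbers (no real data has been collected yet):

```python
# Hypothetical sketch of the planned effectiveness tracking: compare average
# daily page views before and after a retitle. All numbers are invented
# for illustration; the real measurement is still pending.
from statistics import mean

def uplift(before: list[int], after: list[int]) -> float:
    """Relative change in the daily average after the retitle."""
    b, a = mean(before), mean(after)
    return (a - b) / b

pv_before = [40, 35, 50, 45]   # daily PV before retitling (hypothetical)
pv_after = [60, 55, 70, 65]    # daily PV after retitling (hypothetical)
print(f"PV uplift: {uplift(pv_before, pv_after):+.0%}")  # prints "PV uplift: +47%"
```

The same function would apply unchanged to like counts; the hard part is waiting long enough for the post-retitle window to be comparable.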
Workflow Overview
Claude (data analysis & structuring)
→ Author (review & approval)
→ Kimi (multi‑perspective critique)
→ Author (integration & final judgment)
→ Execution
- Claude handled data analysis and structuring.
- Kimi handled multi‑perspective critique and brand‑consistency checking.
This division of roles emerged through the peer‑review process itself.
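The workflow above can be sketched as a pipeline of stages with human approval gates. The model calls here are stubs; in a real setup each would invoke Claude or Kimi through its own CLI or API, and `human_gate` would pause for manual review rather than print.

```python
# Minimal sketch of the review workflow: Claude analyzes, the author approves,
# Kimi critiques, the author makes the final call. Model calls are stubbed.
def claude_analyze(topic: str) -> str:
    return f"proposals for {topic}"          # stub: data analysis & structuring

def kimi_review(proposal: str) -> str:
    return f"critique of ({proposal})"       # stub: multi-perspective critique

def human_gate(label: str, artifact: str) -> str:
    """Stand-in for manual author review; a real gate would block for input."""
    print(f"[author review: {label}] {artifact}")
    return artifact

def run_workflow(topic: str) -> str:
    proposal = human_gate("approve proposals", claude_analyze(topic))
    critique = kimi_review(proposal)
    return human_gate("final judgment", critique)

final = run_workflow("title strategy")
```

The point of the structure is that a human gate sits after each model, so neither AI's output flows directly into execution.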
Meta‑Note on This Article
- The article was designed using findings from the buzz analysis.
- The title intentionally combines the “numeric” and “hypothetical/result” patterns.
- The structure deliberately follows a “failure‑to‑lesson” arc with a “concrete‑abstract‑concrete” flow.
Whether this structure actually works will be verified by this article’s own PV and like counts.
Step‑by‑Step Production Log
1. Planning (Claude)
   - Created the plan: 8 sections, three title candidates, and source materials.
2. Specification
   - Converted the plan to `spec.md` and dispatched it to Kimi K2.5.
   - Spec included:
     - Tone specification (`da/dearu` style – assertive Japanese)
     - Reference paths for source files
     - Section structure
3. First Draft (Kimi)
   - Autonomously read three source files.
   - Generated a draft of ~3,800 characters.
   - Result: low quality – thin, bland prose despite following the spec.
4. Harsh Review (Claude’s editor agent)
   - Verdict: “REVISE AND RESUBMIT.”
   - Flagged 3 CRITICAL numerical inconsistencies and 6 MEDIUM issues (shallow thesis validation, lack of embodied lessons).
5. Revision (Claude)
   - Addressed all CRITICAL and MEDIUM issues.
   - Added author introspection (e.g., “I had been swept up by ‘397 questions’ myself”) and acknowledged methodology limitations.
6. Second Review (Kimi)
   - No persona specified – allowed swarm intelligence to make autonomous judgments.
   - Returned an A rating (recommended for publication) with three minor corrections (numerical consistency in the opening).
7. Final Integration
   - Incorporated Kimi’s feedback to produce the final version.
Insights from Different Uses of Kimi
| Use Case | Outcome |
|---|---|
| Peer review (critique & analysis) | Swarm intelligence delivered multi‑perspective insights, generating 350 lines of detailed feedback. |
| Article writing (prose generation) | Claude produced dramatically higher quality prose. |
Even with the same model, critique and generation bring out capabilities in fundamentally different ways.