Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
Source: Hacker News
Overview
I replicated David Ng’s RYS method on consumer AMD GPUs (RX 7900 XT + RX 6950 XT) and discovered an unexpected behavior.
Transformers appear to contain discrete reasoning circuits—contiguous blocks of 3–4 layers that act as indivisible cognitive units. Duplicating the right block causes the model to run its reasoning pipeline twice, without any weight changes or additional training. The model simply “thinks longer.”
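Mechanically, this is pure layer routing: the forward pass visits some layer indices more than once, with no weight edits. A minimal toy sketch (stand-in "layers" that just tag the hidden state, not real transformer blocks):

```python
def run_with_route(layers, route, x):
    """Apply layers in the order given by `route`.

    Duplicating an index in `route` re-runs that layer's existing
    weights on the current hidden state; nothing is copied or retrained.
    """
    for i in route:
        x = layers[i](x)
    return x

# Stand-in "layers" that append their own index, so the execution
# path through the network is visible in the output.
layers = [lambda x, i=i: x + [i] for i in range(6)]

baseline = run_with_route(layers, range(6), [])
# -> [0, 1, 2, 3, 4, 5]
doubled = run_with_route(layers, [0, 1, 2, 3, 4, 2, 3, 4, 5], [])
# -> [0, 1, 2, 3, 4, 2, 3, 4, 5]  (block 2-4 runs twice in a row)
```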
Results (lm‑evaluation‑harness, n=50)
DevStral‑24B
Layers 12‑14 duplicated once
- BBH Logical Deduction: 0.22 → 0.76
- GSM8K (strict): 0.48 → 0.64
- MBPP (code generation): 0.72 → 0.78
- No degradation observed.
Qwen2.5‑Coder‑32B
Layers 7‑9 duplicated once
- Reasoning probe: 76% → 94%
Observations
- Different duplication patterns create distinct cognitive “modes” from the same weights:
  - Double-pass boosts mathematical reasoning.
  - Triple-pass enhances emotional reasoning.
  - Interleaved doubling (e.g., 13,13,14,14,15,15,16) yields a pure math specialist.
- The circuit boundaries are sharp: shifting the duplicated block by a single layer causes the effect to disappear or even invert.
- Smaller models (≈24 B parameters) exhibit tighter circuits (≈3 layers) compared to larger ones; Ng reported 7‑layer circuits in a 72 B model.
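For concreteness, the three routing patterns above can be generated programmatically. A sketch, assuming 0-based layer indices and inclusive block bounds (the repo's actual conventions may differ):

```python
def double_pass(n_layers, start, end):
    """Run the block [start, end] twice in a row: ..., s..e, s..e, ..."""
    block = list(range(start, end + 1))
    return list(range(end + 1)) + block + list(range(end + 1, n_layers))

def n_pass(n_layers, start, end, repeats):
    """Generalization covering triple-pass and beyond."""
    block = list(range(start, end + 1))
    return list(range(start)) + block * repeats + list(range(end + 1, n_layers))

def interleaved(n_layers, start, end):
    """Repeat each layer in [start, end] back-to-back,
    e.g. ..., 13, 13, 14, 14, 15, 15, 16 for start=13, end=15."""
    route = []
    for i in range(n_layers):
        route.append(i)
        if start <= i <= end:
            route.append(i)
    return route

print(double_pass(8, 2, 4))   # [0, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7]
print(interleaved(8, 2, 4))   # [0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
```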
Tools
A repository is provided that can:
- Detect reasoning circuits in any GGUF model.
- Apply arbitrary layer routing (e.g., duplication, reordering).
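The detection step presumably reduces to a search over candidate blocks. A brute-force sketch, where `evaluate` is a stand-in for whatever actually loads the routed GGUF model and runs a benchmark (the repo's real detection logic is not described here):

```python
def sweep_blocks(n_layers, block_size, evaluate):
    """Duplicate every contiguous block of `block_size` layers, score the
    resulting route with `evaluate`, and return (start_layer, score)
    pairs sorted best-first."""
    results = []
    for start in range(n_layers - block_size + 1):
        end = start + block_size - 1
        route = (list(range(end + 1))
                 + list(range(start, end + 1))
                 + list(range(end + 1, n_layers)))
        results.append((start, evaluate(route)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy scorer standing in for a real benchmark run: rewards routes
# that execute layers 12-14 twice (the "circuit" in a 40-layer model).
def toy_score(route):
    return 1.0 if all(route.count(i) == 2 for i in (12, 13, 14)) else 0.0

best = sweep_blocks(40, 3, toy_score)
# best[0] -> (12, 1.0): the sweep recovers the planted block.
```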
The entire workflow—search, discovery, and validation—was completed in a single evening.
Contact
Feel free to ask questions in the comments thread.