Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

Published: March 18, 2026 at 05:31 PM EDT
2 min read

Source: Hacker News

Overview

I replicated David Ng’s RYS method on consumer AMD GPUs (RX 7900 XT + RX 6950 XT) and discovered an unexpected behavior.

Transformers appear to contain discrete reasoning circuits—contiguous blocks of 3–4 layers that act as indivisible cognitive units. Duplicating the right block causes the model to run its reasoning pipeline twice, without any weight changes or additional training. The model simply “thinks longer.”
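As a minimal sketch of what "duplicating a block" means here (the function name and 0-based indexing are my own, not from the repo): the layers themselves are untouched; only the execution order changes, so the duplicated block is visited an extra time with the same weights.

```python
def duplicate_block(n_layers, start, end, times=1):
    """Return a layer-execution order in which the contiguous block
    [start, end] (inclusive) is run `times` extra passes, back to back.
    No weights change; the same layers are simply visited again."""
    block = list(range(start, end + 1))
    return (
        list(range(0, end + 1))        # everything up to and including the block
        + block * times                # extra pass(es) over the block
        + list(range(end + 1, n_layers))
    )

# Duplicating layers 12-14 once in a hypothetical 40-layer model:
order = duplicate_block(40, 12, 14)
# order == [0, ..., 11, 12, 13, 14, 12, 13, 14, 15, ..., 39]
```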

Results (lm‑evaluation‑harness, n=50)

DevStral‑24B

Layers 12‑14 duplicated once

  • BBH Logical Deduction: 0.22 → 0.76
  • GSM8K (strict): 0.48 → 0.64
  • MBPP (code generation): 0.72 → 0.78
  • No degradation observed.

Qwen2.5‑Coder‑32B

Layers 7‑9 duplicated once

  • Reasoning probe: 76% → 94%

Observations

  • Different duplication patterns create distinct cognitive “modes” from the same weights.

    • Double‑pass boosts mathematical reasoning.
    • Triple‑pass enhances emotional reasoning.
    • Interleaved doubling (e.g., 13,13,14,14,15,15,16) yields a pure math specialist.
  • The circuit boundaries are sharp—shifting the duplicated block by a single layer causes the effect to disappear or even invert.

  • Smaller models (≈24 B parameters) exhibit tighter circuits (≈3 layers) compared to larger ones; Ng reported 7‑layer circuits in a 72 B model.
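The patterns above can be written down as routing sequences. These helpers are illustrative (my own naming, assuming the block notation in the bullets), not code from the repo:

```python
def block_pass(start, end, passes=2):
    """Run the whole block [start, end] `passes` times in a row
    (passes=2 is the double-pass, passes=3 the triple-pass)."""
    return list(range(start, end + 1)) * passes

def interleaved(start, end, last_single=None):
    """Repeat each layer immediately (13,13,14,14,...), optionally
    ending with a single un-doubled layer."""
    seq = [i for layer in range(start, end + 1) for i in (layer, layer)]
    if last_single is not None:
        seq.append(last_single)
    return seq

print(block_pass(12, 14, passes=2))         # [12, 13, 14, 12, 13, 14]
print(interleaved(13, 15, last_single=16))  # [13, 13, 14, 14, 15, 15, 16]
```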

Tools

A repository is provided that can:

  1. Detect reasoning circuits in any GGUF model.
  2. Apply arbitrary layer routing (e.g., duplication, reordering).
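Mechanically, applying such a routing just means re-visiting existing layer objects in a chosen order; because duplicated indices point at the same layer, no extra weights are loaded. A toy illustration with plain functions standing in for transformer layers (nothing here is the repo's actual implementation):

```python
# Toy stand-ins for transformer layers: each "layer" appends its index
# to the running state, so the output records the execution order.
layers = [lambda x, i=i: x + [i] for i in range(5)]

def run_with_routing(x, layers, order):
    """Execute layers in the given order; a duplicated index re-runs
    the same layer object, i.e. the same weights."""
    for i in order:
        x = layers[i](x)
    return x

normal  = run_with_routing([], layers, [0, 1, 2, 3, 4])
doubled = run_with_routing([], layers, [0, 1, 2, 3, 2, 3, 4])  # layers 2-3 run twice
print(normal)   # [0, 1, 2, 3, 4]
print(doubled)  # [0, 1, 2, 3, 2, 3, 4]
```

The same loop structure also covers reordering: any permutation or repetition of indices is a valid `order`.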

The entire workflow—search, discovery, and validation—was completed in a single evening.

Contact

Feel free to ask questions in the comments thread.
