๐Ÿง โœ‚๏ธ Neural Network Lobotomy: LLM์—์„œ 7๊ฐœ์˜ ๋ ˆ์ด์–ด๋ฅผ ์ œ๊ฑฐ โ€” 30% ๋” ๋นจ๋ผ์ง

Published: January 10, 2026, 02:46 AM (GMT+9)
5 min read
Source: Dev.to


TL;DR

| Removal strategy | Speed ↑ | Perplexity Δ | Quality Δ | Works? |
|---|---|---|---|---|
| Baseline (nothing removed) | – | 1.82 | – | ✅ |
| Remove middle layer #11 | +10% (59 → 64 tok/s) | 1.89 (+4%) | –4% | ✅ |
| Remove 3 middle layers #10–12 | +12% (59 → 66 tok/s) | 2.24 (+23%) | –23% | ✅ |
| Remove first layer #0 | +10% (59 → 64 tok/s) | 5.74 (+215%) | –215% | ❌ |
| Remove 7 "safe" layers (3, 4, 5, 9, 10, 11, 12) | +30% (59 → 77 tok/s) | ~1.87 (≈ +2.5%) | –2.5% | ✅ |

๋ชจ๋“  ์ธก์ •๊ฐ’์€ MPS ๋ฐฑ์—”๋“œ์—์„œ 10ํšŒ ์‹คํ–‰(5ํšŒ ์›Œ๋ฐโ€‘์—…) ํ‰๊ท ์ž…๋‹ˆ๋‹ค.

Motivation

์Šคํƒ€ํŠธ์—…์€ LLM ์ถ”๋ก ์„ ์œ„ํ•ด GPU์— ์ˆ˜๋ฐฑ๋งŒ ๋‹ฌ๋Ÿฌ๋ฅผ ํˆฌ์žํ•ฉ๋‹ˆ๋‹ค. OpenAI๋Š” ํ•˜๋ฃจ์— $700โ€ฏk ์ •๋„๋ฅผ ์ปดํ“จํŒ… ๋น„์šฉ๋งŒ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ์•Œ๋ ค์กŒ์Šต๋‹ˆ๋‹ค. ํ’ˆ์งˆ ์†์‹ค์ด ๋ˆˆ์— ๋„์ง€ ์•Š๋Š” ๋ชจ๋ธ ๊ฐ€์† ์ตœ์ ํ™”๋Š” ์ง์ ‘์ ์ธ ๋น„์šฉ ์ ˆ๊ฐ์œผ๋กœ ์ด์–ด์ง‘๋‹ˆ๋‹ค.

๋ ˆ์ด์–ด ํ”„๋ฃจ๋‹์€ ํ•˜๋“œ์›จ์–ด์— ๊ตฌ์• ๋ฐ›์ง€ ์•Š๋Š” ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค:

  • ์ตœ์‹  ๋ชจ๋ธ์€ ์ˆ˜์‹ญ(๋˜๋Š” ์ˆ˜๋ฐฑ) ๊ฐœ์˜ ๋ ˆ์ด์–ด๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค (GPTโ€‘4 โ‰ˆโ€ฏ120+).
  • ๋ชจ๋“  ๋ ˆ์ด์–ด๊ฐ€ ์ตœ์ข… ์„ฑ๋Šฅ์— ๋™์ผํ•˜๊ฒŒ ๊ธฐ์—ฌํ•˜๋Š” ๊ฒƒ์€ ์•„๋‹™๋‹ˆ๋‹ค.
  • ์ผ๋ถ€ ๋ ˆ์ด์–ด๋Š” ๋ชจ๋ธ์ด โ€œ๊ฑฐ์˜ ๋ˆˆ์น˜์ฑ„์ง€ ๋ชปํ•˜๊ฒŒโ€ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

The ShortGPT (2024) study showed that **up to 25%** of the layers can be removed from LLaMA-2.

Note: The "Aggressive" setting is shown for completeness; quality deteriorates quickly beyond the balanced configuration.

๋งˆ๋ฌด๋ฆฌ ์ƒ๊ฐ

  • ์ดˆ๊ธฐ ๋ ˆ์ด์–ด๋Š” ์œ„์น˜ ์ •๋ณด์™€ ๊ธฐ๋ณธ ํ† ํฐ ๊ด€๊ณ„๋ฅผ ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹คโ€”์ด๋ฅผ ์ œ๊ฑฐํ•˜๋ฉด ์žฌ์•™์ด ๋ฉ๋‹ˆ๋‹ค.
  • ๋‘ ๋ฒˆ์งธ ๋ ˆ์ด์–ด๋Š” ์–ธ์–ด ํŒจํ„ด์˜ โ€œ๊ฒฐ์ •ํ™” ์ง€์ โ€์ธ ๊ฒƒ์œผ๋กœ ๋ณด์ด๋ฉฐ, ์˜ˆ์ƒ์™ธ๋กœ ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
  • ์ค‘๊ฐ„โ€‘ํ›„๋ฐ˜ ๋ ˆ์ด์–ด์˜ ์ƒ๋‹น ๋ถ€๋ถ„์€ ์ด ์ž‘์€ ๋ชจ๋ธ์— ๋ถˆํ•„์š”ํ•˜์—ฌ, ์ ์€ ๋…ธ๋ ฅ์œผ๋กœ ๋” ๋น ๋ฅธ ์ถ”๋ก ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

ํ–ฅํ›„ ์ž‘์—…์—์„œ๋Š” ๋™์  ํ”„๋ฃจ๋‹(ํ”„๋กฌํ”„ํŠธ๋ณ„ ๋ ˆ์ด์–ด ํ™œ์„ฑํ™”/๋น„ํ™œ์„ฑํ™”)์ด๋‚˜ ์ง€์‹โ€‘์ฆ๋ฅ˜๋ฅผ ํƒ๊ตฌํ•˜์—ฌ ๋ถˆํ•„์š”ํ•œ ๋ ˆ์ด์–ด์˜ ๊ธฐ์—ฌ๋ฅผ ๋” ์Šฌ๋ฆผํ•œ ์•„ํ‚คํ…์ฒ˜์— ๋…น์—ฌ๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

All code and raw measurement logs are available in my public repository (link omitted here for brevity; see Reproducibility below).


Pruning Results

| Strategy | Removed layers | Speed-up | Quality loss |
|---|---|---|---|
| Minimal | {3} | ~5% | ~0.4% |
| Moderate | {3, 5, 10, 11} | ~18% | ~1% |
| Aggressive | {3, 4, 5, 9, 10, 11, 12} | ~32% | ~2.5% |

Optimal strategy: remove least important layers

# Layers whose removal causes the smallest PPL increase: 3, 4, 5, 9, 10, 11, 12
# Important: never remove layers 0, 2, 15 – they are critical points.
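
One straightforward way to obtain such a ranking is leave-one-out ablation: drop a single layer, measure the perplexity increase on held-out text, and repeat for every layer. A minimal sketch under that assumption (the checkpoint and evaluation text are placeholders, not the author's setup):

```python
# Sketch: rank layers by the perplexity increase caused by removing each one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
# Illustrative stand-in for a proper held-out corpus.
eval_ids = tok("The quick brown fox jumps over the lazy dog. " * 50,
               return_tensors="pt").input_ids

def perplexity():
    with torch.no_grad():
        loss = model(eval_ids, labels=eval_ids, use_cache=False).loss
    return torch.exp(loss).item()

all_layers = model.model.layers
base_ppl = perplexity()

deltas = {}
for i in range(len(all_layers)):
    # Temporarily run the model without layer i.
    model.model.layers = torch.nn.ModuleList(
        [blk for j, blk in enumerate(all_layers) if j != i])
    deltas[i] = perplexity() - base_ppl
model.model.layers = all_layers  # restore the full model

for idx, d in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"layer {idx:2d}: dPPL = {d:+.3f}")  # smallest delta = safest to remove
```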
| Year | Project | Focus |
|---|---|---|
| 2024 | ShortGPT | Removing entire layers |
| 2024 | FinerCut | Removing components within layers |
| 2024 | SliceGPT | Removing rows/columns from weight matrices |
| 2025 | LinearPatch | Recovering 94% quality after pruning via a Hadamard transform (arXiv) |
| 2025 | MRP (Maximum Redundancy Pruning) | Adaptive removal of the most redundant layers (arXiv) |
| 2025 | CLP (automatic segment search) | Finding optimal segments to remove (arXiv) |

Combining pruning with quantisation (INT4/INT8) can yield even greater speed-ups.
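
As a rough illustration (not benchmarked in this article), a pruned model can also be loaded with 8-bit weights via bitsandbytes; note that this path requires a CUDA GPU, unlike the MPS measurements above.

```python
# Sketch: combine 8-bit weight loading with the same layer removal as before.
# Requires a CUDA GPU with bitsandbytes and accelerate installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",          # assumed checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
REMOVE = {3, 4, 5, 9, 10, 11, 12}                   # "safe" layers from the article
model.model.layers = torch.nn.ModuleList(
    [blk for i, blk in enumerate(model.model.layers) if i not in REMOVE])
model.config.num_hidden_layers = len(model.model.layers)
```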

Business impact

  • Cost saving: For a $10โ€ฏk/month inference GPU budget, pruning can save $2โ€“3โ€ฏk without noticeable quality loss.
  • Scale: At OpenAIโ€™s scale, this translates to millions of dollars.

Caveats & considerations

  • Model size: Results shown for TinyLlamaโ€ฏ1.1B; may differ for 7โ€ฏB / 70โ€ฏB models.
  • Metric limitation: Perplexity does not capture all quality aspects.
  • Fineโ€‘tuning: Postโ€‘pruning fineโ€‘tuning can recover some lost quality.
  • Dataset diversity: Experiments were run on a single dataset; broader testing is needed.
  • Measurement variance: Speed on MPS backend varies ยฑ10โ€ฏ%; run many trials for reliable numbers.
  • Chainโ€‘ofโ€‘thought degradation: Recent work (arXivโ€ฏ2510.22228) shows that removing even 1โ€“2 layers can break multiโ€‘step reasoning, while simple tasks remain unaffected.

Reproducibility

All experiment code is available on GitLab:

git clone https://gitlab.com/molchanov.artem.1994/lobotomyllm
cd lobotomyllm
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python experiments/run_ablation.py --experiment quick

Key insights

  • Layerโ€ฏ2 is unexpectedly the most important (more so than Layerโ€ฏ0).
  • Layersโ€ฏ3โ€‘5 and 9โ€‘12 are largely redundant and can be removed with minimal impact.
  • Layerโ€ฏ15 is a hidden critical layer in the later part of the network.
  • Practical result: Removing 7 layers (22โ€ฏโ†’โ€ฏ15) yields ~32โ€ฏ% speedโ€‘up with ~2.5โ€ฏ% quality loss.

Next steps

  1. Run the same pipeline on Llama-3 8B for stronger validation.
  2. Explore pruning + quantisation combinations.
  3. Investigate what critical layers (2 & 15) actually encode.

If you liked this, subscribe, star the GitLab repo, and share with colleagues.

Questions and suggestions? Drop a comment or DM.

Tags: #MachineLearning #LLM #Optimization #PyTorch #NLP #DeepLearning
