๐Ÿš€ ํ•˜๋“œ์›จ์–ด ์—…๊ทธ๋ ˆ์ด๋“œ ์—†์ด ๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต ์‹œ๊ฐ„์„ 45% ๋‹จ์ถ•ํ•œ ๋ฐฉ๋ฒ•

๋ฐœํ–‰: (2025๋…„ 11์›” 30์ผ ์˜คํ›„ 02:57 GMT+9)
6 min read
์›๋ฌธ: Dev.to

Source: Dev.to

๐Ÿš€ ๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต ์‹œ๊ฐ„์„ 45% ๋‹จ์ถ•ํ•œ ๋ฐฉ๋ฒ• โ€” ํ•˜๋“œ์›จ์–ด ์—…๊ทธ๋ ˆ์ด๋“œ ์—†์ด

๋จธ์‹ ๋Ÿฌ๋‹ ์—”์ง€๋‹ˆ์–ด๋“ค์€ ์ข…์ข… ๋†’์€ ์ •ํ™•๋„, ๋” ์ข‹์€ ์•„ํ‚คํ…์ฒ˜, ์ตœ์‹  ๋ชจ๋ธ์„ ์ถ•ํ•˜ํ•˜์ง€๋งŒ, ๊ฑฐ์˜ ์ฃผ๋ชฉ๋ฐ›์ง€ ๋ชปํ•˜๋Š” ๋˜ ๋‹ค๋ฅธ ๊ฐ•๋ ฅํ•œ ๋ ˆ๋ฒ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค:

ํ•™์Šต ํšจ์œจ์„ฑ โ€” ์‹คํ—˜, ๋ฐ˜๋ณต, ๊ฐœ์„ ์„ ์–ผ๋งˆ๋‚˜ ๋น ๋ฅด๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€.

์‹ค์ œ ์—”์ง€๋‹ˆ์–ด๋ง ํ™˜๊ฒฝ์—์„œ๋Š” ์†๋„ = ์ƒ์‚ฐ์„ฑ ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ ํ•™์Šต์ด ๋นจ๋ผ์ง€๋ฉด:

  • ํ•˜๋ฃจ์— ๋” ๋งŽ์€ ์‹คํ—˜ ์ˆ˜ํ–‰
  • ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„ ๊ฐ€์†ํ™”
  • ์ปดํ“จํŒ… ๋น„์šฉ ์ ˆ๊ฐ
  • ๋ฐฐํฌ ์†๋„ ํ–ฅ์ƒ

๋” ํฐ GPU๋กœ ์—…๊ทธ๋ ˆ์ด๋“œํ•˜๊ฑฐ๋‚˜ ๋น„์‹ผ ํด๋ผ์šฐ๋“œ ์„œ๋ฒ„๋ฅผ ์ž„๋Œ€ํ•˜๋Š” ๋Œ€์‹ , ๋‚˜๋Š” ์†Œํ”„ํŠธ์›จ์–ด ์ˆ˜์ค€ ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•ด ํ•™์Šต์„ ์–ผ๋งˆ๋‚˜ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์‹คํ—˜ํ–ˆ์Šต๋‹ˆ๋‹ค.

๐ŸŽฏ ์‹คํ—˜ ์„ค์ •

๋ฐ์ดํ„ฐ์…‹

  • MNIST โ€“ ํ•™์Šต ์ƒ˜ํ”Œ 20,000๊ฐœ + ํ…Œ์ŠคํŠธ 5,000๊ฐœ (๋น ๋ฅธ ๋น„๊ต๋ฅผ ์œ„ํ•œ ์„œ๋ธŒ์…‹)

ํ”„๋ ˆ์ž„์›Œํฌ

  • TensorFlow 2
  • Google Colab GPU ํ™˜๊ฒฝ
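
The post does not include the setup code itself, so here is a minimal sketch of how such a subset might be prepared in TensorFlow 2; the variable names (`train_ds`, `test_ds`) and the batch size are my own assumptions, not the author's notebook.

import tensorflow as tf

# Load full MNIST, then slice a 20,000 / 5,000 subset for quick comparisons
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[:20_000].astype("float32") / 255.0
y_train = y_train[:20_000]
x_test = x_test[:5_000].astype("float32") / 255.0
y_test = y_test[:5_000]

# Baseline input pipelines: batching only, no caching or prefetching yet
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(128)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(128)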

ํ…Œ์ŠคํŠธํ•œ ๊ธฐ๋ฒ•

| Technique | Description |
| --- | --- |
| Baseline | Default training (float32), no optimizations |
| Caching + Prefetching | Removes the data-loading bottleneck |
| Mixed Precision | Mixed FP16 + FP32 computation |
| Gradient Accumulation | Simulates a large batch size without large VRAM |

๐Ÿ“Š ํ•™์Šต ์‹œ๊ฐ„ ๊ฒฐ๊ณผ (5 Epoch)

| Technique | Time (seconds) |
| --- | --- |
| Baseline | 20.03 |
| Caching + Prefetching | 11.27 (≈45% faster) |
| Mixed Precision | 15.89 |
| Gradient Accumulation | 14.65 |

Caching + Prefetching alone cut training time nearly in half.

๐Ÿง  ํ•ต์‹ฌ ์ธ์‚ฌ์ดํŠธ

์ž‘์€ ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ๋ฐ์ดํ„ฐ ๋กœ๋”ฉ โ†’ GPU ์œ ํœด ์‹œ๊ฐ„์ด ๋ณ‘๋ชฉ์ด ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ์•„๋‹ˆ๋ผ ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ณ ์น˜์„ธ์š”.

๐Ÿงฉ ๊ธฐ๋ฒ• ์ƒ์„ธ ๋ถ„์„

1. Data Caching + Prefetching

# Cache the dataset in RAM after the first pass and overlap input prep with GPU compute
train_ds = train_ds.cache().prefetch(tf.data.AUTOTUNE)

์™œ ๋„์›€์ด ๋˜๋Š”๊ฐ€

  • ๋ฐ์ดํ„ฐ๋ฅผ ํ•œ ๋ฒˆ๋งŒ ๋กœ๋“œํ•˜๊ณ  RAM์— ์ €์žฅ
  • Prefetch๊ฐ€ ๋ฐ์ดํ„ฐ ์ค€๋น„์™€ GPU ์—ฐ์‚ฐ์„ ๊ฒน์น˜๊ฒŒ ํ•จ
  • GPU ๋Œ€๊ธฐ ์‹œ๊ฐ„ ์ œ๊ฑฐ

ํŠธ๋ ˆ์ด๋“œ์˜คํ”„

  • ์ถฉ๋ถ„ํ•œ RAM ํ•„์š”
  • ์ปดํ“จํŒ…์ด ๋ณ‘๋ชฉ์ด๋ฉด ํšจ๊ณผ๊ฐ€ ์ ์Œ

2. Mixed Precision Training

from tensorflow.keras import mixed_precision

# Run compute in float16 where safe while keeping variables in float32
mixed_precision.set_global_policy('mixed_float16')

์™œ ๋„์›€์ด ๋˜๋Š”๊ฐ€

  • FP16 ์—ฐ์‚ฐ์ด ๋” ๋น ๋ฅด๊ณ  ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ์ ์Œ
  • Tensor Core๊ฐ€ ํ–‰๋ ฌ ์—ฐ์‚ฐ์„ ๊ฐ€์†ํ™”

๊ฐ€์žฅ ์ ํ•ฉํ•œ ๊ฒฝ์šฐ

  • CNN, Transformer, diffusion ๋ชจ๋ธ
  • ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹ + ์ตœ์‹  GPU (T4, A100, RTXโ€ฏ30/40 ์‹œ๋ฆฌ์ฆˆ)

ํŠธ๋ ˆ์ด๋“œ์˜คํ”„

  • ์ •ํ™•๋„ ์•ฝ๊ฐ„์˜ ๋“œ๋ฆฌํ”„ํŠธ ๊ฐ€๋Šฅ์„ฑ
  • CPU ์ „์šฉ ์‹œ์Šคํ…œ์—์„œ๋Š” ์ด์  ์—†์Œ

3. Gradient Accumulation

# PyTorch-style pattern: average the loss over the accumulation window,
# accumulate gradients every step, and update weights only every N steps
loss = loss / accumulation_steps
loss.backward()
if (step + 1) % accumulation_steps == 0:
    optimizer.step()
    optimizer.zero_grad()

์™œ ๋„์›€์ด ๋˜๋Š”๊ฐ€

  • VRAM์ด ์ ์€ GPU์—์„œ๋„ ํฐ ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋ฅผ ์‹œ๋ฎฌ๋ ˆ์ด์…˜
  • ๊ทธ๋ž˜๋””์–ธํŠธ ์•ˆ์ •์„ฑ ํ–ฅ์ƒ

ํŠธ๋ ˆ์ด๋“œ์˜คํ”„

  • ์—ํฌํฌ๋‹น ์‹ค์ œ ์‹œ๊ฐ„์€ ๋А๋ ค์ง
  • ์ปค์Šคํ…€ ๋ฃจํ”„ ๊ตฌํ˜„ ํ•„์š”

โš  ์‹ค์ œ ํ˜„์žฅ ๊ด€์ : ํŠธ๋ ˆ์ด๋“œ์˜คํ”„๊ฐ€ ์ค‘์š”

| Technique | Main Benefit | Potential Issue |
| --- | --- | --- |
| Caching + Prefetching | Maximizes GPU utilization | High RAM usage |
| Mixed Precision | Large speedups | Requires compatible hardware |
| Gradient Accumulation | Trains large models on small GPUs | Slower per step |

์™„๋ฒฝํ•œ ๊ธฐ๋ฒ•์€ ์—†์Šต๋‹ˆ๋‹คโ€”์˜ค์ง ์ƒํ™ฉ์— ๋งž๋Š” ํŒ๋‹จ๋งŒ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ตœ๊ณ ์˜ ์—”์ง€๋‹ˆ์–ด๋Š” ์‹ค์ œ ๋ณ‘๋ชฉ์— ๋”ฐ๋ผ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.

๐Ÿง  ์–ธ์ œ ์–ด๋–ค ๊ธฐ๋ฒ•์„ ์จ์•ผ ํ• ๊นŒ

| Problem | Best Solution |
| --- | --- |
| GPU sits idle because data loading is slow | Caching + Prefetch |
| Not enough GPU memory | Gradient Accumulation |
| Compute-bound workload | Mixed Precision |

๐ŸŽฏ ์ตœ์ข… ์ •๋ฆฌ

ํ•ญ์ƒ ํฐ GPU๊ฐ€ ํ•„์š”ํ•œ ๊ฒƒ์€ ์•„๋‹™๋‹ˆ๋‹ค. ๋” ๋˜‘๋˜‘ํ•œ ํ•™์Šต์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
ํšจ์œจ์„ฑ ์—”์ง€๋‹ˆ์–ด๋ง์€ ํŠนํžˆ ๋Œ€๊ทœ๋ชจ ํ™˜๊ฒฝ์—์„œ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ”— ์ „์ฒด ๋…ธํŠธ๋ถ + ๊ตฌํ˜„

  • ํ•™์Šต ์‹œ๊ฐ„ ๋น„๊ต
  • ์„ฑ๋Šฅ ์‹œ๊ฐํ™” ์ฐจํŠธ
  • ๋ฐ”๋กœ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ Colab ๋…ธํŠธ๋ถ
  • ์™„์ „ ์žฌํ˜„ ๊ฐ€๋Šฅํ•œ ๊ตฌํ˜„

๐Ÿ’ฌ ๋‹ค์Œ์— ํƒ๊ตฌํ•  ๋‚ด์šฉ

  • ๋ถ„์‚ฐ ํ•™์Šต (DDP / Horovod)
  • XLA & ONNX Runtime ๊ฐ€์†
  • ResNet / EfficientNet / Transformer ๋ฒค์น˜๋งˆํฌ
  • ํŒŒ์ดํ”„๋ผ์ธ ๋ณ‘๋ชฉ ํ”„๋กœํŒŒ์ผ๋ง

๐Ÿค ์ปค๋ฎค๋‹ˆํ‹ฐ ์งˆ๋ฌธ

์—ฌ๋Ÿฌ๋ถ„์ด ๋‹ฌ์„ฑํ•œ ๊ฐ€์žฅ ํฐ ํ•™์Šต ์†๋„ ํ–ฅ์ƒ์€ ๋ฌด์—‡์ด๋ฉฐ, ์–ด๋–ป๊ฒŒ ์ด๋ฃจ์—ˆ๋‚˜์š”?
