Predicting Tea Sales with ML: Linear Regression, Gradient Descent & Regularization (Beginner-Friendly + Code)

Published: December 21, 2025, 12:33 AM GMT+9

10 min read

Source: Dev.to

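📚 What You'll Learn

Linear regression (tea sales vs. temperature)
Cost function (how wrong the predictions are)
Gradient descent (how to improve step by step)
Overfitting (memorizing vs. learning patterns)
Regularization (keeping the model simple)
Regularized cost function (Ridge/Lasso)
Practical code examples with NumPy and scikit-learn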

🧪 Setup (Run This First)

# Install if needed:
# pip install numpy pandas scikit-learn matplotlib

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(42)

⭐ Scenario 1 – Linear Regression (Tea Sales vs. Temperature)

Idea: The colder it gets, the more tea is sold. We fit a straight line that predicts sales from temperature.

# Synthetic dataset: temperature (°C) → tea cups sold
temps = np.array([10, 12, 15, 18, 20, 22, 24, 26, 28]).reshape(-1, 1)
tea_sales = np.array([100, 95, 85, 70, 60, 55, 50, 45, 40])

# Fit a basic linear regression
lin = LinearRegression()
lin.fit(temps, tea_sales)

print("Slope (m):", lin.coef_[0])          # cups change per 1 °C
print("Intercept (c):", lin.intercept_)   # base demand when temp = 0 °C

# Predict for tomorrow (e.g., 21 °C)
tomorrow_temp = np.array([[21]])
pred_sales = lin.predict(tomorrow_temp)
print("Predicted tea cups at 21 °C:", round(float(pred_sales[0])))  # round instead of truncating

# Plot
plt.scatter(temps, tea_sales, color="teal", label="Actual")
plt.plot(temps, lin.predict(temps), color="orange", label="Fitted line")
plt.xlabel("Temperature (°C)")
plt.ylabel("Tea cups sold")
plt.title("Linear Regression: Tea Sales vs. Temperature")
plt.legend()
plt.show()
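
For a single feature, the fitted line can also be computed by hand, which makes it clearer what LinearRegression is solving. The sketch below uses the standard closed-form least-squares formulas on the same nine data points; the names m_closed and c_closed are introduced here for illustration and are not part of the original walkthrough.

```python
import numpy as np

# Same synthetic data as the scenario above: temperature (°C) and cups sold
x = np.array([10, 12, 15, 18, 20, 22, 24, 26, 28], dtype=float)
y = np.array([100, 95, 85, 70, 60, 55, 50, 45, 40], dtype=float)

# Closed-form least squares for one feature:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   c = mean_y - m * mean_x
x_mean, y_mean = x.mean(), y.mean()
m_closed = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c_closed = y_mean - m_closed * x_mean

print(f"slope m = {m_closed:.3f}, intercept c = {c_closed:.3f}")
print(f"prediction at 21 °C = {m_closed * 21 + c_closed:.1f} cups")
```

The slope and intercept printed here should agree with lin.coef_[0] and lin.intercept_ up to floating-point rounding, since both solve the same least-squares problem.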

⭐ Scenario 2 – Cost Function (Measuring Wrongness)

Idea: Cost is the average of squared errors — big mistakes hurt more.

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_pred = lin.predict(temps)
print("Mean Squared Error (MSE):", mse(tea_sales, y_pred))
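
To see why squaring makes big mistakes hurt more, here is a small made-up comparison (the numbers are illustrative, not from the tea dataset). Both sets of predictions are off by 20 cups in total, but one spreads the error evenly and the other concentrates it in a single day.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([50.0, 50.0, 50.0, 50.0])

many_small = np.array([55.0, 45.0, 55.0, 45.0])  # four errors of 5 cups each
one_big = np.array([50.0, 50.0, 50.0, 70.0])     # a single error of 20 cups

# Same total absolute error (20 cups), very different squared-error cost
mse_small = mse(y_true, many_small)
mse_big = mse(y_true, one_big)

print("MSE with four small errors:", mse_small)
print("MSE with one big error:   ", mse_big)
```

The single 20-cup miss costs four times as much as four 5-cup misses, which is the behavior the cost function is designed to have.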

⭐ Scenario 3 – Gradient Descent (Step-by-Step Improvement)

Idea: Gradually adjust the slope m and intercept c to reduce the cost, like tweaking a tea recipe.

# Gradient Descent for y = m*x + c (from scratch)
X = temps.flatten()
y = tea_sales.astype(float)

m, c = 0.0, 0.0          # initial guesses
lr = 0.0005              # learning rate (step size)
epochs = 5000

def predictions(m, c, X):
    return m * X + c

def gradients(m, c, X, y):
    y_hat = predictions(m, c, X)
    dm = (-2 / len(X)) * np.sum(X * (y - y_hat))
    dc = (-2 / len(X)) * np.sum(y - y_hat)
    return dm, dc

history = []
for _ in range(epochs):
    dm, dc = gradients(m, c, X, y)
    m -= lr * dm
    c -= lr * dc
    history.append(mse(y, predictions(m, c, X)))

print(f"GD learned slope m={m:.3f}, intercept c={c:.3f}, final MSE={history[-1]:.2f}")

# Plot loss curve
plt.plot(history)
plt.xlabel("Epoch")
plt.ylabel("MSE (Cost)")
plt.title("Gradient Descent: Cost vs. Epochs")
plt.show()

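Tip: If lr is too large, the loss oscillates or blows up. If it is too small, learning becomes very slow. Because the raw temperatures are much larger than 1, the intercept converges far more slowly than the slope, so 5,000 epochs at lr = 0.0005 may still leave m and c noticeably short of the LinearRegression fit.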
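
One common way to speed up gradient descent is to standardize the feature first. The sketch below is a variation on the loop above, not the original author's code: the step size of 0.1 and the 500 epochs are assumptions chosen for this example, and the result is compared against np.polyfit as a reference.

```python
import numpy as np

x = np.array([10, 12, 15, 18, 20, 22, 24, 26, 28], dtype=float)
y = np.array([100, 95, 85, 70, 60, 55, 50, 45, 40], dtype=float)

# Standardize temperature so slope and intercept see similar curvature
x_mean, x_std = x.mean(), x.std()
z = (x - x_mean) / x_std

w, b = 0.0, 0.0   # parameters in standardized space
step = 0.1        # assumed step size for this sketch
for _ in range(500):
    err = (w * z + b) - y
    w -= step * (2 / len(z)) * np.sum(err * z)
    b -= step * (2 / len(z)) * np.sum(err)

# Map back to original units: y = (w / x_std) * x + (b - w * x_mean / x_std)
m_gd = w / x_std
c_gd = b - w * x_mean / x_std

# Reference least-squares fit (polyfit returns [slope, intercept] for deg=1)
m_ols, c_ols = np.polyfit(x, y, deg=1)

print(f"GD  : m={m_gd:.3f}, c={c_gd:.3f}")
print(f"OLS : m={m_ols:.3f}, c={c_ols:.3f}")
```

With the feature on a unit scale, a much larger step is stable and the two parameter pairs should match closely after a few hundred updates.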

⭐ Scenario 4 – Overfitting (Memorizing Noise)

We simulate a richer dataset that mixes useful features with noisy ones.

# Build a dataset with signal + noise
n = 300
temp      = np.random.uniform(5, 35, size=n)               # useful
rain      = np.random.binomial(1, 0.3, size=n)             # somewhat useful
festival  = np.random.binomial(1, 0.1, size=n)             # sometimes useful
traffic   = np.random.normal(0, 1, size=n)                # weak/noisy
dog_barks = np.random.normal(0, 1, size=n)                # pure noise

# True relationship (unknown to the model)
true_sales = (120 - 2.5 * temp + 10 * rain + 15 * festival
              + 1.0 * np.random.normal(0, 3, size=n))   # added noise

# Feature matrix
X = np.column_stack([temp, rain, festival, traffic, dog_barks])
feature_names = ["temp", "rain", "festival", "traffic", "dog_barks"]

X_train, X_test, y_train, y_test = train_test_split(
    X, true_sales, test_size=0.25, random_state=42
)

# Plain Linear Regression (can overfit)
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

print("Linear Regression Coefficients:")
for name, coef in zip(feature_names, lr_model.coef_):
    print(f"  {name}: {coef:.3f}")

print("Train MSE:", mean_squared_error(y_train, lr_model.predict(X_train)))
print("Test  MSE:", mean_squared_error(y_test,  lr_model.predict(X_test)))

If a clearly noisy feature (for example, dog_barks) gets a large coefficient, or if train MSE is markedly lower than test MSE, that is a sign of overfitting.
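
The train/test gap tends to be most visible when there are few rows and many irrelevant columns. The following sketch builds a deliberately small, hypothetical dataset (60 days, 1 real feature, 20 pure-noise features; none of these numbers come from the article) to make that gap easy to see.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_days = 60
temp_small = rng.uniform(5, 35, size=n_days)
noise_feats = rng.normal(0, 1, size=(n_days, 20))   # 20 columns of pure noise
sales_small = 120 - 2.5 * temp_small + rng.normal(0, 3, size=n_days)

X_small = np.column_stack([temp_small, noise_feats])
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(
    X_small, sales_small, test_size=0.5, random_state=0
)

# 30 training rows vs. 21 features: plenty of room to memorize noise
overfit_model = LinearRegression().fit(Xs_tr, ys_tr)
small_train_mse = mean_squared_error(ys_tr, overfit_model.predict(Xs_tr))
small_test_mse = mean_squared_error(ys_te, overfit_model.predict(Xs_te))

print(f"Train MSE: {small_train_mse:.2f}")
print(f"Test  MSE: {small_test_mse:.2f}")
```

With only 30 training rows, the model can use the noise columns to fit the training set closely, and that fit does not carry over to the held-out rows.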

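⭐ Scenario 5 – Fixing Overfitting

Strategies

Remove useless features (manual feature selection).
Collect more data (the classic fix).
Use regularization (a systematic penalty on large weights).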

⭐ Scenario 6 – Regularization (Penalty for Complexity)

Regularization adds a penalty term to the cost that shrinks large coefficients. It is like telling the tea maker to use fewer ingredients or lose the bonus.
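
Written out, the regularized cost is just the MSE plus a penalty on the weights. The sketch below uses a simplified textbook form (MSE + λ · penalty); the function names are made up for this example. Note that scikit-learn scales the data-fit term differently (Ridge minimizes ||y − Xw||² + alpha·||w||², Lasso minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁), so its alpha is not numerically identical to the λ used here.

```python
import numpy as np

def ridge_cost(X, y, w, b, lam):
    """MSE plus an L2 penalty: lam * sum of squared weights."""
    residual = y - (X @ w + b)
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def lasso_cost(X, y, w, b, lam):
    """MSE plus an L1 penalty: lam * sum of absolute weights."""
    residual = y - (X @ w + b)
    return np.mean(residual ** 2) + lam * np.sum(np.abs(w))

# Tiny example where the weights fit the data perfectly (MSE = 0),
# so the remaining cost is the penalty alone
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
y_demo = np.array([3.0, -4.0])
w_demo = np.array([3.0, -4.0])

ridge_value = ridge_cost(X_demo, y_demo, w_demo, b=0.0, lam=0.1)
lasso_value = lasso_cost(X_demo, y_demo, w_demo, b=0.0, lam=0.1)
print("Ridge cost:", ridge_value)   # 0.1 * (9 + 16)
print("Lasso cost:", lasso_value)   # 0.1 * (3 + 4)
```

The intercept b is left out of the penalty, which is the usual convention: only the feature weights are shrunk.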

⭐ Scenario 7 – Regularized Linear Regression (Ridge & Lasso)

# Ridge (L2) – penalizes squared weights
ridge = Ridge(alpha=1.0)          # alpha = regularization strength
ridge.fit(X_train, y_train)

# Lasso (L1) – penalizes absolute weights, can zero‑out features
lasso = Lasso(alpha=0.5, max_iter=10000)
lasso.fit(X_train, y_train)

def show_results(model, name):
    print(f"\n{name} Coefficients:")
    for feat, coef in zip(feature_names, model.coef_):
        print(f"  {feat}: {coef:.3f}")
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse  = mean_squared_error(y_test,  model.predict(X_test))
    print(f"Train MSE: {train_mse:.2f}")
    print(f"Test  MSE: {test_mse:.2f}")

show_results(ridge, "Ridge")
show_results(lasso, "Lasso")

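What to check

Model	Effect on coefficients	Typical outcome
Ridge	Shrinks all coefficients toward 0 but keeps every one of them	Reduces variance and often improves test-set performance
Lasso	Can set some coefficients to exactly 0	Performs regularization and feature selection at the same time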
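
Why can Lasso reach exactly 0 while Ridge only gets close? In the simplified one-weight case (an assumption made for this sketch: minimize ½(w − w_ols)² plus either (λ/2)·w² or λ·|w|), the two penalties have simple closed-form solutions. Ridge rescales the weight, while Lasso subtracts a fixed amount and clips at zero (soft-thresholding). The weights below are invented for illustration.

```python
import numpy as np

def ridge_shrink(w_ols, lam):
    # Minimizer of 0.5*(w - w_ols)^2 + 0.5*lam*w^2: proportional shrinkage
    return w_ols / (1.0 + lam)

def lasso_soft_threshold(w_ols, lam):
    # Minimizer of 0.5*(w - w_ols)^2 + lam*|w|: shift toward 0, clip at 0
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

# Hypothetical unregularized weights: two strong, two weak
w_unreg = np.array([-2.5, 10.0, 0.3, -0.1])
lam = 0.5

w_ridge = ridge_shrink(w_unreg, lam)
w_lasso = lasso_soft_threshold(w_unreg, lam)

print("Ridge:", np.round(w_ridge, 3))
print("Lasso:", np.round(w_lasso, 3))
```

The weak weights (0.3 and −0.1) are smaller than λ, so Lasso removes them entirely, while Ridge leaves them small but nonzero.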

🎉 Wrap‑Up

Linear regression gives you a simple, interpretable model.
Cost (MSE) quantifies prediction error.
Gradient descent iteratively minimizes that cost.
Overfitting happens when the model memorizes noise.
Regularization (Ridge/Lasso) curbs overfitting by penalizing large weights.

You now have a complete, runnable, notebook-style guide that connects tea-shop intuition to real machine learning practice. Happy modeling!

# Ridge: L2 penalty
ridge = Ridge(alpha=10.0)   # alpha = λ (higher = stronger penalty)
ridge.fit(X_train, y_train)

print("\nRidge Coefficients (alpha=10):")
for name, coef in zip(feature_names, ridge.coef_):
    print(f"  {name}: {coef:.3f}")

print("Ridge Train MSE:", mean_squared_error(y_train, ridge.predict(X_train)))
print("Ridge Test  MSE:", mean_squared_error(y_test,  ridge.predict(X_test)))

# Lasso: L1 penalty
lasso = Lasso(alpha=1.0)    # try different alphas like 0.1, 0.5, 2.0
lasso.fit(X_train, y_train)

print("\nLasso Coefficients (alpha=1.0):")
for name, coef in zip(feature_names, lasso.coef_):
    print(f"  {name}: {coef:.3f}")

print("Lasso Train MSE:", mean_squared_error(y_train, lasso.predict(X_train)))
print("Lasso Test  MSE:", mean_squared_error(y_test,  lasso.predict(X_test)))

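What to look for

Ridge shrinks the coefficients of noisy features toward 0.
Lasso can set features that are not actually needed to exactly 0 (feature selection).
Compare test MSE against plain linear regression. An improvement is not guaranteed: with few features and plenty of rows the difference may be small, and too strong a penalty can make test MSE worse.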

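⭐ Scenario 8 – How Regularization Fixes Overfitting (Deep Dive)

Let's compare models across a range of penalties and visualize how the coefficients shrink.

alphas = [0.0, 0.1, 1.0, 10.0, 50.0]  # 0.0 stands in for plain linear regression, for comparison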
coef_paths_ridge = []
train_mse_ridge, test_mse_ridge = [], []

for a in alphas:
    if a == 0.0:
        model = LinearRegression()
    else:
        model = Ridge(alpha=a)
    model.fit(X_train, y_train)
    coef_paths_ridge.append(model.coef_)
    train_mse_ridge.append(mean_squared_error(y_train, model.predict(X_train)))
    test_mse_ridge.append(mean_squared_error(y_test, model.predict(X_test)))

coef_paths_ridge = np.array(coef_paths_ridge)

# Plot Ridge coefficient paths
plt.figure(figsize=(8, 5))
for i, name in enumerate(feature_names):
    plt.plot(alphas, coef_paths_ridge[:, i], marker="o", label=name)
plt.xscale("symlog", linthresh=0.1)  # symlog keeps the alpha=0 baseline visible (a pure log axis cannot show 0)
plt.xlabel("alpha (symlog scale)")
plt.ylabel("Coefficient value")
plt.title("Ridge: Coefficient Shrinkage with Increasing Penalty")
plt.legend()
plt.show()

# Plot Train vs Test MSE for Ridge
plt.figure(figsize=(8, 5))
plt.plot(alphas, train_mse_ridge, marker="o", label="Train MSE")
plt.plot(alphas, test_mse_ridge, marker="o", label="Test MSE")
plt.xscale("symlog", linthresh=0.1)  # symlog keeps the alpha=0 baseline visible (a pure log axis cannot show 0)
plt.xlabel("alpha (symlog scale)")
plt.ylabel("MSE")
plt.title("Ridge: Train vs Test MSE Across Penalties")
plt.legend()
plt.show()

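Interpretation

When alpha is low, coefficients stay large → risk of overfitting (low train MSE, high test MSE).
As alpha increases, coefficients shrink → the model gets simpler and generalization tends to improve.
When alpha is too high, the model becomes too simple → underfitting (both MSEs rise).
Look for the alpha with the lowest test MSE: that is the sweet spot.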
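
Picking alpha by looking at test MSE works for a demo, but in practice it leaks information from the test set into the model choice. A common alternative is to choose alpha by cross-validation on the training data only. The sketch below uses scikit-learn's RidgeCV on a freshly simulated dataset; the data, the candidate list, and cv=5 are assumptions made for this example rather than part of the original walkthrough.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Simulated data in the spirit of Scenario 4 (2 useful + 2 noise features)
n_rows = 300
temp_cv = rng.uniform(5, 35, size=n_rows)
rain_cv = rng.binomial(1, 0.3, size=n_rows)
noise_cv = rng.normal(0, 1, size=(n_rows, 2))
sales_cv = 120 - 2.5 * temp_cv + 10 * rain_cv + rng.normal(0, 3, size=n_rows)

X_cv = np.column_stack([temp_cv, rain_cv, noise_cv])
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X_cv, sales_cv, test_size=0.25, random_state=42
)

# 5-fold cross-validation over the candidate penalties, training data only
candidate_alphas = [0.1, 1.0, 10.0, 50.0]
ridge_cv = RidgeCV(alphas=candidate_alphas, cv=5).fit(Xc_tr, yc_tr)

print("Alpha chosen by cross-validation:", ridge_cv.alpha_)
print("R^2 on the held-out test set:", round(ridge_cv.score(Xc_te, yc_te), 3))
```

The test set is touched only once, at the end, so the reported score stays an honest estimate of performance on new data.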

🧠 Bonus: A Simple Tea Consumption Forecast Function

def forecast_tea_cups(temp_c, rain=0, festival=0, model=ridge):
    """Quick helper using your fitted model (default: ridge)."""
    x = np.array([[temp_c, rain, festival, 0.0, 0.0]])  # ignore traffic/dog_barks at prediction time
    return float(model.predict(x)[0])

print("Forecast for 18°C, raining, festival day:",
      round(forecast_tea_cups(18, rain=1, festival=1)))
print("Forecast for 30°C, no rain, normal day:",
      round(forecast_tea_cups(30, rain=0, festival=0)))

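✅ Final Summary

Linear Regression: Fits the best straight line between the features and the target.
Cost Function (MSE): Penalizes prediction errors, especially large ones.
Gradient Descent: Iteratively improves the parameters to minimize the cost.
Overfitting: The model learns the noise too; it shines on training data but performs worse on new data.
Regularization (Ridge/Lasso): Shrinks weights and suppresses noisy features to improve generalization.
Choose α (lambda) carefully: too small → overfitting; too large → underfitting.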