Building a Cost-Effective AutoML Platform on AWS: $10-25/month vs $150+ for SageMaker Endpoints

Published: December 3, 2025, 07:14 AM GMT+9

Source: Dev.to

TL;DR

I built a serverless AutoML platform that trains ML models for roughly $10-25 per month. Upload a CSV file, choose the target column, and get a trained model back; no ML expertise is required.

Prerequisites

AWS account with administrator permissions
AWS CLI v2 (configured with aws configure)
Terraform ≥ 1.5
Docker (running)
Node.js 18+ and pnpm (frontend)
Python 3.11+ (local development)

Deployment

Estimated time: ~15 minutes from clone to a working platform.

git clone https://github.com/cristofima/AWS-AutoML-Lite.git
cd AWS-AutoML-Lite/infrastructure/terraform
terraform init && terraform apply

Why a Custom Solution?

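AWS SageMaker Autopilot is powerful, but it is expensive for prototyping.

Free tier: 50 hours of training per month for the first 2 months.
A real-time inference endpoint (e.g., ml.c5.xlarge) costs about $150/month when it runs 24 hours a day.

The goal was to build a cheaper, serverless alternative for side projects.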
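
As a rough check on the endpoint figure above, the sketch below derives the hourly rate implied by about $150/month for an always-on endpoint. The 730 hours/month average and the function names are assumptions for illustration; actual SageMaker pricing varies by region and instance type.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 h / 12)


def implied_hourly_rate(monthly_cost: float, hours: float = HOURS_PER_MONTH) -> float:
    """Hourly rate implied by a monthly bill for an always-on resource."""
    return monthly_cost / hours


def monthly_cost(hourly_rate: float, hours_used: float) -> float:
    """Cost of running a resource for a given number of hours."""
    return hourly_rate * hours_used


rate = implied_hourly_rate(150.0)
print(f"Implied rate: ${rate:.3f}/hour")

# A real-time endpoint bills for every hour it is up, used or not.
# Compute that only runs while a job is active (say 60 h/month) at the
# same hourly rate would cost a fraction of that.
print(f"60 h/month at that rate: ${monthly_cost(rate, 60):.2f}")
```

This is the core of the cost argument: paying per job rather than per hour of uptime is what makes the serverless design cheap for occasional use.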

Goals

Goal	Desired outcome
Upload CSV → Trained model	`.pkl` file
Auto-detect problem type	Classification vs. regression
Automatic EDA reports	Data profiling out of the box
Cost	$25/month or less

Architecture Overview

Component	Technology	Reason
Backend API	FastAPI + Mangum	Async, auto-generated docs, Lambda-friendly
Training	FLAML + scikit-learn	Fast AutoML, production-ready
Frontend	Next.js 16 + Tailwind	SSR support via Amplify
Infrastructure	Terraform	Reproducible, multi-environment
CI/CD	GitHub Actions + OIDC	No stored AWS credentials

Problem‑type detection (auto)
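
The detection rule itself is not reproduced in this copy of the article, so the following is only a minimal sketch of one common heuristic, not the project's actual logic. The function name and the `max_classes` threshold are hypothetical.

```python
from typing import Iterable


def detect_problem_type(target: Iterable[object], max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from a target column.

    Hypothetical heuristic (not taken from the project):
    - any non-numeric value                  -> classification
    - few distinct integer-valued numbers    -> classification
    - otherwise                              -> regression
    """
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column is empty")

    def is_number(v: object) -> bool:
        # bool is a subclass of int, but True/False are labels, not quantities
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if not all(is_number(v) for v in values):
        return "classification"

    distinct = set(values)
    all_integral = all(float(v).is_integer() for v in distinct)
    if all_integral and len(distinct) <= max_classes:
        return "classification"
    return "regression"


print(detect_problem_type(["spam", "ham", "spam"]))
print(detect_problem_type([12.5, 99.9, 42.1]))
```

Any threshold of this kind is a judgment call: an integer target such as a 1-5 rating could reasonably be treated either way, so a manual override is worth having.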

# classification if:

**Note:** If you add a parameter to `train.py`, you must also add it to `container_overrides`.
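
To illustrate the note above: AWS Batch only passes the container what is listed in the job's overrides, so a new `train.py` parameter that is missing there never reaches the training script. The sketch below builds a `containerOverrides` payload in the shape that boto3's `batch.submit_job` accepts. The parameter names (`DATASET_ID`, `TARGET_COLUMN`, `TIME_BUDGET`) and the helper name are hypothetical, not the project's actual ones.

```python
from typing import Dict, List


def build_container_overrides(params: Dict[str, object]) -> Dict[str, List[Dict[str, str]]]:
    """Turn training parameters into Batch environment-variable overrides.

    Batch expects environment values as strings, so everything is cast.
    """
    return {
        "environment": [
            {"name": name, "value": str(value)}
            for name, value in sorted(params.items())
        ]
    }


# Hypothetical parameters; every value train.py reads must appear here.
container_overrides = build_container_overrides(
    {"DATASET_ID": "abc123", "TARGET_COLUMN": "price", "TIME_BUDGET": 300}
)

# With boto3 this would be passed roughly as:
#   boto3.client("batch").submit_job(
#       jobName=..., jobQueue=..., jobDefinition=...,
#       containerOverrides=container_overrides,
#   )
print(container_overrides)
```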

Time budget per dataset size

Rows	Time budget
`50 K`	20 min
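
A row-count-based budget like this can be expressed as a small lookup. Only the 20-minute budget at 50 K rows comes from the table above; the second tier and the reading of `50 K` as a lower bound are assumptions made for illustration. FLAML's `AutoML.fit` takes `time_budget` in seconds, hence the conversion.

```python
from typing import Sequence, Tuple

# (minimum row count, budget in minutes).
# Only the 50_000 -> 20 entry reflects the article; the rest is hypothetical.
EXAMPLE_TIERS: Sequence[Tuple[int, int]] = [(50_000, 20), (0, 5)]


def time_budget_seconds(n_rows: int, tiers: Sequence[Tuple[int, int]] = EXAMPLE_TIERS) -> int:
    """Return the training time budget in seconds for a dataset size."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    # Check the largest threshold first so the biggest matching tier wins.
    for min_rows, minutes in sorted(tiers, reverse=True):
        if n_rows >= min_rows:
            return minutes * 60
    raise ValueError("no tier covers this row count")


# The result would be handed to FLAML roughly as:
#   flaml.AutoML().fit(X_train=X, y_train=y, task=task,
#                      time_budget=time_budget_seconds(len(X)))
print(time_budget_seconds(80_000))
```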

Real‑time status

The frontend polls DynamoDB every 5 seconds to show training progress.
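
The polling pattern can be sketched as follows. The real frontend is a Next.js app, so this Python version only illustrates the loop; the status values (`completed`, `failed`), the `fetch_status` callable, and the timeout are assumptions.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"completed", "failed"}  # hypothetical status names


def poll_job(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status every `interval` seconds until the job finishes."""
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if waited >= timeout:
            raise TimeoutError("training job did not finish in time")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A fixed 5-second interval is simple but issues a read on every tick, which is one reason the roadmap lists WebSocket updates as a replacement.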

Automatic reports

EDA report – generated automatically with a data profiling library.
Training report – model performance metrics and feature importance.

IAM Policy (GitHub OIDC)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CoreServices",
      "Effect": "Allow",
      "Action": ["s3:*", "dynamodb:*", "lambda:*", "batch:*", "ecr:*"],
      "Resource": "arn:aws:*:*:*:automl-lite-*"
    },
    {
      "Sid": "APIGatewayAndAmplify",
      "Effect": "Allow",
      "Action": ["apigateway:*", "amplify:*"],
      "Resource": "*"
    },
    {
      "Sid": "IAMRoles",
      "Effect": "Allow",
      "Action": ["iam:*Role*", "iam:*RolePolicy*", "iam:PassRole"],
      "Resource": "arn:aws:iam::*:role/automl-lite-*"
    },
    {
      "Sid": "ServiceLinkedRoles",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/*"
    },
    {
      "Sid": "Networking",
      "Effect": "Allow",
      "Action": ["ec2:Describe*", "ec2:*SecurityGroup*", "ec2:*Tags"],
      "Resource": "*"
    },
    {
      "Sid": "Logging",
      "Effect": "Allow",
      "Action": "logs:*",
      "Resource": "arn:aws:logs:*:*:log-group:/aws/*/automl-lite-*"
    }
  ]
}
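
The permissions policy above is attached to a role that GitHub Actions assumes through OIDC. The role's trust policy is not shown in the article, so the following is a minimal sketch of what such a trust policy typically looks like. The account ID is a placeholder and the branch-agnostic `sub` pattern is an assumption; the repository name is the one from the clone command.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """Build a trust policy letting workflows in `repo` assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Assumption: any branch or tag of the repo may deploy.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


# "123456789012" is a placeholder account ID.
policy = github_oidc_trust_policy("123456789012", "cristofima/AWS-AutoML-Lite")
print(json.dumps(policy, indent=2))
```

Narrowing the `sub` condition to specific branches (for example `repo:OWNER/REPO:ref:refs/heads/main`) limits which workflows can obtain credentials.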

CI/CD Flow

Branch	Action
`dev`	Auto-deploy to DEV
`main`	`plan → manual approval → deploy to PROD`

Deployment times (approx.)

Target	Time
Lambda only	~2 min
Training container	~3 min
Frontend	~3 min
Full infrastructure	~10 min

Monthly Cost Breakdown

Service	Cost
AWS Amplify	$5‑15
Lambda + API Gateway	$1‑2
Batch (Fargate Spot)	$2‑5
S3 + DynamoDB	$1‑2
Total	$10‑25
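
As a quick sanity check, the line items above can be summed. The ranges are copied from the table; the strict sum comes out slightly below the stated total, which is consistent with the total being rounded.

```python
# (low, high) monthly cost in USD, copied from the table above
line_items = {
    "AWS Amplify": (5, 15),
    "Lambda + API Gateway": (1, 2),
    "Batch (Fargate Spot)": (2, 5),
    "S3 + DynamoDB": (1, 2),
}

low = sum(lo for lo, _ in line_items.values())
high = sum(hi for _, hi in line_items.values())
print(f"Sum of line items: ${low}-{high} / month")  # prints "Sum of line items: $9-24 / month"
```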

Comparison with SageMaker

Feature	SageMaker Autopilot	AWS AutoML Lite
Monthly cost	~$150+ (real-time endpoint)	$10-25
Setup time	30+ min (Studio)	~15 min
Portable models	❌ Tied to SageMaker	✅ Downloadable `.pkl`
ML expertise	Medium	None
Auto problem detection	✅	✅
EDA reports	❌ Manual	✅ Automatic
IaC	❌ Console-centric	✅ Full Terraform
Cold start	N/A (always on)	~200 ms (Lambda)
Best for	Production pipelines	Prototyping and side projects

Using the Trained Model

# Build prediction container
docker build -f scripts/Dockerfile.predict -t automl-predict .

# Show model info
docker run --rm -v ${PWD}:/data automl-predict /data/model.pkl --info

# Predict from CSV
docker run --rm -v ${PWD}:/data automl-predict \
  /data/model.pkl -i /data/test.csv -o /data/predictions.csv
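
For use outside Docker, the downloaded `.pkl` can in principle be loaded directly in Python. This sketch assumes the file unpickles to an object with a scikit-learn-style `predict` method, which the article does not confirm (the file may instead bundle the model with preprocessing metadata). It also assumes FLAML and scikit-learn are installed in versions compatible with those used for training. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, List, Sequence


def load_model(path: str) -> Any:
    """Load a pickled object from disk (trusted files only)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def predict_rows(model: Any, rows: Sequence[Sequence[float]]) -> List[Any]:
    """Run predictions, assuming a scikit-learn-style predict()."""
    if not hasattr(model, "predict"):
        raise TypeError("loaded object has no predict() method")
    return list(model.predict(rows))


# Usage (hypothetical file name and feature values):
#   model = load_model("model.pkl")
#   print(predict_rows(model, [[5.1, 3.5, 1.4, 0.2]]))
```

The Docker route above remains the safer default because the image pins the library versions the model was trained with.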

Key observations

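265 MB of ML dependencies is too large for a Lambda zip deployment package (250 MB unzipped limit), which forced the split between Lambda for the API and Batch for training.
Fargate Spot cuts costs by about 70%, and interruptions are rare for short jobs.
FLAML has a smaller footprint and trains faster than AutoGluon while delivering comparable performance.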

Future Work (Roadmap)

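☐ ONNX export – deploy models to edge devices
☐ Model comparison UI – train multiple models side by side
☐ Real-time updates via WebSocket (instead of polling)
☐ Multi-user support with Cognito authentication
☐ Hyperparameter UI – fine-tune FLAML settings from the frontend
☐ Email notification when training completes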

Contributing

Contributions are welcome! Check GitHub Issues for good first issues.

Repository: (link omitted)