I built a real AI video-processing SaaS from Senegal: no GPT wrappers, just HuggingFace + OpenCV + YOLO + Detectron2 + MediaPipe + Celery
Source: Dev.to
Problem I was solving
Every creator I know spends 3–4 hours manually cutting video.
The algorithm rewards volume — not perfection.
Solution Overview
I built ClipFarmer, a SaaS that processes video entirely with on‑premise machine‑learning models instead of third‑party API wrappers.
Machine Learning Models
- Whisper (HuggingFace) – automatic speech transcription.
- YOLO + OpenCV (cv2) – scene detection.
- Detectron2 – instance segmentation.
- MediaPipe – pose and face landmark detection.
- OpenCV (cv2) – backbone for all frame‑level operations.
These are real models running locally; no external API calls.
Effects Pipeline
Each effect is a cv2 pipeline that processes frames:
- Color grading (dark moody, vintage grain, RGB split)
- CRT scanline overlay
- Motion blur
- Skeleton overlay (MediaPipe pose)
- Background removal (Detectron2 masks)
- Transitions between clips using frame blending
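A few of the effects above can be sketched as frame-level array operations. These are illustrative stand-ins for the production effects, assuming frames arrive as standard `uint8` BGR arrays (as `cv2` delivers them):

```python
import numpy as np

def rgb_split(frame, shift=4):
    """Chromatic-aberration style effect: offset the red and blue
    channels horizontally in opposite directions."""
    out = frame.copy()
    out[:, :, 2] = np.roll(frame[:, :, 2], shift, axis=1)   # red right
    out[:, :, 0] = np.roll(frame[:, :, 0], -shift, axis=1)  # blue left
    return out

def crt_scanlines(frame, darkness=0.6):
    """Darken every other row to fake a CRT scanline overlay."""
    out = frame.astype(np.float32)
    out[1::2] *= darkness
    return out.astype(np.uint8)

def crossfade(frame_a, frame_b, t):
    """Blend two frames for a transition; t goes 0 -> 1 over the cut."""
    return (frame_a * (1.0 - t) + frame_b * t).astype(np.uint8)
```

Each function maps one frame to one frame, so effects compose by simple chaining inside the per-frame loop.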
Architecture
Backend
- FastAPI + Celery + RabbitMQ + Redis
AI / Computer Vision Stack
- Whisper, YOLO, Detectron2, MediaPipe, OpenCV
Storage
- MinIO (self‑hosted S3‑compatible, presigned uploads)
Frontend
- React + Vite + TailwindCSS
Database
- PostgreSQL + SQLAlchemy (async)
Deployment
- Docker Compose on a VPS
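The whole stack wires together in one Compose file. The sketch below is illustrative; service names, images, and ports are assumptions, not ClipFarmer's actual configuration:

```yaml
services:
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    depends_on: [rabbitmq, redis, postgres, minio]
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info
    depends_on: [rabbitmq, redis, minio]
  rabbitmq:
    image: rabbitmq:3-management
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
```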
Task Orchestration
```python
from celery import chord

# Celery chord: split the video first, then run the downstream
# effect/subtitle/transition tasks as the chord callback
workflow = chord(
    spliter_clip.s(job.job_id, input_path),
    workflow_tasks_parallel.s(),
)
task_result = workflow()
```
The workflow first splits the video, then applies effects, subtitles, and transitions in parallel.
Regional Considerations (Senegal & West Africa)
- Mobile money (Wave, Orange Money) is the primary payment method; credit cards are rare.
- ClipFarmer accepts Wave and Orange Money natively.
- Many AI tools seen locally are scams or inaccessible, so providing a locally hosted solution is crucial.
Challenges Faced
- Conflicting ML dependencies across models.
- Presigned uploads are mandatory for large video files.
- cv2 frame processing is slow without proper batching.
- Docker networking can be unexpectedly restrictive.
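On the batching point: processing one pixel at a time in Python is what kills throughput, so frames are better handled as stacked arrays and graded with a single vectorized call. A minimal sketch (the function name and grade parameters are illustrative):

```python
import numpy as np

def grade_batch(frames, gain=1.2, lift=10):
    """Apply a simple brightness/contrast grade to a whole batch of
    frames with one vectorized numpy expression instead of a
    per-frame, per-pixel Python loop."""
    graded = frames.astype(np.float32) * gain + lift
    return np.clip(graded, 0, 255).astype(np.uint8)

# A batch is just a (num_frames, height, width, channels) array, e.g.
# 32 frames decoded via cv2.VideoCapture and stacked with np.stack.
batch = np.full((32, 720, 1280, 3), 100, np.uint8)
out = grade_batch(batch)
print(out.shape, out[0, 0, 0, 0])  # (32, 720, 1280, 3) 130
```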
Availability
Live at clipfarmer.site – free credits are available for testing.
Call for Feedback
I’m curious: has anyone else built a cv2‑based processing pipeline? What would make you switch from manual editing to an automated solution like ClipFarmer?