I built a real AI video-processing SaaS from Senegal: no GPT wrappers, just HuggingFace + OpenCV + YOLO + Detectron2 + MediaPipe + Celery
Source: Dev.to
Problem I was solving
Every creator I know spends 3–4 hours manually cutting video.
The algorithm rewards volume — not perfection.
Solution Overview
I built ClipFarmer, a SaaS that processes video entirely with on‑premise machine‑learning models instead of third‑party API wrappers.
Machine Learning Models
- Whisper (HuggingFace) – automatic speech transcription.
- YOLO + OpenCV (cv2) – scene detection.
- Detectron2 – instance segmentation.
- MediaPipe – pose and face landmark detection.
- OpenCV (cv2) – backbone for all frame‑level operations.
These are real models running locally; no external API calls.
Effects Pipeline
Each effect is a cv2 pipeline that processes frames:
- Color grading (dark moody, vintage grain, RGB split)
- CRT scanline overlay
- Motion blur
- Skeleton overlay (MediaPipe pose)
- Background removal (Detectron2 masks)
- Transitions between clips using frame blending
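A few of the effects above can be sketched as frame-level array operations. These are illustrative stand-ins for the production effects, assuming frames arrive as standard `uint8` BGR arrays (as `cv2` delivers them):

```python
import numpy as np

def rgb_split(frame, shift=4):
    """Chromatic-aberration style effect: offset the red and blue
    channels horizontally in opposite directions."""
    out = frame.copy()
    out[:, :, 2] = np.roll(frame[:, :, 2], shift, axis=1)   # red right
    out[:, :, 0] = np.roll(frame[:, :, 0], -shift, axis=1)  # blue left
    return out

def crt_scanlines(frame, darkness=0.6):
    """Darken every other row to fake a CRT scanline overlay."""
    out = frame.astype(np.float32)
    out[1::2] *= darkness
    return out.astype(np.uint8)

def crossfade(frame_a, frame_b, t):
    """Blend two frames for a transition; t goes 0 -> 1 over the cut."""
    return (frame_a * (1.0 - t) + frame_b * t).astype(np.uint8)
```

Each function maps one frame to one frame, so effects compose by simple chaining inside the per-frame loop.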
Architecture
Backend
- FastAPI + Celery + RabbitMQ + Redis
AI / Computer Vision Stack
- Whisper, YOLO, Detectron2, MediaPipe, OpenCV
Storage
- MinIO (self‑hosted S3‑compatible, presigned uploads)
Frontend
- React + Vite + TailwindCSS
Database
- PostgreSQL + SQLAlchemy (async)
Deployment
- Docker Compose on a VPS
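The whole stack wires together in one Compose file. The sketch below is illustrative; service names, images, and ports are assumptions, not ClipFarmer's actual configuration:

```yaml
services:
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    depends_on: [rabbitmq, redis, postgres, minio]
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info
    depends_on: [rabbitmq, redis, minio]
  rabbitmq:
    image: rabbitmq:3-management
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
```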
Task Orchestration
```python
from celery import chord

# Celery chord: split the video first, then run the downstream
# effect/subtitle/transition tasks as the chord callback
workflow = chord(
    spliter_clip.s(job.job_id, input_path),
    workflow_tasks_parallel.s(),
)
task_result = workflow()
```
The workflow first splits the video, then applies effects, subtitles, and transitions in parallel.
Regional Considerations (Senegal & West Africa)
- Mobile money (Wave, Orange Money) is the primary payment method; credit cards are rare.
- ClipFarmer accepts Wave and Orange Money natively.
- Many AI tools seen locally are scams or inaccessible, so providing a locally hosted solution is crucial.
Challenges Faced
- Conflicting ML dependencies across models.
- Presigned uploads are mandatory for large video files.
- cv2 frame processing is slow without proper batching.
- Docker networking can be unexpectedly restrictive.
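On the batching point: processing one pixel at a time in Python is what kills throughput, so frames are better handled as stacked arrays and graded with a single vectorized call. A minimal sketch (the function name and grade parameters are illustrative):

```python
import numpy as np

def grade_batch(frames, gain=1.2, lift=10):
    """Apply a simple brightness/contrast grade to a whole batch of
    frames with one vectorized numpy expression instead of a
    per-frame, per-pixel Python loop."""
    graded = frames.astype(np.float32) * gain + lift
    return np.clip(graded, 0, 255).astype(np.uint8)

# A batch is just a (num_frames, height, width, channels) array, e.g.
# 32 frames decoded via cv2.VideoCapture and stacked with np.stack.
batch = np.full((32, 720, 1280, 3), 100, np.uint8)
out = grade_batch(batch)
print(out.shape, out[0, 0, 0, 0])  # (32, 720, 1280, 3) 130
```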
Availability
Live at clipfarmer.site – free credits are available for testing.
Call for Feedback
I’m curious: has anyone else built a cv2‑based processing pipeline? What would make you switch from manual editing to an automated solution like ClipFarmer?