I built a real AI video processing SaaS from Senegal: no GPT wrappers, just HuggingFace + OpenCV + YOLO + Detectron2 + MediaPipe + Celery

Published: May 2, 2026 at 08:25 PM EDT
2 min read
Source: Dev.to

Problem I was solving

Every creator I know spends 3–4 hours manually cutting video.
The algorithm rewards volume — not perfection.

Solution Overview

I built ClipFarmer, a SaaS that processes video entirely with on‑premise machine‑learning models instead of third‑party API wrappers.

Machine Learning Models

  • Whisper (HuggingFace) – automatic speech transcription.
  • YOLO + OpenCV (cv2) – scene detection.
  • Detectron2 – instance segmentation.
  • MediaPipe – pose and face landmark detection.
  • OpenCV (cv2) – backbone for all frame‑level operations.

These are real models running locally; no external API calls.
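To make the frame-level work concrete: scene detection ultimately reduces to comparing consecutive frames. A minimal sketch of a scene-cut detector, using plain NumPy frame differencing (cv2 frames are NumPy arrays; the actual ClipFarmer pipeline combines YOLO with cv2, and the function name here is illustrative):

```python
import numpy as np

def detect_scene_cuts(frames, threshold=30.0):
    """Return the indices where the mean absolute pixel difference
    between consecutive frames exceeds `threshold` (0-255 scale)."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))
        if diff.mean() > threshold:
            cuts.append(i)
    return cuts

# Two dark frames, then an abrupt switch to bright frames: one cut at index 2.
dark = np.zeros((4, 4, 3), dtype=np.uint8)
bright = np.full((4, 4, 3), 200, dtype=np.uint8)
print(detect_scene_cuts([dark, dark, bright, bright]))  # [2]
```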

Effects Pipeline

Each effect is a cv2 pipeline that processes frames:

  • Color grading (dark moody, vintage grain, RGB split)
  • CRT scanline overlay
  • Motion blur
  • Skeleton overlay (MediaPipe pose)
  • Background removal (Detectron2 masks)
  • Transitions between clips using frame blending
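The frame-blending transition in the last bullet boils down to a per-pixel weighted average of two frames. A minimal NumPy sketch (cv2.addWeighted computes the same thing in one call):

```python
import numpy as np

def crossfade(frame_a, frame_b, alpha):
    """Blend two frames: alpha=0 gives frame_a, alpha=1 gives frame_b.
    Equivalent to cv2.addWeighted(frame_a, 1 - alpha, frame_b, alpha, 0)."""
    blended = (1.0 - alpha) * frame_a.astype(np.float32) + alpha * frame_b.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

black = np.zeros((2, 2, 3), dtype=np.uint8)
white = np.full((2, 2, 3), 255, dtype=np.uint8)
mid = crossfade(black, white, 0.5)
print(mid[0, 0, 0])  # 127
```

Stepping alpha from 0 to 1 over a dozen frames produces the actual transition.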

Architecture

Backend

  • FastAPI + Celery + RabbitMQ + Redis

AI / Computer Vision Stack

  • Whisper, YOLO, Detectron2, MediaPipe, OpenCV

Storage

  • MinIO (self‑hosted S3‑compatible, presigned uploads)

Frontend

  • React + Vite + TailwindCSS

Database

  • PostgreSQL + SQLAlchemy (async)

Deployment

  • Docker Compose on a VPS
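The stack above maps naturally onto Compose services. A sketch of the topology; service names, image tags, and commands are illustrative, not the project's actual file:

```yaml
services:
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    depends_on: [rabbitmq, redis, db, minio]
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info
    depends_on: [rabbitmq, redis]
  rabbitmq:
    image: rabbitmq:3-management
  redis:
    image: redis:7
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
```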

Task Orchestration

# Celery chord that runs the whole pipeline
from celery import chord

workflow = chord(
    spliter_clip.s(job.job_id, input_path),   # header: split the source video
    workflow_tasks_parallel.s()               # body: fan out effects, subtitles, transitions
)
task_result = workflow()

The workflow first splits the video, then applies effects, subtitles, and transitions in parallel.

Regional Considerations (Senegal & West Africa)

  • Mobile money (Wave, Orange Money) is the primary payment method; credit cards are rare.
  • ClipFarmer accepts Wave and Orange Money natively.
  • Many AI tools seen locally are scams or inaccessible, so providing a locally hosted solution is crucial.

Challenges Faced

  • Conflicting ML dependencies across models.
  • Presigned uploads are mandatory for large video files.
  • cv2 frame processing is slow without proper batching.
  • Docker networking can be unexpectedly restrictive.

Availability

Live at clipfarmer.site – free credits are available for testing.

Call for Feedback

I’m curious: has anyone else built a cv2‑based processing pipeline? What would make you switch from manual editing to an automated solution like ClipFarmer?
