From Prototype to Production: Building a Multimodal Video Search Engine

Published: (January 6, 2026 at 05:46 AM EST)
2 min read
Source: Dev.to


Overview

In the previous post I explored model stacking for media search: combining CLIP (visual descriptions), Whisper (dialog), and ArcFace (faces) to locate video content. Over the holidays I turned that afternoon hack into a more production‑ready system.

Live Demo

  • Demo site: (desktop browser)
  • Starter code:

Example workflow

  1. In the Visual Content tab, type “older man on phone, harbor background” → click +.
  2. Click the face of the older guy with glasses sitting against the harbor.
  3. In the Dialog (Semantic mode) tab, type “Americans had launched their missiles” → click +.
  4. Play the resulting clip.

You’ve drilled down to an exact shot without relying on metadata, timecodes, or exact wording. The semantic search is fuzzy: the transcript actually says “What it was telling him was that the US had launched their ICBMs,” and the query still matches.
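
One way to picture the drill-down above: each filter (visual, face, dialog) yields a set of candidate scenes, and stacking filters keeps only the scenes that every filter matched. The sketch below is an assumption about the mechanics, not the project's actual code. The scene IDs, the scores, and the `stack_filters` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-filter results: scene_id -> similarity score in [0, 1].
FilterHits = Dict[int, float]


def stack_filters(filters: List[FilterHits]) -> List[Tuple[int, float]]:
    """Keep scenes matched by every filter, ranked by mean score."""
    if not filters:
        return []
    # Intersect the scene IDs across all filters.
    common = set(filters[0])
    for hits in filters[1:]:
        common &= set(hits)
    # Rank the surviving scenes by their average score across filters.
    ranked = [
        (scene_id, sum(hits[scene_id] for hits in filters) / len(filters))
        for scene_id in common
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative data only (not real results from the demo).
visual = {12: 0.81, 40: 0.77, 57: 0.64}
face = {40: 0.92, 57: 0.88}
dialog = {40: 0.70, 99: 0.66}

stacked = stack_filters([visual, face, dialog])
print(stacked)
```

With this toy data only scene 40 survives all three filters, which mirrors how each added filter narrows the result set in the demo.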

Architecture

  • Frontend: Vue.js served by Nginx
  • Backend: FastAPI
  • Ingest worker: Standalone process that polls for new media and handles drives being mounted or unmounted gracefully. It polls rather than listening for filesystem events because Watchdog is unreliable with NFS/network shares.
  • Database: PostgreSQL with the pgvector extension for vector similarity search

All components are orchestrated with docker‑compose.
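
The polling approach of the ingest worker can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's worker: the extension list, the `scan_once` and `poll_forever` names, and the 30-second interval are assumptions. The key idea is that a missing or unreachable mount is treated as "nothing new yet" rather than as a fatal error.

```python
import os
import time
from pathlib import Path
from typing import List, Set

MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv"}  # assumed set of media types


def scan_once(root: Path, seen: Set[Path]) -> List[Path]:
    """Return media files under root that have not been seen before.

    If the share is unmounted or unreachable, return an empty list so the
    worker simply retries on the next poll.
    """
    new_files: List[Path] = []
    try:
        if not root.is_dir():
            return new_files
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = Path(dirpath) / name
                if path.suffix.lower() in MEDIA_EXTENSIONS and path not in seen:
                    seen.add(path)
                    new_files.append(path)
    except OSError:
        # The mount may disappear mid-scan; keep what was found so far.
        pass
    return new_files


def poll_forever(root: Path, interval_seconds: float = 30.0) -> None:
    """Poll root on a fixed interval and hand new files to the pipeline."""
    seen: Set[Path] = set()
    while True:
        for path in scan_once(root, seen):
            print(f"queueing {path} for enrichment")  # placeholder for real work
        time.sleep(interval_seconds)
```

A real worker would likely persist the set of processed files in the database instead of in memory, and would wait for a file to stop growing before ingesting it. Both are left out here for brevity.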

Features

  • Background enrichment – Worker continuously processes new files and extracts visual, audio, and facial embeddings.
  • Semantic dialog search – Uses sentence‑transformer embeddings; queries like “Americans launched missiles” retrieve clips containing “US fired rockets.”
  • Frame‑accurate playback – HTML5 video drawn to a canvas, with per‑frame timing from requestVideoFrameCallback().
  • EDL export – Queue selected scenes and export a CMX 3600 edit decision list for NLE round‑tripping.
  • Unified query – PostgreSQL + pgvector enables vector similarity combined with metadata filtering in a single query.
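
As a rough illustration of the unified query idea, the sketch below builds a single parameterized SQL statement that ranks rows by vector distance while filtering on ordinary metadata. The table and column names (`scenes`, `clip_embedding`, `series`, and so on) and both helper functions are hypothetical. The `<=>` operator is pgvector's cosine-distance operator.

```python
from typing import Any, Dict, List, Tuple


def to_pgvector_literal(embedding: List[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_scene_query(
    embedding: List[float], series: str, limit: int = 10
) -> Tuple[str, Dict[str, Any]]:
    """Build one query that mixes vector ranking with a metadata filter.

    All table and column names here are assumptions for illustration.
    """
    sql = (
        "SELECT id, file_path, start_seconds, end_seconds, "
        "clip_embedding <=> %(query_vec)s::vector AS distance "
        "FROM scenes "
        "WHERE series = %(series)s "
        "ORDER BY clip_embedding <=> %(query_vec)s::vector "
        "LIMIT %(limit)s"
    )
    params = {
        "query_vec": to_pgvector_literal(embedding),
        "series": series,
        "limit": limit,
    }
    return sql, params


sql, params = build_scene_query([0.1, 0.2, 0.3], "Pioneer One", limit=5)
print(sql)
print(params)
```

The named `%(name)s` placeholders follow the style accepted by psycopg, so with that driver the pair could be passed straight to `cursor.execute(sql, params)`. Keeping the filter and the ranking in one statement is what avoids a second round trip to a separate vector store.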

Code

The full source code and Docker configuration are available at:

Acknowledgements

  • Demo footage is from Pioneer One, a Creative Commons‑licensed Canadian drama.
  • Significant assistance was provided by Claude Code.