How We Built Transcript-Powered Video Editing in Go
Source: Dev.to
Overview
You recorded a 5‑minute product walkthrough that contains a lot of filler words (e.g., “um” appears 11 times) and a shaky first 30 seconds while fumbling with screen‑share.
The original title is “Recording 2/20/2026 3:45:12 PM.”
Historically, editing meant dragging tiny handles on a timeline and scrubbing back‑and‑forth to find the exact millisecond. Removing each filler required locating it manually and trimming it one‑by‑one.
Now, with SendRec v1.45.0, three transcript‑powered features make this painless:
- Trim by transcript – set start/end points by clicking transcript segments.
- Filler‑word removal – detect and cut filler‑only segments in bulk.
- AI‑generated title suggestions – replace generic titles with concise AI‑crafted ones.
All three rely on the same segment‑level transcript data that whisper.cpp already provides (timestamps, text, segment boundaries).
1️⃣ Trim by Transcript
UI Changes
- The trim modal now displays the video’s transcript segments below the timeline.
- Two mode buttons: “Set Start” and “Set End.”
- Clicking a segment in Set Start mode moves the trim start to that segment’s timestamp and automatically switches to Set End.
- Clicking another segment defines the end point.
Visual Feedback
- Segments inside the selected range → green highlight.
- Segments outside the range → 40% opacity.
The original draggable handles remain functional; the transcript panel is an addition, not a replacement.
If a video lacks a transcript (still processing or transcription failed), the modal falls back to the classic timeline‑only UI.
2️⃣ Filler‑Word Removal
How It Works
- Open the Library overflow menu → click “Remove fillers.”
- A modal scans the transcript for filler‑only segments using this regex:
  /^\s*(?:um+|uh+|uhh+|hmm+|ah+|er+|you know|like|so|basically|actually|right|i mean)[.,!?\s]*$/i
- The modal lists every detected filler with:
  - Checkbox (checked by default)
  - Timestamp
  - Text
- A summary line shows the total count and saved time, e.g., “Found 11 filler words (4.2 s total).”
- Uncheck any filler you want to keep, then press Remove.
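The regex above is written in JavaScript syntax. As a rough sketch of the same scan in Go, the `/i` flag becomes an inline `(?i)`. The `Segment` type and the `FindFillers` function are hypothetical names for illustration, not SendRec's actual code:

```go
package main

import "regexp"

// Segment is a hypothetical shape for one phrase-level transcript segment.
type Segment struct {
	Start float64 // seconds from the start of the video
	End   float64 // seconds from the start of the video
	Text  string
}

// fillerOnly is the article's pattern ported to Go's RE2 syntax.
// It matches only when the whole segment text is a filler.
var fillerOnly = regexp.MustCompile(
	`(?i)^\s*(?:um+|uh+|uhh+|hmm+|ah+|er+|you know|like|so|basically|actually|right|i mean)[.,!?\s]*$`)

// FindFillers returns the filler-only segments and the total time
// (in seconds) that removing all of them would save.
func FindFillers(segs []Segment) (fillers []Segment, saved float64) {
	for _, s := range segs {
		if fillerOnly.MatchString(s.Text) {
			fillers = append(fillers, s)
			saved += s.End - s.Start
		}
	}
	return fillers, saved
}
```

Because the pattern is anchored with `^` and `$`, a segment such as “So basically the dashboard” does not match, while a segment that is just “Um,” does. The count and the saved time are the two numbers the summary line needs.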
Backend API
POST /api/videos/{id}/remove-segments
Content-Type: application/json

{
  "segments": [
    { "start": 3.2, "end": 3.8 },
    { "start": 12.1, "end": 12.9 },
    { "start": 25.0, "end": 25.6 }
  ]
}
Validation Rules
- No overlapping segments.
- Segments must be sorted (ascending start times).
- No negative timestamps.
- All timestamps must be ≤ video duration.
- ≤ 200 segments per request.
- The final video must retain ≥ 1 second of footage.
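A minimal sketch of how these rules could be checked in Go before any ffmpeg work starts. The type, function, and constant names are assumptions, and the check that each segment ends after it starts is an extra guard that the list above does not state:

```go
package main

import "errors"

// Segment mirrors one entry of the request body's "segments" array.
type Segment struct {
	Start float64 `json:"start"`
	End   float64 `json:"end"`
}

const (
	maxSegments  = 200 // at most 200 segments per request
	minRemaining = 1.0 // seconds of footage that must survive
)

// ValidateSegments applies the rules listed above to a request.
// duration is the length of the source video in seconds.
func ValidateSegments(segs []Segment, duration float64) error {
	if len(segs) > maxSegments {
		return errors.New("too many segments")
	}
	removed, prevEnd := 0.0, 0.0
	for i, s := range segs {
		switch {
		case s.Start < 0 || s.End < 0:
			return errors.New("negative timestamp")
		case s.End <= s.Start:
			// Assumption: not listed in the article, but an empty or
			// inverted range is meaningless.
			return errors.New("segment end must be after start")
		case s.End > duration:
			return errors.New("timestamp exceeds video duration")
		case i > 0 && s.Start < prevEnd:
			// One comparison covers both unsorted and overlapping input.
			return errors.New("segments must be sorted and non-overlapping")
		}
		removed += s.End - s.Start
		prevEnd = s.End
	}
	if duration-removed < minRemaining {
		return errors.New("less than 1 second of video would remain")
	}
	return nil
}
```

Comparing each start against the previous end rejects overlapping and out-of-order segments in a single pass, so the ffmpeg step can assume a clean list.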
ffmpeg Implementation
ffmpeg -i input \
  -filter_complex "
    [0:v]select='not(between(t,3.2,3.8)+between(t,12.1,12.9)+between(t,25.0,25.6))',
    setpts=N/FRAME_RATE/TB[v];
    [0:a]aselect='not(between(t,3.2,3.8)+between(t,12.1,12.9)+between(t,25.0,25.6))',
    asetpts=N/SR/TB[a]" \
  -map "[v]" -map "[a]" output
- The select/aselect filters exclude the listed ranges.
- setpts and asetpts reset timestamps so the output plays continuously without gaps.
After trimming, the video receives a new thumbnail and is re‑transcribed (old timestamps become obsolete).
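For illustration, the filter graph can be generated from the validated segment list along these lines. This is a sketch under assumptions: `BuildFilterComplex` and `FFmpegArgs` are hypothetical helper names, and the three-decimal formatting is a choice made here, not something the article specifies:

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical type for one time range to cut, in seconds.
type Segment struct{ Start, End float64 }

// BuildFilterComplex renders the select/aselect graph for the given
// ranges. It assumes segs is non-empty and already validated.
func BuildFilterComplex(segs []Segment) string {
	ranges := make([]string, len(segs))
	for i, s := range segs {
		ranges[i] = fmt.Sprintf("between(t,%.3f,%.3f)", s.Start, s.End)
	}
	// Keep a frame or sample only when it falls in none of the ranges.
	keep := "not(" + strings.Join(ranges, "+") + ")"
	return "[0:v]select='" + keep + "',setpts=N/FRAME_RATE/TB[v];" +
		"[0:a]aselect='" + keep + "',asetpts=N/SR/TB[a]"
}

// FFmpegArgs returns the argument list to pass to the ffmpeg binary.
func FFmpegArgs(input, output string, segs []Segment) []string {
	return []string{
		"-i", input,
		"-filter_complex", BuildFilterComplex(segs),
		"-map", "[v]", "-map", "[a]",
		output,
	}
}
```

Passing the slice to `exec.Command("ffmpeg", args...)` avoids a shell entirely, so the filter string needs no shell escaping beyond the single quotes that ffmpeg's own filter parser expects.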
3️⃣ AI‑Generated Title Suggestions
Trigger
When the summary worker finishes transcription, it checks whether the video still has an auto‑generated title such as:
- “Recording 2/20/2026 3:45:12 PM”
- “Untitled Recording”
If so, it sends the first portion of the transcript to the AI client with this prompt:
Prompt
Given this video transcript, generate a concise title (3‑8 words) that captures the main topic. Return ONLY the title text, no quotes, no explanation. Write in the same language as the transcript.
The AI’s response is stored in a suggested_title column – it does not overwrite the existing title.
UI
- In the Library, videos with a suggested title show a small indicator beneath the title.
- Clicking the indicator opens a modal with Accept and Dismiss buttons.
- Accept → updates the video title.
- Dismiss → clears the suggestion.
Both actions are one‑click decisions.
Infrastructure
- The title generation piggybacks on the existing summary worker; no extra queue or services are required.
- If the AI service is unavailable, no suggestion is stored and the original title remains unchanged.
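The store, accept, and dismiss flow, including the “AI unavailable” case, can be sketched as follows. All names here are hypothetical, the `SuggestedTitle` field stands in for the suggested_title column, and clearing the suggestion after Accept is an assumption rather than documented behavior:

```go
package main

import (
	"errors"
	"strings"
)

// ErrUnavailable is a placeholder error for an unreachable AI service.
var ErrUnavailable = errors.New("ai service unavailable")

// TitleFunc is a hypothetical stand-in for the AI client call.
type TitleFunc func(transcript string) (string, error)

// Video holds only the two fields relevant to title suggestions.
type Video struct {
	Title          string
	SuggestedTitle string // empty means "no suggestion"
}

// StoreSuggestion asks the AI for a title and records it as a suggestion.
// On any error it leaves the video untouched.
func StoreSuggestion(v *Video, generate TitleFunc, transcript string) {
	title, err := generate(transcript)
	if err != nil {
		return // no suggestion stored; the original title stays
	}
	v.SuggestedTitle = strings.TrimSpace(title)
}

// Accept promotes the suggestion to the real title.
// Clearing the suggestion afterwards is an assumption.
func Accept(v *Video) {
	if v.SuggestedTitle != "" {
		v.Title = v.SuggestedTitle
		v.SuggestedTitle = ""
	}
}

// Dismiss drops the suggestion and keeps the current title.
func Dismiss(v *Video) {
	v.SuggestedTitle = ""
}
```

Keeping the suggestion in its own field is what makes the failure path trivial: an error from the AI call simply means nothing is written.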
Trade‑offs & Limitations
| Feature | Trade‑off |
|---|---|
| Word‑level timestamps (instead of phrase‑level) | Improves filler detection but requires a different Whisper model configuration and more storage. |
| Real‑time filler detection during recording | Adds complexity; current implementation works post‑recording. |
| Automatic filler removal without preview | Faster, but risky – words like “like” or “so” can be legitimate content. |
| Batch filler removal across multiple videos | Would need a bulk‑API and UI; not yet implemented. |
Note: Whisper segments are phrase‑level. A segment like “um, so basically” is treated as a single unit; we cannot cut only the “um” because we lack word‑level timestamps. Consequently, some fillers embedded in longer phrases are intentionally left untouched to avoid false positives.
Deployment & Migration
- The three features are live at app.sendrec.eu in v1.45.0.
- Self‑hosted installations receive them automatically on upgrade; the migration runs on startup.
- AI title suggestions require the environment variable AI_ENABLED=true (or similar) to activate the AI client, plus a compatible LLM endpoint.
If you're self‑hosting SendRec, check out the self‑hosting guide.