Building a Live AI Comedy Roast Show with Gemini

Published: 1 month ago (March 16, 2026 at 12:37 PM EDT)


Source: Dev.to


Gemini Roast LIVE

This post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

I built GRL: Gemini Roast LIVE, an AI comedy roast show where AI comedians watch you through your camera and roast you in real time. You can talk back, and they respond. Below is how I built it with Gemini and what I learned along the way.

The Idea

Most voice‑AI demos are chatbots. I wanted to build something people would actually want to use for fun: a live comedy show with an MC host and multiple comedians, each with their own voice and comedy style.

The roast format turned out to be a great test for the Gemini Live API. Comedy requires fast responses, consistent character voice, and the ability to riff off what the audience says. If the AI pauses too long or breaks character, the comedy falls apart.

Three Modes, One App

The app offers three ways to get roasted:

Mode	Description
Live Roast Show	The main feature. An AI MC hosts the show and introduces three comedians (randomly picked from a pool of ten personas). Each comedian gets its own Gemini Live API session with a unique voice. They see you through your camera, hear you through your mic, and roast what they observe. You can talk back. Flip to the rear camera to point it at a friend if you want them roasted instead.
Photo Roast	Takes a selfie and produces a duo comedy act. Two randomly selected personas perform an alternating script, rendered as multi‑speaker audio with synchronized highlighting.
SNS Roast	Generates a comedy video from an uploaded photo using Veo 3.1, with extension chaining for clips longer than 8 seconds.

Roast / Boost Toggle

Every mode includes a Roast/Boost toggle:

Roast (dark theme) – ≈ 95 % savage comedy.
Boost (light theme) – ≈ 90 % genuine praise with humor mixed in.
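
A minimal sketch of how the toggle could be represented in code, assuming one small config object per mode. The names `ModeConfig`, `MODES`, and `tone_instruction` are hypothetical and do not come from the app; only the themes and approximate percentages are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    """Hypothetical per-mode settings; field names are illustrative."""
    theme: str         # UI theme shown while the mode is active
    tone: str          # dominant tone the comedians should aim for
    tone_share: float  # approximate share of output in that tone


MODES = {
    "roast": ModeConfig(theme="dark", tone="savage comedy", tone_share=0.95),
    "boost": ModeConfig(theme="light", tone="genuine praise with humor mixed in",
                        tone_share=0.90),
}


def tone_instruction(mode: str) -> str:
    """Turn the selected mode into one line that can be added to a prompt."""
    cfg = MODES[mode]
    return f"Aim for roughly {cfg.tone_share:.0%} {cfg.tone}."
```

Keeping theme and tone in one place means the UI and the prompts cannot drift apart when the ratio is retuned.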

The Architecture

The backend is a Python FastAPI service built on Google’s SDK. The key challenge was the Live API’s one‑voice‑per‑session constraint: I needed ten different comedian voices plus an MC, but each session can have only one voice.

Session‑Switching Architecture

MC session – The MC runs on its own Live API BIDI session.
Trigger detection – When the MC calls a comedian’s name, a ShowDirector class detects it via a regex on the transcription text.
Create comedian session – ShowDirector creates a new, independent Live API session for that comedian with its assigned voice.
Forward media – Camera frames and microphone audio are forwarded to the active session.
End turn – When the comedian’s turn ends (turn limit or time limit), the session is destroyed.
Next turn – The next comedian receives a fresh session.

This turns the constraint into a feature: each comedian genuinely sounds different because they have independent sessions.
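
The session lifecycle described above can be sketched roughly as follows. This is not the app's ShowDirector: `FakeSession` is a stand-in for a real Live API session, `TurnManager` and its method names are hypothetical, and the voice strings are placeholders. The sketch only shows the routing and teardown logic, with no SDK calls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class FakeSession:
    """Stand-in for one Live API session (one voice per session)."""

    def __init__(self, name: str, voice: str) -> None:
        self.name, self.voice = name, voice
        self.received = []   # media chunks forwarded to this session
        self.closed = False

    def send(self, chunk: bytes) -> None:
        self.received.append(chunk)

    def close(self) -> None:
        self.closed = True


@dataclass
class TurnManager:
    """Routes media to whichever session currently holds the stage."""
    mc: FakeSession
    open_session: Callable[[str, str], FakeSession]
    max_turns: int = 3
    max_seconds: float = 60.0
    clock: Callable[[], float] = time.monotonic
    comedian: Optional[FakeSession] = field(default=None, init=False)
    _turns: int = field(default=0, init=False)
    _started: float = field(default=0.0, init=False)

    def start_comedian(self, name: str, voice: str) -> None:
        """Open a fresh session for the comedian the MC just called."""
        self.end_comedian()
        self.comedian = self.open_session(name, voice)
        self._turns = 0
        self._started = self.clock()

    def forward(self, chunk: bytes) -> None:
        """Send a camera frame or audio chunk to the active session."""
        (self.comedian or self.mc).send(chunk)

    def on_turn_complete(self) -> None:
        """Count a finished turn; end the set at the turn or time limit."""
        if self.comedian is None:
            return
        self._turns += 1
        elapsed = self.clock() - self._started
        if self._turns >= self.max_turns or elapsed >= self.max_seconds:
            self.end_comedian()

    def end_comedian(self) -> None:
        """Destroy the comedian session so the MC takes the stage again."""
        if self.comedian is not None:
            self.comedian.close()
            self.comedian = None
```

Injecting `open_session` and `clock` keeps the routing logic testable without a network connection or real timers.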

Gemini API Products Used

Product	Role in the App
Gemini Live API	Real‑time bidirectional voice for the MC and each comedian (BIDI streaming).
Gemini 2.5 Flash Lite	Image analysis (selfie/SNS) and comedy‑script generation.
Gemini Multi‑Speaker TTS	Two‑voice audio for Photo Roast duo acts.
Veo 3.1	Comedy video generation for SNS Roast with extension chaining.
Google ADK	Agent orchestration, Runner lifecycle, and session management.

Infrastructure

Google Cloud Run – hosts the services.
Secret Manager – stores API keys securely.
GitHub Actions – CI/CD pipeline using Workload Identity Federation.

The Hard Parts

FunctionTool didn’t work on newer models

I originally planned to have the MC use ADK’s FunctionTool to programmatically call comedians on stage. This works on the 09‑2025 native‑audio model, but the 12‑2025 model writes “thinking” text about calling the tool without ever emitting a function call.

Solution: Switch to natural‑language detection. The MC says the comedian’s name naturally during the show, and a dynamic regex on the transcription text triggers the session switch. This proved more reliable and sounded more natural.
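
A minimal sketch of what a dynamic name regex could look like, assuming the pattern is rebuilt from the show's line-up and run against each chunk of MC transcription. The function names `build_name_pattern` and `detect_comedian` are hypothetical; the post does not show the actual pattern.

```python
import re
from typing import Optional, Sequence


def build_name_pattern(names: Sequence[str]) -> re.Pattern:
    """Compile one case-insensitive pattern matching any stage name."""
    if not names:
        raise ValueError("line-up must contain at least one name")
    # Longest names first so a longer name wins over one that is its prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # \b keeps "Frost" from firing inside a word such as "frosty".
    return re.compile(rf"\b({alternation})\b", re.IGNORECASE)


def detect_comedian(transcript: str, pattern: re.Pattern,
                    lineup: Sequence[str]) -> Optional[str]:
    """Return the canonical stage name the MC just said, if any."""
    match = pattern.search(transcript)
    if match is None:
        return None
    said = match.group(1).lower()
    return next(name for name in lineup if name.lower() == said)


lineup = ["Razor", "Frost", "Pops"]
pattern = build_name_pattern(lineup)
print(detect_comedian("Please welcome FROST to the stage!", pattern, lineup))
```

Escaping each name with `re.escape` matters once a stage name contains punctuation, and rebuilding the pattern per show keeps it limited to the comedians actually on the bill.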

Veo extension requires exact original URIs

Veo 3.1 can generate 8‑second clips, and you can extend them. However, the extension API requires the exact original file URI that Veo generated.

The download URI (with :download?alt=media suffix) doesn’t work.
Re‑uploading through the Files API also fails because Veo metadata gets lost.

Fix: Strip the download suffix and pass the clean original URI. This behavior isn’t documented anywhere; I discovered it through trial and error.
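
The fix amounts to a small string operation. A sketch, assuming the suffix always appears exactly as `:download?alt=media` at the end of the URI; the helper name `original_veo_uri` and the example URI are made up for illustration.

```python
DOWNLOAD_SUFFIX = ":download?alt=media"


def original_veo_uri(uri: str) -> str:
    """Return the URI without the download suffix, if it has one."""
    if uri.endswith(DOWNLOAD_SUFFIX):
        return uri[: -len(DOWNLOAD_SUFFIX)]
    return uri


# Hypothetical URI, used only to show the transformation.
print(original_veo_uri("https://example.invalid/files/abc123:download?alt=media"))
```

The function leaves an already clean URI untouched, so it is safe to call on every clip before requesting an extension.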

Model version matters for comedy

gemini-3.1-flash-lite-preview was too conservative for roast comedy: its safety filtering produced mild output. Switching to gemini-2.5-flash-lite yielded noticeably funnier and more daring observational roasts while still respecting my custom content guidelines (no jokes about body type, skin color, disability, or sexuality).

Comedy prompt engineering is its own thing

Generic instructions like “be funny” produce generic results. I had to specify concrete comedy techniques for each persona:

Persona	Technique
Razor (Precision Striker)	“Start with something small you see, blow it up to absurd proportions, then stack 2‑3 more jokes on the same target.”
Frost (Deadpan Intellectual)	“Matter‑of‑fact devastation, rhetorical questions, polite savagery.”
Pops (Dad Joke Assassin)	“Weaponized puns, fake proverbs, wholesome‑to‑savage swerves.”

The roast intensity was calibrated from an initial 70 % roast / 30 % praise ratio up to 95 % roast / 5 % praise.

Rate limits with concurrent sessions

Running an MC session plus multiple comedian sessions means several simultaneous Gemini API calls. I implemented a dual‑key fallback:

If the primary API key receives a 429 response, the request automatically retries with a secondary key.
Veo uses round‑robin across all available clients.
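
Both strategies can be sketched without any SDK code. In this sketch `RateLimited` is a hypothetical stand-in for whatever error the real client raises on HTTP 429, and `call_with_fallback` and `RoundRobin` are illustrative names, not the app's.

```python
import itertools
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_fallback(clients: Sequence[object],
                       request: Callable[[object], T]) -> T:
    """Try each client in order, moving on only after a rate-limit error."""
    if not clients:
        raise ValueError("no clients configured")
    last_error = None
    for client in clients:
        try:
            return request(client)
        except RateLimited as err:
            last_error = err  # remember it and try the next key
    raise last_error


class RoundRobin:
    """Hand out clients in rotation so load spreads across all keys."""

    def __init__(self, clients: Sequence[object]) -> None:
        if not clients:
            raise ValueError("no clients configured")
        self._cycle = itertools.cycle(clients)

    def next(self) -> object:
        return next(self._cycle)
```

Fallback suits latency-sensitive live sessions, where the primary key should be used whenever it is available; rotation suits slow batch jobs such as video generation, where spreading requests evenly matters more.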

10 personas, infinite combinations

The persona system (personas.py) is the single source of truth for all ten comedians. Drawing three of the ten gives 120 possible line‑ups (720 if running order counts), and live interaction makes each show play out differently.


Gemini Live Agent Challenge – Comedy Show Demo

Overview

A comedy‑show‑style Gemini Live Agent that lets users pick a host (the MC) and three comedians. The MC introduces each act, and the comedians roast or boost the user with jokes, puns, and witty observations.

How It Works

Select a host – the MC (e.g., Samantha the Sassy Host).
Choose three comedians from a curated roster (e.g., Bob the Bawdy Bard, Lila the Lyrical Laughter‑Lord, Moe the Meta‑Mirth‑Man).
The MC introduces each act and hands the floor to the chosen comedian.
Each comedian delivers a roast (or a boost) using a unique Gemini voice from a pool of ten Live API voices.
The MC wraps up with a closing line and a call‑to‑action.

Note: The MC’s prompt is built dynamically from the three selected comedians, so every show is different.
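
A sketch of how an MC prompt might be assembled from the selected line-up, assuming each comedian is passed as a (stage name, style) pair. The function name and the prompt wording are illustrative only; the actual prompt text is not shown in the post.

```python
from typing import Sequence, Tuple


def build_mc_prompt(lineup: Sequence[Tuple[str, str]], mode: str = "roast") -> str:
    """Build the MC's system prompt for one show from its three comedians."""
    if len(lineup) != 3:
        raise ValueError("a show needs exactly three comedians")
    verb = "roast" if mode == "roast" else "boost"
    running_order = [
        f"{position}. {name}: {style}"
        for position, (name, style) in enumerate(lineup, start=1)
    ]
    return "\n".join([
        "You are the MC of a live comedy show.",
        f"Tonight's comedians will {verb} the guest, in this order:",
        *running_order,
        "Say each comedian's stage name clearly when you bring them on.",
        "After the last set, close the show with a short sign-off.",
    ])


print(build_mc_prompt([
    ("Razor", "Precision Striker"),
    ("Frost", "Deadpan Intellectual"),
    ("Pops", "Dad Joke Assassin"),
]))
```

Asking the MC to say each stage name clearly fits the name-based trigger, since the session switch depends on the name appearing in the transcription.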

Persona Definition (one entry per comedian)

Element	Description
Stage name & comedy style	e.g., Bob the Bawdy Bard – “Shakespeare‑savvy, raunchy wordplay.”
Unique Gemini voice	Chosen from the 10‑voice Live API pool (e.g., Voice‑3 – Warm & Witty).
System prompts	• Roast mode – prompt that tells Gemini to generate sharp, playful insults. • Boost mode – prompt that tells Gemini to generate uplifting, confidence‑building jokes.
UI color & icon	Hex color (e.g., `#FF6F61`) and an emoji/icon (e.g., 🎭).
Comedy‑technique instructions	Specific guidelines (e.g., “use puns”, “reference pop‑culture”, “keep it PG‑13”).

Each show randomly selects a combination of these personas, guaranteeing that no two performances are identical.
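
One way a persona entry and the random draw could be modeled, as a sketch only. The field names mirror the table above but are assumptions about personas.py, not its real contents, and `pick_lineup` is a hypothetical helper.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona entry; fields follow the table above."""
    stage_name: str
    style: str
    voice: str                  # one Live API voice per persona
    roast_prompt: str
    boost_prompt: str
    color: str                  # hex UI color
    icon: str
    techniques: Tuple[str, ...]

    def system_prompt(self, mode: str) -> str:
        """Combine the mode prompt with the persona's technique notes."""
        base = self.roast_prompt if mode == "roast" else self.boost_prompt
        return base + "\nTechniques: " + "; ".join(self.techniques)


def pick_lineup(pool: Sequence[Persona], size: int = 3,
                rng: Optional[random.Random] = None) -> list:
    """Draw `size` distinct personas; pass a seeded rng for repeatable tests."""
    rng = rng or random.Random()
    return rng.sample(list(pool), size)
```

`random.sample` draws without replacement, so the same comedian cannot appear twice in one show.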

What’s Next

The feature I’m most excited about is Podcast Roast Mode:

Two comedian agents hold a free‑form conversation about the user (similar to NotebookLM’s podcast format).
The user can jump in at any point.
This would require multi‑voice support within a single Live API session.

If multi‑voice support becomes available, it will enable a far more dynamic and immersive format.

Try It

Live app: Deployed on Google Cloud Run
Code:

Built with Google ADK, Gemini Live API, Gemini 2.5 Flash Lite, Gemini Multi‑Speaker TTS, Veo 3.1, Google Cloud Run, and Google Cloud Secret Manager for the Gemini Live Agent Challenge.

#GeminiLiveAgentChallenge