Building a Real-Time Video Restyling Agent with Gemini 3 + Decart AI
Published: December 17, 2025, 06:39 (GMT+8)
Source: Dev.to
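Overview
Google's Gemini 3 (released on November 18, 2025) offers multimodal reasoning and tool use, which help in building AI applications that respond accurately. Combined with Decart AI's Mirage LSD, billed as the first real-time live-stream diffusion model with zero-latency video restyling at 24 FPS, it lets you create an agent that restyles video in real time.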
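To get a feel for what 24 FPS implies, the short sketch below works out the time available per frame. The helper name `frame_budget_ms` is made up for illustration and is not part of any SDK mentioned here.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# At 24 FPS each restyled frame has to be ready in roughly 41.7 ms.
print(f"{frame_budget_ms(24):.1f} ms per frame")
```

Any restyling step that regularly takes longer than this budget would fall behind the live stream.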
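import logging

from dotenv import load_dotenv
# Import paths assumed from the Vision Agents examples; adjust for your installed version.
from vision_agents.core import Agent, User, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import decart, deepgram, elevenlabs, gemini, getstream

logger = logging.getLogger(__name__)
load_dotenv()  # Pick up API keys from a .env file, if one exists


async def create_agent(**kwargs) -> Agent:
    """Build the agent: Decart restyles the video, Gemini decides when to change the style."""
    # Video processor that restyles every frame according to the current prompt
    processor = decart.RestylingProcessor(
        initial_prompt="Change the video style to a cute animated movie with vibrant colours",
        model="mirage_v2",
    )
    llm = gemini.LLM(model="gemini-3-pro-preview")

    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Story teller", id="agent"),
        instructions="You will use the Decart processor to change the style of the video and the user's background.",
        llm=llm,
        tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
        stt=deepgram.STT(),
        processors=[processor],
    )

    # Expose a tool the LLM can call whenever the user asks for a new style
    @llm.register_function(
        description="Changes the prompt of the Decart processor, which in turn changes the video style and background."
    )
    async def change_prompt(prompt: str) -> str:
        await processor.update_prompt(prompt)
        return f"Prompt changed to {prompt}"

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Join the call and start the agent."""
    await agent.create_user()
    call = await agent.create_call(call_type, call_id)
    logger.info("🤖 Starting Agent...")
    with await agent.join(call):
        logger.info("Joining call")
        logger.info("LLM ready")
        await agent.finish()  # Run until the call ends


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))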
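The `@llm.register_function` decorator is what lets the model change the video style: when Gemini decides a tool call is needed, the framework looks up the registered coroutine and awaits it with the model's arguments. The sketch below imitates that flow with no SDK at all. `ToolRegistry` and `StubProcessor` are hypothetical stand-ins written for this example, not Vision Agents or Decart classes, and the real dispatch logic may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class ToolRegistry:
    """Hypothetical stand-in for an LLM's function-calling registry."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., Awaitable[str]]] = {}
        self.descriptions: Dict[str, str] = {}

    def register_function(self, description: str):
        # Decorator factory: stores the coroutine under its own name.
        def decorator(fn):
            self.tools[fn.__name__] = fn
            self.descriptions[fn.__name__] = description
            return fn
        return decorator

    async def call(self, name: str, **arguments) -> str:
        # Roughly what a framework does when the model emits a tool call.
        return await self.tools[name](**arguments)


class StubProcessor:
    """Hypothetical stand-in for the restyling processor; it only stores the prompt."""

    def __init__(self, initial_prompt: str) -> None:
        self.prompt = initial_prompt

    async def update_prompt(self, prompt: str) -> None:
        self.prompt = prompt


registry = ToolRegistry()
processor = StubProcessor("a cute animated movie with vibrant colours")


@registry.register_function(description="Changes the restyling prompt.")
async def change_prompt(prompt: str) -> str:
    await processor.update_prompt(prompt)
    return f"Prompt changed to {prompt}"


def simulate_tool_call(prompt: str) -> str:
    # Pretend the model asked to call change_prompt with a single argument.
    return asyncio.run(registry.call("change_prompt", prompt=prompt))


if __name__ == "__main__":
    print(simulate_tool_call("Studio Ghibli"))
```

The string returned by the tool is fed back to the model, which is why the agent can confirm the change out loud after each request.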
Running the agent
Store your API credentials as environment variables (or put them in a .env file at the project root):
export GOOGLE_API_KEY=your_google_key
export DECART_API_KEY=your_decart_key
export ELEVENLABS_API_KEY=your_elevenlabs_key
export DEEPGRAM_API_KEY=your_deepgram_key
export STREAM_API_KEY=your_stream_key
export STREAM_API_SECRET=your_stream_secret
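A missing key usually only shows up once the agent tries to reach the corresponding service, so a small preflight check can save a debugging round. The sketch below is one possible way to do that. `missing_keys` and `describe` are hypothetical helpers, and the check only sees variables already present in the process environment, so values kept in a .env file must be loaded first.

```python
import os
from typing import List, Mapping

# The six variable names listed above.
REQUIRED_KEYS = (
    "GOOGLE_API_KEY",
    "DECART_API_KEY",
    "ELEVENLABS_API_KEY",
    "DEEPGRAM_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
)


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def describe(env: Mapping[str, str]) -> str:
    """Summarize the credential state in one line."""
    missing = missing_keys(env)
    if missing:
        return "Missing environment variables: " + ", ".join(missing)
    return "All credentials are set."


print(describe(os.environ))
```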
Then launch the script:
uv run main.py
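A browser window opens with a video call UI and joins the call automatically. After granting camera and microphone permissions, give a spoken command such as “Make my video Studio Ghibli.” The video stream is transformed in real time.
Example interaction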
You: "Make it Neon Nostalgia."
Agent: "OK, I've updated the video style to Neon Nostalgia."
You: "Make it a War Zone."
Agent: "OK, I've updated the video style to a War Zone."
Resources
- Vision Agents – official site | GitHub repository
- Decart AI – homepage | Decart plugin for Vision Agents
- Gemini 3 documentation
- Stream (WebRTC)
Give it a try and play with different styles: post-apocalyptic Paris, Van Gogh's The Starry Night, or anything else you can imagine. 🎨