Video Generation with AI Gateway
Source: Vercel Blog
How Video Generation Differs from Image Generation
- Prompts can include motion cues – camera moves, object actions, timing.
- Prompts can optionally include audio direction (e.g., dialogue, sound effects, music).
- Each provider exposes different capabilities through provider‑specific options that unlock fundamentally different generation modes.
See the [Provider‑Specific Options Documentation] for details.
Types of Video Generation Supported by AI Gateway
| Generation Mode | Description | Typical Use‑Case |
|---|---|---|
| Text‑to‑Video | Describe what you want; the model handles visuals, motion, and optionally audio. Great for hyper‑realistic, production‑quality footage from a simple text prompt. | Ad creatives, explainer videos, social content |
| Programmatic Video (API) | Generate videos on demand from your app, platform, or content pipeline, with no licensing fees or production crew: just prompts in, videos out. | On‑demand video at scale |
| Image‑to‑Video | Animate a starting image, optionally guided by a text prompt, into polished clips with natural motion and cinematic quality for social media, ads, or storytelling. | Creative content generation, product animation |
| Reference‑to‑Video | Provide reference images or videos of a person/character; the model extracts appearance & voice to generate new scenes starring them with a consistent identity. | Spokesperson content, consistent brand characters |
| First‑and‑Last‑Frame | Define start and end states (two images); the model generates a seamless transition between them. | Before/after reveals, time‑lapse, outfit swaps |
| Video Editing / Style Transfer | Supply a source video URL and describe the desired transformation; the model applies a new style while preserving original motion. | Watercolor‑style videos, artistic re‑renders |
Example Models & Their Typical Workflows
| Model (Provider) | Generation Mode | Example Prompt / Use‑Case |
|---|---|---|
| klingai/kling-v2.6-t2v | Text‑to‑Video | “Generate a 30‑second cinematic travel video of a sunrise over the Alps, with gentle camera pans and ambient orchestral music.” |
| google/veo-3.1-generate-001 | Text‑to‑Video (high‑fidelity) | “Create a photorealistic kitchen scene with a chef chopping vegetables, realistic lighting, and synchronized sound effects.” |
| klingai/kling-v2.6-i2v | Image‑to‑Video | Provide a product photo URL + “Add a slow 360° rotation and subtle lighting changes.” |
| klingai/kling-v3.0-i2v-imagelastFrame | First‑and‑Last‑Frame | Upload “before” and “after” product images → “Generate a smooth transition showing the product assembling.” |
| alibaba/wan-v2.6-r2v-flash | Reference‑to‑Video | Supply two reference images of a dog → “Create a short video of the dog playing fetch in a park, preserving its identity.” |
| xai/grok-imagine-video | Video Editing / Style Transfer | Source video URL + “Apply a watercolor painting style while keeping the original motion.” |
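When a pipeline chooses models by task, the example table can be kept as a small lookup. This bash sketch assumes short mode labels of our own (t2v, t2v-hifi, i2v, first-last, r2v, edit); the IDs are only the examples listed above, not a complete or authoritative catalog.

```bash
# Map a generation mode to the example model ID from the table above.
# Mode labels are hypothetical names chosen for this sketch.
model_for_mode() {
  case "$1" in
    t2v)        echo "klingai/kling-v2.6-t2v" ;;
    t2v-hifi)   echo "google/veo-3.1-generate-001" ;;
    i2v)        echo "klingai/kling-v2.6-i2v" ;;
    first-last) echo "klingai/kling-v3.0-i2v-imagelastFrame" ;;
    r2v)        echo "alibaba/wan-v2.6-r2v-flash" ;;
    edit)       echo "xai/grok-imagine-video" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, model_for_mode r2v prints alibaba/wan-v2.6-r2v-flash, and an unknown label returns a non-zero status so a calling script can stop early.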
Tip: For multi‑reference generation (e.g., multiple characters), refer to each reference in the prompt with tags such as character1, character2, and so on. See the [Wan Prompt Guide] for best practices.
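A small helper can keep those tags consistent when a prompt mentions several references. This bash sketch assumes references are numbered in the order you supply them (character1 for the first, character2 for the second); the function name and the "characterN is ..." sentence it generates are illustrative, so check the Wan Prompt Guide for the phrasing the model actually expects.

```bash
# Prefix a scene description with one "characterN is <description>." sentence
# per reference, numbered in the order the references are supplied.
# The sentence template is an assumption; adapt it to the Wan Prompt Guide.
# Usage: multi_ref_prompt "<scene>" "<reference 1>" ["<reference 2>" ...]
multi_ref_prompt() {
  local scene="$1" out="" i=1 desc
  shift
  for desc in "$@"; do
    out+="character$i is $desc. "
    i=$((i + 1))
  done
  printf '%s%s\n' "$out" "$scene"
}
```

Calling multi_ref_prompt "character1 chases character2." "a dog" "a cat" prints: character1 is a dog. character2 is a cat. character1 chases character2.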
Model‑Creator Capabilities Overview
| Provider | Text‑to‑Video | Image‑to‑Video | First‑&‑Last‑Frame | Reference‑to‑Video | Audio Generation | Video Editing |
|---|---|---|---|---|---|---|
| xAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Alibaba Wan | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Kling | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| Google Veo | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
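Before sending a request, a script can consult the capability table so it does not ask a provider for a mode it lacks. A minimal bash sketch: the provider and capability keywords are hypothetical labels, and the data simply mirrors the table above, so it needs updating as providers add features.

```bash
# Encode the capability table above: supports <provider> <capability>
# Provider keys (xai, wan, kling, veo) and capability keys
# (t2v, i2v, first-last, r2v, audio, edit) are hypothetical labels.
# Returns 0 if the table marks the combination with ✅, 1 otherwise.
supports() {
  local provider="$1" cap="$2" caps
  case "$provider" in
    xai)   caps="t2v i2v first-last r2v audio edit" ;;
    wan)   caps="t2v i2v first-last r2v audio" ;;
    kling) caps="t2v i2v first-last audio" ;;
    veo)   caps="t2v i2v audio edit" ;;
    *)     return 1 ;;
  esac
  # Pad with spaces so "edit" cannot match inside a longer keyword.
  case " $caps " in
    *" $cap "*) return 0 ;;
    *)          return 1 ;;
  esac
}
```

A guard such as supports kling r2v || echo "pick another provider" then fails fast, since the table lists no reference‑to‑video support for Kling.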
Getting Started
1. Programmatic Access (One API, One Auth Flow)
AI SDK 6 lets you generate videos programmatically using the same interface you already use for text and images.
- One API endpoint
- Unified authentication
- Central observability dashboard for your entire AI pipeline
```bash
# Example: Generate a 10-second video from a text prompt
# Assumes your API key is exported as AI_GATEWAY_API_KEY.
curl -X POST https://api.ai-gateway.com/v1/video \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "klingai/kling-v2.6-t2v",
    "prompt": "A futuristic city skyline at dusk, drone fly-through, synthwave soundtrack",
    "duration_seconds": 10,
    "aspect_ratio": "16:9"
  }'
```
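The same request can be wrapped in a reusable function that reads the key from an environment variable and refuses prompts it cannot safely embed in JSON. This is a sketch under the same assumptions as the cURL example above: the endpoint URL and the duration_seconds and aspect_ratio fields are copied from that example rather than checked against an API reference, and the names generate_video, AI_GATEWAY_API_KEY, and DRY_RUN are choices made here. With DRY_RUN=1 it prints the JSON body instead of sending it.

```bash
# Sketch of a text-to-video request wrapper. The endpoint and JSON fields are
# copied from the cURL example above; they are assumptions, not a verified contract.
# Usage: generate_video <model> <prompt> [duration_seconds] [aspect_ratio]
# Set DRY_RUN=1 to print the JSON body instead of sending it.
generate_video() {
  local model="$1" prompt="$2" duration="${3:-10}" aspect="${4:-16:9}"
  local body

  # Refuse characters that would need JSON escaping rather than escaping them.
  case "$prompt" in
    *'"'* | *'\'*)
      echo "prompt must not contain double quotes or backslashes" >&2
      return 1 ;;
  esac
  # duration_seconds is sent as a bare JSON number, so require digits only.
  case "$duration" in
    '' | *[!0-9]*)
      echo "duration must be a whole number of seconds" >&2
      return 1 ;;
  esac

  body=$(printf '{"model": "%s", "prompt": "%s", "duration_seconds": %s, "aspect_ratio": "%s"}' \
    "$model" "$prompt" "$duration" "$aspect")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$body"   # Show the payload without calling the API.
    return 0
  fi

  if [ -z "${AI_GATEWAY_API_KEY:-}" ]; then
    echo "AI_GATEWAY_API_KEY is not set" >&2
    return 1
  fi

  curl -sS -X POST https://api.ai-gateway.com/v1/video \
    -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

Rejecting quotes and backslashes keeps the sketch free of a hand-rolled JSON escaper; the model ID and aspect ratio are inserted as-is, so pass plain values like those in the tables.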
2. No‑Code Playground
Each model page includes an embedded, configurable playground where you can:
- Compare providers side‑by‑side
- Tweak prompts and provider options in real time
- Download results without writing a single line of code
Access the playground via AI Gateway → Model List → Video Generation.
Provider Spotlights
| Provider | Strengths | Notable Model(s) |
|---|---|---|
| xAI – Grok Imagine | Fast, strong instruction following; video editing & style transfer in seconds. | xai/grok-imagine-video |
| Alibaba – Wan | Reference‑based generation, multi‑shot storytelling, identity preservation across scenes. | alibaba/wan-v2.6-r2v-flash |
| Kling | Excellent image‑to‑video, native audio, new 3.0 models support multishot video with automatic scene transitions. | klingai/kling-v3.0-i2v-imagelastFrame |
| Google – Veo | Highest visual fidelity, realistic physics, native audio generation with cinematic lighting. | google/veo-3.1-generate-001 |
Documentation & Resources
- [Video Generation Documentation] – Full reference guide.
- [Video Generation Quick‑Start] – Step‑by‑step tutorials and sample scripts.
- Changelogs – Detailed examples and prompt updates for each model.
Quick Reference Tables
Generation Types
| Type | Required Inputs | Optional | Typical Output |
|---|---|---|---|
| Text‑to‑Video | Text prompt | Aspect ratio, duration, audio cues | Video clip |
| Image‑to‑Video | Image URL (or upload) | Text prompt for motion, audio | Animated clip |
| First‑and‑Last‑Frame | Two images (start and end frames) | Prompt for transition style | Seamless transition video |
| Reference‑to‑Video | Images or video clips of a character | Prompt describing new scenes | Video starring the referenced entity |
| Video Editing | Source video URL | Style description, audio overlay | Stylized video |
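The Required Inputs column translates into a pre‑flight check. This bash sketch counts prompts, images, and videos per request; the function name, mode labels, and counting interface are hypothetical, and the rules encode only what the table states (for example, exactly two images for first‑and‑last‑frame).

```bash
# Check that a request supplies the inputs the table lists as required.
# Usage: check_inputs <mode> <num_text_prompts> <num_images> <num_videos>
# Mode labels (t2v, i2v, first-last, r2v, edit) are hypothetical.
check_inputs() {
  local mode="$1" prompts="$2" images="$3" videos="$4"
  case "$mode" in
    t2v)        [ "$prompts" -ge 1 ] ;;                # text prompt
    i2v)        [ "$images" -ge 1 ] ;;                 # image URL or upload
    first-last) [ "$images" -eq 2 ] ;;                 # start and end frames
    r2v)        [ $((images + videos)) -ge 1 ] ;;      # images or video clips
    edit)       [ "$videos" -ge 1 ] ;;                 # source video URL
    *)          echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

A script can call, say, check_inputs first-last 1 2 0 before uploading anything and abort on a non-zero status.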
Next Steps
- Read the full docs – familiarize yourself with provider‑specific options.
- Pick a model – start with the playground to experiment.
- Integrate via API – use the sample cURL request (or SDK) to embed video generation into your product.
Happy creating! 🚀