Google's Gemini Omni can generate 'anything from any input,' starting with video
Source: Engadget

Gemini Omni Announcement
Google unveiled Gemini Omni during its latest round of Gemini announcements at Google I/O. The company describes the new model as capable of “creating anything from any input — starting with video.” The first iteration, Gemini Omni Flash, is rolling out today to the Gemini app, Google Flow, and YouTube Shorts.
Key Features
- Multimodal Input – Users can combine images, audio, video, and text as input and generate high‑quality videos grounded in Gemini’s real‑world knowledge.
- Conversational Editing – Users edit video through natural‑language instructions, and each command builds on the previous one so that characters and elements stay consistent.
- Expanded Capabilities Over Veo 3.1 – The earlier Veo 3.1 was limited to prompts and images, whereas Omni accepts a broader range of inputs and can modify existing footage. For example, you can ask Omni to change the action, add new characters or objects, or transform a scene’s environment, angle, style, or specific details.
- Physical‑World Understanding – The model better grasps physical phenomena such as gravity, kinetic energy, and fluid dynamics, resulting in more realistic scenes.
- Contextual Storytelling – Gemini’s knowledge of history, science, and culture helps bridge photorealism with meaningful storytelling, enabling the creation of explanatory videos from short prompts.
- Voice‑Based Avatars – Users can generate a digital avatar that looks and sounds like them by providing their own voice.
- Safety Measures – Google states it has clear policies to protect users and is testing audio‑editing features responsibly. All generated videos carry an imperceptible SynthID digital watermark to verify their origin.
Availability
Gemini Omni Flash is now available to all Google AI Plus, Pro, and Ultra subscribers worldwide. It is also rolling out to users of YouTube Shorts and the YouTube Create app starting this week.