Gemini Omni: Now Create and Edit Videos Using Voice Commands as Google Launches New AI Tool

By Shikha Saxena | May 21, 2026, 13:15 IST

Google has introduced a new video generation tool, "Gemini Omni," to its Gemini AI family. The company states that this is an AI model capable of creating anything from any given input. Currently, it is launching with video generation capabilities.

Gemini Omni combines Gemini's understanding and reasoning abilities with creative AI features. This means it will not merely generate videos; it will also comprehend what logically needs to happen next within a specific scene.

Gemini Omni: Accepts Commands in Multiple Formats
The first model in this new Omni family is Gemini Omni Flash, which launches with video creation capabilities. It accepts images, audio, video clips, or text as input. The model is available starting today in the Gemini app, Google Flow, and YouTube Shorts. In the future, it will also be able to generate standalone images and audio.

Create and Edit Videos Using Voice Commands
The most significant feature of Gemini Omni Flash is its conversational video editing capability. Users can edit videos by issuing commands in natural language—such as changing the background, adding new characters, removing objects, altering camera angles, or modifying styles and effects. The company notes that each new instruction builds upon the previous one, thereby maintaining continuity in the video's storyline, characters, and scenes.

AI Now Understands the Narrative of a Video
According to Google, Gemini Omni will not simply generate photorealistic videos; it will also draw on real-world knowledge, including an understanding of subjects such as history, science, cultural context, and physics. This means the AI will not just create aesthetically pleasing scenes, but will produce videos that are more logical and grounded in reality.

Support for Text, Photos, Videos, and Audio—All Included
Gemini Omni is launching with video creation at its core. It accepts images, audio clips, existing videos, or text as input. You can start from whatever assets you already have, whether a single photograph, an old video clip, or just a piece of text, and Omni will combine them into a finished video. For audio, only voice input is currently supported; other audio inputs will be added soon. You can also specify your preferred style and motion.

Disclaimer: This content has been sourced and edited from Amar Ujala. While we have made modifications for clarity and presentation, the original content belongs to its respective authors and website. We do not claim ownership of the content.

