Google's New AI: Now Edit Studio-Level VFX on Your Phone Using Voice Commands, and Create Bollywood-Style Videos Too
Taking a major stride in artificial intelligence (AI), Google has launched Gemini Omni, a new series of AI models that can understand and generate not only text but also multimedia content, specifically photos, audio, and video. Google's ultimate objective is to achieve Artificial General Intelligence (AGI), and this model marks a significant step in that direction.
The first tool in this series is Gemini Omni Flash. It is being integrated directly into the Gemini app, Google Flow, and YouTube Shorts. Currently, its primary focus is video creation; however, Google states that in the future it will also be able to generate images and audio directly.
How does Gemini Omni work?
Gemini Omni functions as a multimodal engine. Simply put, it does not first convert various types of video or audio into text; instead, it comprehends all of them directly and simultaneously. According to Google, it operates through three primary mechanisms:
1. Video Editing via Voice or Text Commands
You no longer need complex or cumbersome software to edit videos: you can simply speak or type commands in natural, everyday language. The standout feature of this system is its "smart memory," which retains context from your previous instructions.
Consequently, when you issue a series of consecutive instructions, the AI is designed to keep the characters, backgrounds, and camera angles in the video consistent. You can ask the tool to modify specific elements within a video, remove backgrounds, introduce new characters, or change the video's entire visual style. Google CEO Sundar Pichai shared a glimpse of this capability on X ahead of Google I/O 2026, showing how streamlined video creation is set to become.
2. Leveraging Physics for Motion
Gemini Omni does not merely mimic visual patterns; according to Google, it also understands the underlying physics of a scene, accounting for gravity, fluid dynamics, and the laws of motion so that the motion it generates looks lifelike. It can also combine information from several distinct sources.
You can provide it with inputs such as a photograph of a character, a textual description of a location, and a video clip in a specific artistic style; it then blends these inputs into a single, unified video.
3. You Can Also Create Your Own Digital Avatar
One of Gemini Omni's most exciting capabilities is the ability to create your own digital avatar, one that looks exactly like you and speaks in your own voice. However, mindful of the growing risks of deepfakes and misinformation, Google has for now restricted public access to its voice-editing features while they undergo thorough testing.
Additionally, demonstrating its commitment to safety, Google has integrated 'SynthID' technology into the system. Every video generated by this AI carries an invisible digital watermark. While imperceptible to the human eye, this watermark allows Google Search or the Gemini app to detect whether a video was created by this AI.
Who Will Get Access to This Feature?
Google is beginning to roll out Gemini Omni Flash in phases starting this week. As part of this rollout, the feature is now available on the Gemini app and Google Flow for premium users subscribed to Google AI Plus, Pro, or Ultra plans.
Meanwhile, general users will get the service free of charge later this week via YouTube Shorts and the YouTube Create app. Within the coming weeks, the technology will also become accessible to developers and corporate clients through APIs.
Disclaimer: This content has been sourced and edited from Amar Ujala. While we have made modifications for clarity and presentation, the original content belongs to its respective authors and website. We do not claim ownership of the content.

