Google's Gemini Omni is no longer just a control layer; it's now a video creation environment, able to turn text, image, and video prompts into what Google calls "cinematic, high-quality" outputs directly inside the Gemini app. The update, which begins rolling out May 19, 2026, to Google AI Plus, Pro, and Ultra subscribers worldwide, marks the first time Google has positioned a Gemini model as a front-end, consumer-facing video tool with explicit cinematic ambitions. This is not a research demo, and not an API embedded in someone else's product.
For filmmakers and media pros, the important question isn't whether Google can generate video. It's where this capability lives and what it connects to—and that's where this update is worth paying attention to.
What Gemini Omni Can Do with Video: A multimodal input → video generation and conversational editing workflow, with AI avatar creation built in.
What's Still Missing: Output specs, usage limits, licensing terms, and watermarking details are all absent from Google's announcement.
Why It Matters: Gemini is bridging its consumer app and professional infrastructure, a pattern we've been tracking across Lyria 3, the Avid partnership, and the AI pointer work.
From the Source Material: Google says Gemini Omni now handles video through four headline capabilities:
Multimodal prompting: Feed it text, images, or existing video and it generates video outputs. That covers text-to-video, image-to-video, and video-to-video transformations—though Google provides no details on duration, resolution, or frame rates in its blog post.
Conversational editing: Adjust your footage in natural language. Google specifically calls out cinematic zooms, background swaps, and built-in templates as available edit types.
Custom AI avatars: Record or upload material of yourself and Gemini Omni builds a talking avatar version of you for video outputs—one that "looks and sounds like" you, according to Google.
Built-in templates: Pre-built formats for faster output, though the blog doesn't describe what those templates contain or whether they're editable.
Prompt to Picture: Gemini Omni's multimodal input means you're not limited to describing a shot in text—you can hand it a still from a location scout, a reference frame, or existing footage and work from there.
The three input modes (text, image, and video) open up a few practical use cases even before the technical ceiling is known:
Generate a rough visual from a script page or treatment
Animate a still reference into a moving shot to communicate tone
Transform or reframe existing material via prompt
The catch is that Google's blog gives no technical floor or ceiling: no supported resolutions, no maximum clip duration, no frame rate options. Until those details surface, it's difficult to assess whether outputs are useful beyond 1080p social content.
Talking Through the Timeline: Conversational editing is the feature with the most direct relevance to production workflows—if it works reliably.
The ability to request a cinematic zoom or background swap via natural language, rather than jumping into a timeline, has real value as a concepting and previsualization tool. Consider:
Drafting multiple framing options from a single generated shot, without touching a node graph or timeline
Swapping location backgrounds to pitch client options before committing to a scout or VFX pass
Roughing in camera moves for a storyboard or animatic review
What the blog doesn't clarify is how precise this control gets. Can you target a specific frame? Specify focal length or easing on a push-in? Maintain continuity across multiple shots in a sequence? Without those answers, "conversational editing" describes a broad range of capability—from highly useful to barely sketched.
The Avatar Question: The custom AI avatar feature is the one that will land differently depending on your job title.
For solo creators and internal communications teams, the ability to scale a single person's on-camera presence—training videos, internal updates, marketing—without booking studio time has real efficiency value.
For anyone working with talent, on-camera hosts, or brand campaigns, the feature runs directly into contract, consent, and union territory. Google's blog doesn't address:
How much source material is required to build an avatar
Whether outputs carry any form of watermarking (visible or metadata-based, like Lyria 3's SynthID approach)
What disclosures or consent flows exist for third-party likenesses
What license users hold over generated avatar performances
Until those questions have answers from Google, most production and agency environments will treat this as an internal-use feature only—not something that touches broadcast or client-facing delivery.
Where Omni Fits in Google's Creative Stack: We've covered Gemini's expanding role in media workflows across several fronts—and this update connects to all of them.
Our coverage of Google's AI pointer focused on Gemini Omni as a control interface—a way to talk to your computer and direct existing software. This update moves Omni from directing tools to being the tool. That's a meaningful shift in how Google is positioning the assistant.
On the professional infrastructure side, Google Cloud's multi-year partnership with Avid brings Gemini models into Media Composer workflows via Vertex AI—firmly in the high-end post environment. Gemini Omni's new video features sit at the other end of that spectrum: consumer-tier, app-based, accessible to anyone on a Google AI subscription.
Google is building toward a stack where the same underlying model family powers both professional editing infrastructure and a chat-style creation app. This update is a step toward closing that gap at the creative layer.
Before You Build a Workflow: Several questions need answers before Gemini Omni's video tools belong in any professional pipeline:
Output specs: What resolutions and aspect ratios are supported? Are there per-clip duration limits or monthly generation caps by subscription tier?
Commercial rights: Are outputs licensed for broadcast, streaming, or paid campaigns? Google's blog is silent on this.
Watermarking and disclosure: Are generated videos or avatars tagged—visibly or via metadata—the way Lyria 3's audio outputs carry SynthID?
Collaboration: Can teams share and iterate on projects? Is there any connection to Google Drive, Workspace, or the Avid integration already in place via Google Cloud?
None of these are answered in Google's announcement. For a production, agency, or studio environment, none of them are optional.
The Long Exposure: Gemini Omni's video tools are worth tracking because of where they sit in Google's ecosystem, not just what they do on day one.
Google has shown a pattern with its professional creative tooling: Lyria 3 launched with SynthID watermarking and explicit commercial-use framing built in from the start. The Avid partnership was structured around enterprise-grade workflow integration. If Gemini Omni's video features follow the same arc—adding licensing clarity, watermarking, and professional controls over time—what starts as a consumer sandbox could evolve into a legitimate previs and concepting environment.
For now, the most honest framing is this: Gemini Omni's new video tools are a capable-sounding ideation layer with too many unanswered questions to slot into a delivery pipeline. Use it to sketch, pitch, and explore. Hold on production-grade commitments until Google fills in the technical and legal details.


