We Ran Google Omni Against Runway Aleph 2 on the Same Shots

We got back from Google I/O with hands-on access to Omni, Google's new video world model, and we pitted it against Runway Aleph 2 on identical source footage. On the Denoised podcast, we break down what Omni is actually good at (hint: not Veo's job), test the avatar, and work through Flow's agentic editor, Google Pics, and Demis Hassabis on AGI timing.

Quick Take

Google dropped a stack of updates at I/O, and the framing that clicked was simple: Omni is not Veo 4. It is the video version of Nano Banana Pro, a world model built for modifying existing video rather than generating cinematic shots from scratch. That distinction changes how you use it, who it competes with, and why Runway Aleph 2 landed in a much harder comparison.

What We Tested: Omni as a Video-to-Video Model, Not a Veo Replacement

The first instinct on getting Omni access is to compare it to Kling, Seedance, and Veo for cinematic generation. That comparison falls flat. As Joey put it, "this is not Veo 4 and it's a completely different model. This is the video version of Nano Banana Pro."

What Omni is built for:

Video-to-video edits (Addy's preferred term over "editing," which still means cutting in our world)
Inpainting on video with object swaps that preserve motion and shadows
World understanding prompts that reason about objects, physics, and continuity
Sharper text rendering inside generated video

We ran a video-to-video test with a clip of someone walking a dog in Venice. Prompt: turn the dog into a robot. The dog became a matte-plastic robot dog while the leash, the hand grip, the pan, and the background transparency all held. As Addy noted, the model picked a quadruped robot rather than a biped: "that's reasoning. Like it's going through some sort of intelligence."

A second pass turned the dog into a chimpanzee and looked even more grounded once metals and reflections were off the table. In a third test on a phone clip of a Google building, prompted "make the building take off like a spaceship," Omni kept the building, added rocket exhaust, and had it bust out of the ground. We covered Omni's launch in the Gemini app separately for production pros.

The world understanding test: We prompted Omni to create single shots of vintage objects whose first letters spell DENOISED, with each letter visible on the object. Omni produced one clip stringing together a dial phone, newspaper, oil can, ink holder, stopwatch, and an Edison fan. Two letters drifted, but a single prompt still produced a coherent multi-object sequence with readable letterforms. That is what the "world model" label is pointing at.
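A quick way to score an acrostic test like this is to compare the objects' initials against the target word. The sketch below is only an illustration: the function name is ours, and the object list is just the six items named in this recap, which may not be everything that appeared in the clip.

```python
from collections import Counter

TARGET = "DENOISED"

# Objects as named in this recap. Assumption: the list may be incomplete,
# so a "missing" letter here is not proof the clip skipped it.
objects = ["dial phone", "newspaper", "oil can", "ink holder", "stopwatch", "Edison fan"]


def missing_letters(target, names):
    """Return the target letters not covered by the objects' first letters."""
    initials = Counter(name[0].upper() for name in names)
    # Counter subtraction keeps only letters with a positive remaining count.
    return Counter(target.upper()) - initials


print(dict(missing_letters(TARGET, objects)))
```

Counting letters rather than comparing position by position tolerates objects appearing out of order, which matters when the clip is scored from notes rather than frame by frame.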

What We Tested: The Avatar Feature That Buries Sora's

The avatar tool is buried in the Gemini app UI, and the first round of image-to-video tests with random photos was rough. Likeness fell apart. The actual workflow: calibrate on your phone like a Sora avatar, turn your head left and right, read off numbers. Google ties the avatar to your account.

Addy's take: "I think this is way better than Sora's avatar was." On the champagne shot and the Miami gray t-shirt shot, "that's 98% Joey."

Two caveats:

The model amplifies. Sharper jawline, fuller hair, more jacked. Addy called it the "common denominator handsome" tendency. Google could dial it down if they wanted.
Voice is the weak link. We calibrated in a hotel room on phone audio. The output didn't sound like Joey. Better mic input likely fixes it, but there is no Adobe Podcast Audio style cleanup happening here.

You can only build one avatar (yourself), and you can't grant permission for others to use it the way Sora characters work. We asked. The answer: maybe later, not on the roadmap. Addy's pitch for the real use case: faceless YouTube channels that want a synthetic talking head. For that audience, the current quality bar is already over the line.

What We Explored: Flow's Agentic Sidebar and a Vibe-Coded Tool Builder

Google Flow picked up the new Omni models, a character system, and a Gemini sidebar that operates as an agentic chat layer over the timeline.

Full breakdown in our coverage of Flow's agent update.

We tested it by asking for "10 different shots of the same person walking in the same field." Flow generated a reference image of the person, generated a reference image of the field, then used both as consistency inputs for the 10 shots. One vague prompt, three steps under the hood.
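The three-step expansion can be sketched roughly as follows. This is not Flow's API or its internal planner: `generate_image`, `generate_shot`, and `plan_consistent_shots` are hypothetical stand-ins that only record what would be requested, to show how two shared references feed every shot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    kind: str
    prompt: str
    references: tuple = ()


def generate_image(prompt):
    """Hypothetical stand-in for an image generation call."""
    return Asset("image", prompt)


def generate_shot(prompt, references):
    """Hypothetical stand-in for a video generation call with reference inputs."""
    return Asset("video", prompt, tuple(references))


def plan_consistent_shots(subject, setting, count):
    # Steps 1 and 2: build one reference image each for subject and setting.
    subject_ref = generate_image(f"reference image of {subject}")
    setting_ref = generate_image(f"reference image of {setting}")
    # Step 3: every shot reuses the same two references as consistency inputs.
    return [
        generate_shot(
            f"shot {i} of {count}: {subject} walking in {setting}",
            [subject_ref, setting_ref],
        )
        for i in range(1, count + 1)
    ]


shots = plan_consistent_shots("the same person", "the same field", 10)
print(len(shots), "shots sharing", len(shots[0].references), "references")
```

The point of the pattern is that consistency comes from reusing the references, not from anything special in the per-shot prompts.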

On the character system: We pressed Google on whether there is a special Omni variant powering Flow's character consistency. The answer was no. Same model the API will expose, with structuring under the hood. As Addy summarized, "there is no grand master plan to tie everything underneath with like a master workflow."

The tool builder: Flow added a vibe-coded mini-app builder inside the product. Describe a tool, get an applet. Examples included motion tracking. A public directory of community tools already exists, and the tools in it are remixable. Open question: do creatives think in terms of building their own tools, or does this become a gateway to Unreal Engine or Google's Antigravity?

What We Explored: Google Pics Turns Nano Banana Into Editable Layers

Google Pics was the sleeper of the trip. Canva-shaped, built on Nano Banana. The trick: generated flyers and composites stay element-addressable after creation. Hover any piece of a generated design, click it, reprompt it, and only that element changes.

The demo: a generated "Rooted Future" flyer. We clicked the text and asked for it fancier. We clicked the left figure and put them in an astronaut spaceship suit. We clicked the right figure and put them in safari gear. The text restyled, both characters updated outfits while keeping likeness, the tape detail at the top stayed in place, and the background held.
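One way to picture "element-addressable" is a design stored as named elements, where a reprompt regenerates one entry and leaves the rest untouched. The sketch below is a mental model only, not how Pics is implemented: `Element`, `reprompt`, and the element names are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    prompt: str
    revision: int = 0


def reprompt(design, element_id, new_prompt):
    """Return a new design in which only element_id has been regenerated."""
    updated = dict(design)  # shallow copy: untouched elements are reused as-is
    old = design[element_id]
    updated[element_id] = replace(old, prompt=new_prompt, revision=old.revision + 1)
    return updated


# Hypothetical element names for the flyer described above.
flyer = {
    "title_text": Element('"Rooted Future" title'),
    "left_figure": Element("left figure"),
    "right_figure": Element("right figure"),
    "tape": Element("tape detail at the top"),
    "background": Element("background"),
}

edited = reprompt(flyer, "left_figure", "left figure in an astronaut suit")
print(edited["left_figure"].revision, edited["tape"] is flyer["tape"])
```

In this model the tape and background survive an edit because they are never regenerated at all, which matches what the demo showed on screen.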

Addy's workflow: pair it with another image model for the first pass, then bring the result into Pics for layer-by-layer iteration. We could not confirm whether Pics accepts uploaded images for layered editing, but the inpainting and edge blending hold cohesion in a way we have not seen elsewhere.

What We Questioned: A 3-Year AGI Timeline and an AI Job Market Split

The Demis Hassabis fireside chat put AGI at three years out. Addy's read is that the realistic window sits further out than the talking point suggests: "If they're saying AGI is now 3 years out, I'm guessing it's more like 5 to 6 years out, and ASI is probably decades away."

Demis spent significant time on AI for science, now its own Google division, with AlphaFold as the reference point. The pitch: if AI cures diseases, the conversation about job displacement looks different.

One Demis benchmark for AGI: give a model world knowledge up to roughly 1910 and see if it derives what Einstein and others did. Addy's connection: Yann LeCun's work abstracting away tokens in favor of vector-based, multimodal-by-default networks that keep absorbing inputs after training rather than freezing at the snapshot.

Gemini 3.5 Flash also dropped at I/O. It is the text model for coding and fast actions in the Gemini app, not a video model, and it beats the benchmark scores Google's own 3.1 Pro posted. The naming continues to confuse everyone, including us.

On the job market split, Addy sees a hard fence between AI and non-AI roles: oversupply on the non-AI side with thousands of applicants per opening, undersupply on the AI side. ClickUp's CEO framed layoffs around expectations that remaining staff become 10x to 100x more effective. Meta reportedly cut roughly 10% while Zuck told staff the company is training on its own workforce.

What We Questioned: Waymo's Fringe Cases Got Real

Demis also talked about using Google's Genie world model to spin up simulated environments for Waymo fringe-case testing. His example: a Waymo surrounded by a forest fire. One-in-a-billion scenarios that never show up in real driving data.

Then the timing turned. Atlanta Waymos drove full-send into flooded streets, and the service has since paused. Joey: "they need to spin up some more Genie models to test out the fringe cases." Addy's stress test: drifting a car into a Waymo's lane in Santa Monica. The Waymo slowed, then honked when he got aggressive. More responsive than expected.

The broader Labs point: Google ships a lot because not every product survives. They know the kill-product meme. Trade-off is real velocity on creative tools with no guarantee of longevity.

What We Tested: Aleph 2 vs Omni on the Same Source Footage

Runway Aleph 2 shipped alongside Omni, which is brutal timing. We ran the same two source clips through both: the LACMA Metropolis sculpture pan and the dog-to-robot conversion from Venice.

What Aleph 2 added: a frame-picker slider. Pick any frame in the source video, edit that frame as guidance, propagate. Upload image references or generate a new first frame with Nano Banana Pro and feed it in. Real workflow improvement over text-only change descriptions.

What the head-to-head showed:

Metropolis sculpture: Aleph 2 was softer, less detailed than Omni. Camera tracking held, but texture work lost specificity.
Dog-to-robot: Aleph 2 produced a robot that still animated like a dog. Omni's robot dog had a gait rigged like a quadruped robot. As Addy described Aleph 2's approach: "it's like they're feeding it through OpenPose and just extracting the anchor points and then just attaching it to the new dog." Omni "fundamentally translated that animation into a completely robotic animation."

Aleph 2's inpainting on background and crowd held up. Shadows on the dog swam slightly. Addy's closing point: Runway is a small company hanging with Google and OpenAI on release cadence, and Cristóbal Valenzuela's team worked on Aleph 2 for six months to a year before shipping it the same week as Omni. Direction is right. Timing is unforgiving.

We covered the prior Runway and Google update cycle for context on Runway's release cadence.

For context on Google's earlier cinematic model, see Google Veo 3.

Bottom Line: Omni Is a New Category, Not a Veo Upgrade

Five stories from I/O, one frame:

Omni is the video-to-video world model in the Gemini lineup. Stop comparing it to Veo. Compare it to Aleph 2, and on identical source footage Omni handled object reasoning more cleanly.
The avatar feature beats Sora's on visual likeness from a phone calibration but ships with weak voice synthesis and no sharing. Faceless YouTube is the obvious wedge.
Flow got an agentic chat layer and a remixable mini-tool builder. No secret model under it, just smarter scaffolding around the same API.
Google Pics turned Nano Banana into a layered, addressable design surface. Element-level inpainting holds cohesion in a way Canva-style AI editors have not.
AGI in three years, Waymos in floodwater, layoffs as 10x bets: the macro story is Google leaning into AI-for-science legitimacy while the industry sorts which jobs survive.

Omni and Aleph 2 landed together. The model that understands the world wins on physics. The one that ships better workflow scaffolding might still win on day-to-day use.

Links from This Episode

Projects:

Project Genie public access, the world-model engine Demis cited for Waymo fringe-case testing

News & Analysis:

Nano Banana Pro coverage, the image-model framing for Omni
Comfy Cloud, AI camera control, Nano Banana 2 roundup
Google Flow original launch
Google Veo 3 launch

Companies:

Google DeepMind (Demis Hassabis, AGI timeline)
Runway (Cristóbal Valenzuela, Aleph 2)
Meta (layoffs, leaked training audio)
Waymo (Atlanta flood incidents)