
Welcome to VP Land! Sorry for the unexpected silence, but we're back on schedule...and gearing up for NAB. If you'll be at the show, hit reply and let us know!
In today's edition:
Netflix open-sources its first AI model
GPT-Image-2 surfaces on Arena
Google's best open model, free to use
Ten AI tool releases worth knowing

Netflix Releases Its First AI Model — And Open-Sources It

Netflix has released VOID, its first publicly available AI model, and made it free and open source. VOID does one thing: it removes objects from video footage while simulating what would physically happen in their absence.
That second part is what makes it different from existing tools. Standard video inpainting software fills the removed region with a patch of background pixels. VOID understands causality. Remove a person holding a guitar and the guitar falls. Remove a car from a collision and the other vehicle continues undisturbed. Remove a diver entering a pool and the splash disappears with them. The model reasons about what the scene would look like if the object had never been there.
For post-production teams, this addresses a problem that currently requires either a clean plate shot on set or significant manual compositing work. Object removal is one of the most common VFX requests on productions that can't afford a full VFX budget, and the results have historically been limited by what the inpainting software can plausibly reconstruct.
How it works. Users provide the video clip, a text description of what to remove, and a segmentation mask. The model runs a two-pass pipeline: base removal followed by an optional refinement pass for temporal consistency across frames.
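The two-pass shape described above can be sketched in a few lines. This is a toy stand-in using numpy, not VOID's actual code: the real base pass is a generative model, and every function name here is purely illustrative.

```python
import numpy as np

def base_removal(frames, masks):
    # Toy stand-in for the base removal pass: fill each masked
    # region with the mean of that frame's unmasked pixels.
    out = frames.copy()
    for frame, mask in zip(out, masks):
        frame[mask] = frame[~mask].mean(axis=0)
    return out

def refine_temporal(frames):
    # Toy stand-in for the optional refinement pass: a 3-frame
    # moving average to damp frame-to-frame flicker.
    padded = np.concatenate([frames[:1], frames, frames[-1:]])
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def remove_object(frames, masks, refine=True):
    out = base_removal(frames, masks)
    return refine_temporal(out) if refine else out

# 8-frame grayscale clip, 4x4 pixels; the "object" occupies the
# top-left 2x2 region of every frame.
frames = np.full((8, 4, 4), 0.5, dtype=np.float32)
masks = np.zeros((8, 4, 4), dtype=bool)
masks[:, :2, :2] = True
result = remove_object(frames, masks)
print(result.shape)  # (8, 4, 4)
```

The point of the sketch is the interface, not the math: clip plus mask in, same-shaped clip out, with a second pass that only touches temporal consistency.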
Hardware requirement. Minimum 40GB VRAM. This is not a consumer tool yet. It runs on A100-class hardware, which means cloud GPU rental rather than a local workstation for most users.
Performance. In user preference studies, VOID outperformed Runway (64.8% vs. 18.4%) and several other inpainting tools on both synthetic and real footage.
Available now. Model weights, training code, a Colab notebook, and a live demo are all on Hugging Face.
Worth noting: Netflix built this on top of an open-source video model from Alibaba (CogVideoX-Fun), trained it on physics simulation datasets, and released the whole stack. For a studio that has historically kept its AI work internal, this is a meaningful shift in posture.
Read more → Full breakdown: how the quadmask pipeline works, hardware requirements, and how it compares to Runway and ProPainter.
SPONSOR MESSAGE
Hiring in 8 countries shouldn't require 8 different processes
This guide from Deel breaks down how to build one global hiring system. You’ll learn about assessment frameworks that scale, how to do headcount planning across regions, and even intake processes that work everywhere. As HR pros know, hiring in one country is hard enough. So let this free global hiring guide give you the tools you need to avoid global hiring headaches.

GPT-Image-2 Surfaced on Arena — And the Early Results Are Significant

OpenAI's next image generation model appeared briefly on AI Arena under three codenames: maskingtape-alpha, gaffertape-alpha, and packingtape-alpha. The models were pulled within hours, but not before users captured a large volume of comparison outputs.
The early examples point to a real capability jump over GPT-Image-1, particularly in two areas:
Text rendering. Handwritten text, UI screens, signs, and complex logos generated cleanly. Text coherence in AI image generation has been a persistent weak point across the category. The Arena outputs suggest GPT-Image-2 handles it significantly better.
Photorealism. Side-by-side comparisons show sharper detail and more coherent lighting in complex scenes.
Three variants. The three codenames likely represent different safety or quality tuning configurations being tested in parallel, not meaningfully different architectures.
No official release date has been announced, but models don't typically surface in blind Arena testing without a release following shortly after.
Read more → Full breakdown: the three Arena codenames, what the early outputs show, and what to expect from the official release.
Google Releases Gemma 4 — Its Most Capable Open Model, Apache 2.0

Google has released Gemma 4, its most capable open-weight model to date, under an Apache 2.0 license. The model is multimodal (text and image understanding), available from 1B to 27B parameters, and designed to run on consumer hardware including laptops.
Performance. Gemma 4 27B outperforms significantly larger models, including several proprietary ones, on standard benchmarks.
Multimodal. The full model family supports image understanding alongside text, a first for the Gemma line.
Apache 2.0. Fully permissive license. Commercial use, fine-tuning, and redistribution are all allowed without restrictions.
The license is the headline. Apache 2.0 on a 27B multimodal model puts a genuinely capable model in the hands of developers who need commercial flexibility without the restrictions that come with Meta's Llama terms.
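To see why the consumer-hardware claim is plausible, here's a rough weights-only memory estimate (our own back-of-envelope arithmetic, ignoring activations and KV cache):

```python
def model_memory_gb(params_billion, bits_per_weight):
    # Rough weights-only footprint: parameter count times bytes per weight.
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

# Sizes in the announced 1B-27B range; intermediate sizes are guesses.
for params in (1, 4, 12, 27):
    fp16 = model_memory_gb(params, 16)
    q4 = model_memory_gb(params, 4)
    print(f"{params:>2}B  fp16 ~{fp16:5.1f} GB   4-bit ~{q4:5.1f} GB")
```

At 16-bit precision the 27B model needs roughly 54 GB for weights alone, but a 4-bit quantization brings that to about 13.5 GB, which fits a single 16 GB consumer GPU with room to spare for inference overhead.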
Read more → Full breakdown: benchmark performance, model sizes, and how it compares to Llama and Mistral for local deployment.

Ten AI Tool Releases Worth Knowing

A heavy stretch of releases over the past few weeks across video, 3D, speech, and image generation. Lipsync gets a major upgrade for complex footage, two new tools push the boundaries of 3D scene reconstruction, and Alibaba's latest image model bets on reasoning over raw aesthetics. Here's what's worth knowing.
Lipsync and real-time video
Sync-3 — Sync Labs' new lipsync model targets the scenarios that break other tools: multi-speaker scenes, low-light footage, overlapping dialogue, and long continuous takes.
PikaStream1.0 — Pika's real-time video generation system enables live video chat with AI characters at sub-second latency, designed for AI agent workflows.
Video generation
PixVerse V6 — New video generation model with cinematic camera control, VFX compositing, and multi-shot storytelling support.
LTX 2.3 with Reasoning — Community testing shows strong facial detail improvements, especially combined with VBVR LoRA.
3D reconstruction and scene understanding
SpatialLM — Open-source language model that converts raw 3D point clouds into structured scene descriptions with precise metric coordinates. Outperforms Meta's SceneScript on layout estimation.
Seen2Scene — Completes partial 3D scans into full coherent scenes, filling the gaps that photogrammetry and LiDAR typically leave behind.
FreeOrbit4D — Generates bullet-time orbital video effects from a single input video. No training required.
Image generation
Wan 2.7 Image — Alibaba's new image model adds a "thinking mode" reasoning step before generation for better prompt adherence on complex scenes. Up to 4096×4096, nine reference images, text in 12 languages. API-only.
Phota by PhotaLabs — AI photo editor focused on portrait enhancement with granular control over lighting, skin, and facial features.
Speech-to-text
Willow Atlas 1 — New real-time dictation model claiming to outperform ElevenLabs, Deepgram, and OpenAI. Built on human-generated training data rather than web-scraped audio. Independent benchmarks pending.

THIS is the Biggest Thing Since CGI — A tight argument for why generative AI is a structural shift, not a tool upgrade. The CGI comparison is more specific than you'd expect. Worth 10 minutes.

Stories, projects, and links that caught our attention from around the web:
🏢 Sony is winding down Pixomondo — current projects will run to completion, some staff are being absorbed into Sony Pictures Imageworks, and R&D is rolling into Sony Group.
💻 Apple has discontinued the Mac Pro with no successor announced, ending its highest-end desktop line for creative professionals.
🤫 Reports describe an unreleased Anthropic model called Mythos as a "step change" in capabilities, far ahead of anything currently public.
✍️ The WGA and AMPTP have reached a tentative four-year agreement on the 2026 Minimum Basic Agreement — pending member ratification. The deal increases employer contributions to the writers' health plan and runs through 2030.
🛡️ Iran has issued threats against Stargate, the OpenAI and SoftBank AI data center project, raising new geopolitical risks for the initiative.

This week on Denoised: Joey and Addy test Phota by PhotaLabs live (identity-focused image generation built on Nano Banana), run through Wan 2.7, PixVerse V6, and LTX 2.3's new reasoning LoRA for facial performance. Then Quilty — the AI script coverage platform that landed a Variety splash — and why Joey calls it "the Tilly Norwood of AI development stories." The tools aren't new. The PR is.
Read the show notes or watch the full episode.
Watch/Listen & Subscribe

👔 Open Job Posts
AI Architect & Technical Strategist - Remote
Machine Learning Engineer, PyTorch - Remote
Cuebric
Director of Virtual Production - Los Angeles, CA
Virtual Production Producer - Seoul, South Korea
Virtual Production Technician - Seoul, South Korea
Eyeline Studios
Unreal Engine Operator - Riyadh, Saudi Arabia
Pixomondo
Sr. Design Technologist, Elevated Shopping - New York, NY / Seattle, WA
Amazon
Mocap TD (Virtual Production Team) - India
DNEG VFX India
Virtual Production Supervisor - Dubai, UAE
Garage Studio
Applied Research Lead - Model Scaling - Remote
Runway ML

📆 Upcoming Events
April 14
Virtual Production Gathering Industry Day 2026
Breda, Netherlands
April 18
NAB Show 2026
Las Vegas, NV
May 12
MPTS - Media Production & Technology Show 2026
London, UK
May 26
AI on the Lot 2026
Culver City, CA
View the full event calendar and submit your own events here.


Thanks for reading VP Land!
Have a link to share or a story idea? Send it here.
Interested in reaching media industry professionals? Advertise with us.



