Netflix has open-sourced VOID (Video Object and Interaction Deletion), a model that removes objects from video while cleaning up the physics those objects leave behind. It is the first public AI model Netflix has ever released, and it points directly at how the company is thinking about AI-assisted post-production.

VOID is not a general-purpose generative video tool. It solves a specific, persistent problem in VFX and editing: when you remove something from a shot, you also need to remove everything that thing was doing to the scene. Shadows, reflections, motion blur, surface contact, collision effects. Existing tools handle the paint-out but leave the physics intact, which means manual cleanup or uncanny artifacts. VOID targets that gap.

How It Works

The model operates in two phases. First, a vision-language model analyzes the scene to identify regions causally affected by the object being removed. If a ball is bouncing off a surface, the model identifies the compression point, the trajectory distortion of nearby objects, the shadow path. These regions get encoded into what the team calls a "quadmask," which then guides a video diffusion model to generate a physically plausible version of the scene where the object was never there. The technical details are laid out in the arXiv paper.
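The paper has the authoritative details; as a rough illustration of the data flow only, here is a minimal sketch of how a four-channel "quadmask" might be assembled to condition a downstream diffusion model. Everything here is hypothetical: `build_quadmask`, the placeholder shadow/contact channels, and the array shapes are illustrative stand-ins for what the vision-language model actually produces.

```python
import numpy as np

def build_quadmask(object_mask):
    """Hypothetical sketch: combine the object's own mask with regions
    the object causally affects (shadow, contact, reflection).

    In VOID a vision-language model identifies those regions from the
    scene; here we fake them with simple spatial shifts of the object
    mask, purely to show the four-channel structure."""
    shadow = np.roll(object_mask, shift=4, axis=1)       # placeholder "shadow" region
    contact = np.roll(object_mask, shift=4, axis=2)      # placeholder "contact" region
    reflection = np.zeros_like(object_mask)              # placeholder "reflection" region
    # Stack object + three interaction-effect layers into one 4-channel mask.
    return np.stack([object_mask, shadow, contact, reflection], axis=-1)

# Toy clip: 8 frames of a 64x64 mask covering a small moving object.
obj = np.zeros((8, 64, 64), dtype=np.float32)
obj[:, 20:30, 20:30] = 1.0
qm = build_quadmask(obj)
print(qm.shape)  # (8, 64, 64, 4): frames x height x width x mask channels
```

The quadmask would then be passed alongside the input video to the diffusion model as conditioning, telling it which pixels to regenerate as if the object had never been there.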

The second phase handles shape stability. When the first pass introduces geometric distortion on remaining objects, a refinement step uses flow-warped noise from the initial generation to lock down object geometry along corrected trajectories. The result is a clean removal that holds up across frames without the wobble or smearing that plagues current inpainting methods.
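Warping noise along optical flow is a known trick for temporal stability in video diffusion: the same noise sample is carried along the flow field so it tracks an object across frames instead of flickering independently. The sketch below, with a hypothetical `warp_noise` helper and crude integer-rounded flow, only illustrates that general idea, not VOID's actual refinement step.

```python
import numpy as np

def warp_noise(noise, flow):
    """Illustrative sketch of flow-warped noise: advect each frame's
    noise from the previous frame along a per-pixel flow field, so a
    point on an object keeps "seeing" the same noise over time.

    noise: (T, H, W) array of initial noise.
    flow:  (T, H, W, 2) array of (dx, dy) motion vectors per frame."""
    T, H, W = noise.shape
    warped = noise.copy()
    ys, xs = np.mgrid[0:H, 0:W]
    for t in range(1, T):
        # Integer-rounded backward warp (nearest neighbor, clamped at borders);
        # a real implementation would interpolate.
        src_y = np.clip((ys - flow[t, ..., 1]).round().astype(int), 0, H - 1)
        src_x = np.clip((xs - flow[t, ..., 0]).round().astype(int), 0, W - 1)
        warped[t] = warped[t - 1][src_y, src_x]
    return warped

noise = np.random.default_rng(0).standard_normal((4, 32, 32))
flow = np.zeros((4, 32, 32, 2))          # zero flow: static scene
w = warp_noise(noise, flow)
```

With zero flow the warped noise is identical in every frame, which is exactly the property that suppresses frame-to-frame wobble on static geometry.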

Training data came from two sources: Kubric, a synthetic dataset of paired triplets (input video, quadmask, ground-truth counterfactual output), and HUMOTO, a real-world dataset of human motion with paired counterfactual removal examples. The team evaluated VOID against ProPainter, DiffuEraser, Runway, MiniMax-Remover, ROSE, and Gen-Omnimatte.
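As described, each Kubric training example is a paired triplet. A hypothetical record type makes the pairing concrete; the field names and shapes here are illustrative, not taken from the release.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RemovalTriplet:
    """One Kubric-style training example as the article describes it:
    input clip, quadmask, and a ground-truth counterfactual clip
    rendered with the object never present. Hypothetical structure."""
    input_video: np.ndarray     # T x H x W x 3, the original clip
    quadmask: np.ndarray        # T x H x W x 4, object + interaction regions
    counterfactual: np.ndarray  # T x H x W x 3, the scene without the object

T, H, W = 8, 64, 64
triplet = RemovalTriplet(
    input_video=np.zeros((T, H, W, 3), dtype=np.float32),
    quadmask=np.zeros((T, H, W, 4), dtype=np.float32),
    counterfactual=np.zeros((T, H, W, 3), dtype=np.float32),
)
```

The counterfactual channel is what makes this supervision stronger than standard inpainting data: the target is a full re-render of the scene without the object, shadows and contacts included, not just plausible fill for the masked pixels.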

Why This Matters for Post

Object removal is one of the most labor-intensive tasks in post-production. Rig removal, wire removal, cleaning up tracking markers, removing crew reflections, pulling branded items from a shot. Every production handles dozens or hundreds of these fixes per episode. The work is tedious, time-consuming, and expensive, and it is also technically demanding: physics-aware cleanup requires experienced compositors.

A model that handles interaction deletion, not just pixel replacement, compresses that pipeline significantly. If VOID or models like it reach production quality, the impact lands squarely on the VFX vendor layer that handles high-volume cleanup work. Studios could internalize more of this work or reduce the hours billed per episode.

Physics awareness is the key differentiator. Current commercial tools from Runway and others can remove objects from video, but they treat the problem as inpainting: fill in the missing pixels with something plausible. VOID treats it as counterfactual simulation: what would this scene look like if the object had never been there at all? That distinction matters whenever the removed object was interacting with its environment.

Netflix as an AI Publisher

This is Netflix entering a new lane. The company has built significant internal AI capabilities, from recommendation systems to encoding optimization, but has never released a model publicly. VOID signals a shift toward open research in production tooling.

The team behind the model includes Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, and Ta-Ying Cheng. The paper is on arXiv (2604.02296), the code is on GitHub under Netflix's org, and a demo is live on Hugging Face. This is a full open-source release, not a research preview.

Netflix publishing production-relevant AI research changes the competitive dynamics around studio tooling. Until now, the major model releases targeting video editing have come from AI companies (Runway, Pika, Luma) or research labs. A studio releasing its own model suggests Netflix sees strategic value in shaping the tools layer rather than waiting for vendors to build what it needs. It also creates a recruiting signal: Netflix is doing publishable AI research that ships into production workflows.

What to Watch

VOID is narrowly scoped, which is a strength. It does one thing and does it with more physical intelligence than the alternatives. The question is whether Netflix continues publishing models that target specific post-production bottlenecks, or whether VOID is a one-off.

If this is the start of a pattern, expect other studios to respond. Disney, Warner Bros. Discovery, and Amazon all have internal AI teams, but none have published production-focused models. Netflix releasing VOID publicly puts pressure on the rest of the industry to either contribute to shared tooling or explain why they are keeping theirs closed.
