In this week’s episode of Denoised, Addy and Joey break down a slate of AI and creative-tech updates that matter to filmmakers: Google’s Nano Banana Pro, Meta’s SAM 3D for converting photos into editable 3D assets, and World Labs’ Marble for quickly generating navigable 3D scenes. Below are the highlights, test notes, and practical takeaways for production teams, VFX artists, and content creators.

Nano Banana Pro: 4K outputs, improved text, and real-world reasoning

Google’s Nano Banana Pro adds two headline features that jumped out during testing: a native 2K architecture with support for 4K outputs, and tighter integration with Gemini’s advanced reasoning. Joey emphasizes that the model is more than an image generator; it pairs visual generation with multimodal understanding, which helps when the task is more than “make a pretty image.”

The big practical wins for filmmakers: crisp, accurate text rendering for diagrams and in-frame sign copy, stronger character consistency across shots, and a relighting capability that respects scene lighting motivations rather than just pasting new backgrounds.

Why that matters

Relighting and spatial understanding are core problems in VFX and virtual production. When an image generator can convincingly move a subject into a different lighting environment—removing hotspots or shifting rim light to diffuse cloud cover—it reduces manual compositing and color work. For productions that need believable turnaround imagery for previs, concept boards, or quick client demos, Nano Banana Pro raises the bar.

Sample outputs and side-by-side tests

Joey ran a series of film-focused tests comparing Nano Banana Pro with the original Nano Banana and Seedream. The tests were deliberately cinematic: a Whiplash frame with J.K. Simmons, a style transfer into a Star Wars–style cantina, and a relight-and-place of a Starbucks barista. The Pro version consistently outperformed the earlier models on relighting fidelity and small-detail retention.

Examples that stood out:

  • A Whiplash shot relit to overcast conditions: light hotspots on the head disappear, while facial pose and expression remain intact.

  • A cantina-style scene where reflections, material response, and ambient tinting were adjusted across foreground and background for a consistent mood.

  • A visual-prompt-driven Starbucks scene that maintained the subject’s pose and window-light motivation while convincingly adding espresso hardware and menu text that looked like real type rather than AI gibberish.

Character consistency, visual prompting, and spatial moves

Nano Banana Pro allows blending up to 14 reference images and claims consistent resemblance for up to five people. Joey notes that some community examples exceed that, but the stated limits give teams a reliable baseline.

Visual prompting in Freepik paired with Nano Banana Pro added another layer of control. The hosts tested reframes and reverse-angle generation—the sorts of tasks directors and storyboard artists often want when planning coverage. Nano Banana Pro provided stronger spatial understanding across several attempts, though extreme camera flips still require careful prompting or iterative passes.

Practical tip

When you need a reverse-angle or a partial over-the-shoulder result, include explicit camera instructions in the prompt (for example: "over her left shoulder, shoulder in right frame, partial foreground occlusion"). Combining that with Freepik’s 3D camera features—where the image becomes a rotatable block—improves success rates.
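
For teams scripting these passes against the Gemini API, here is a minimal sketch of a reverse-angle reframe using the google-genai Python SDK. The model identifier and file names are assumptions for illustration, not confirmed Nano Banana Pro values:

    # Hypothetical sketch: reverse-angle reframe via the google-genai SDK.
    from io import BytesIO

    from google import genai
    from PIL import Image

    client = genai.Client()  # reads the API key from the environment

    source = Image.open("coverage_frame.png")

    # Explicit camera language works better than vague asks like
    # "show the other side".
    prompt = (
        "Reverse angle of this scene: over her left shoulder, "
        "shoulder in right frame, partial foreground occlusion, "
        "match the existing key light direction and color temperature."
    )

    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",  # assumed id for Nano Banana Pro
        contents=[prompt, source],
    )

    # Generated image bytes come back as inline data on the response parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("reverse_angle.png")

If the first pass misses the flip, iterate: feed the near-miss back in as the reference and tighten the camera language rather than rewriting the whole prompt.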

Integration with Gemini and the path toward world models

Joey points out repeated overlap between Nano Banana and Gemini 3 capabilities. Gemini’s advanced reasoning appears to be powering contextual tasks such as diagram creation, text accuracy, and procedural outputs like assembly instructions for complex toys. That combination nudges image models toward broader world modeling—systems that can understand and reproduce environments with physical consistency.

“All roads lead to robots.” That line captures the hosts’ lighthearted prediction: as image, language, and 3D engines converge, workflows will increasingly support agent-driven scene creation and interaction. For filmmakers, that means faster iterations when exploring location ideas, props, and camera coverage, and a tighter bridge between concept and virtual production stages.

Pricing and access

Cost considerations are practical: Nano Banana standard generations run around $0.04 per image, while Nano Banana Pro 4K is roughly $0.30 per image, a 7.5x jump at the quoted prices. Aggregators like Freepik temporarily offered more generous access windows, but those promos often come with limits (single generation at a time, reduced batch options).
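
The per-pass math is worth running before committing to the Pro tier. A back-of-envelope sketch, with frame and variant counts made up for illustration:

    # Back-of-envelope spend for a concept pass at the quoted per-image prices.
    STANDARD = 0.04  # Nano Banana standard, $/image
    PRO_4K = 0.30    # Nano Banana Pro 4K, $/image

    frames, variants = 12, 8      # e.g. 12 boards, 8 takes each
    images = frames * variants    # 96 generations

    print(f"standard pass: ${images * STANDARD:.2f}")  # $3.84
    print(f"pro 4K pass:   ${images * PRO_4K:.2f}")    # $28.80
    print(f"multiplier:    {PRO_4K / STANDARD:.1f}x")  # 7.5x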

Production leaders should weigh the need for 4K fidelity and high-fidelity relighting against budget. For a quick mood board or concept pass, standard generations might be enough. For client-facing comps or virtual production plates, the Pro tier can shave downstream post costs.

Meta’s SAM 3D: Photos to editable, rigged 3D assets

Meta released SAM 3D, a model that converts images into full 3D assets, including basic skeletal rigs and the extraction of individual objects from a scene. Joey highlights samples where entire picnic spreads and more complex scenes turned into editable 3D assemblies that export to Blender or Unreal.

This is a core productivity leap for teams building metaverse spaces or rapid environments. Instead of hiring modelers for every prop, creatives can generate the bulk of their assets from reference imagery and then refine in a 3D package. For virtual production, that shrinks turnaround times when dressing environments or testing camera moves against LED walls on set.
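
Once assets land as glTF, pulling a batch of them into Blender is scriptable. A minimal sketch, assuming .glb exports and a hypothetical folder name, run from Blender’s Scripting tab:

    # Bulk-import generated .glb assets into the current Blender scene.
    import pathlib

    import bpy

    # '//' makes the path relative to the saved .blend file.
    asset_dir = pathlib.Path(bpy.path.abspath("//sam3d_exports"))

    for asset in sorted(asset_dir.glob("*.glb")):
        bpy.ops.import_scene.gltf(filepath=str(asset))
        # The glTF importer leaves newly created objects selected;
        # tag them so set dressers can filter generated props later.
        for obj in bpy.context.selected_objects:
            obj["source"] = "sam3d"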

World Labs’ Marble: quick navigable 3D worlds

Marble creates small, navigable 3D scenes from text or image prompts. The product is aimed at virtual production and rapid prototyping: generate a location, project it on a wall, and shoot actors with interactive lighting and parallax. Marble exports to game-ready formats and offers an editor for refining the output.

Key production use cases include mood-blocking on a budget, background plates for short shots, and small-volume virtual production where full Unreal pipelines would be overkill. The caveat: these worlds still have limits in resolution and fidelity when you push far beyond the generated area, but they are already useful for many short-form and previsualization tasks.

What filmmakers should take away

This week’s updates point to two clear trends. First, image models are improving in scene understanding: relighting, text fidelity, and consistent multi-person outputs reduce the heavier technical lift in early-stage VFX and previs. Second, 3D generation is moving from isolated proof-of-concept to practical tooling: SAM 3D and Marble both lower the barrier to populating interactive worlds.

For production teams, practical next steps are:

  1. Experiment with Nano Banana Pro for client-facing comps where relighting authenticity matters.

  2. Use Freepik’s visual prompting and camera features when you need controlled reframes or over-the-shoulder angles.

  3. Prototype sets and background elements with SAM 3D or Marble before committing to heavy Unreal builds.

Taken together, these tools don’t replace traditional pipelines but augment them—allowing faster creative iterations, tighter previs, and more accessible virtual production for smaller budgets. The pace of improvement suggests that incorporating these models into storyboarding, art direction, and early-stage VFX planning will deliver real time and cost savings.
