Three new open-source image models, a major VFX acquisition, and a music generator that inferred our identities from almost no context. We test FireRed, Recraft V4, and ByteDance's BitDance, debate what Foundry's acquisition of Griptape means for Nuke pipelines, and discover that Google Lyria 3 knows more about you than you told it.
Quick Take
Open-source image models matter less for competing with API-based tools and more for what they teach the community. We test three new releases, debate why Foundry's acquisition of Griptape signals the future of VFX pipelines, and discover that Google Lyria 3 can infer context from almost nothing — a capability that felt "creepy" to us.
What We Tested: Three New Open-Source Image Models
The case for open source. Addy opened with a defense of open-source models that cuts through the quality conversation. "Not because the quality is the best," he said, but because these models "move an inch each one of us forward in terms of testing, learning, building other things and sort of modifying, working in Comfy." The real value isn't competing with Nano Banana Pro or API models — it's the ability to experiment, iterate, and build on top of existing work.
Joey added context from his own experience: token costs are real. Running large models through APIs adds up fast. Open-source alternatives that run locally eliminate that friction, even if they require beefier hardware.
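The arithmetic behind that point is simple. The sketch below uses purely illustrative numbers (the per-image price, volume, GPU cost, and amortization window are all assumptions, not quotes from any provider) to show why heavy iteration tilts the economics toward local inference:

```python
# Back-of-envelope comparison of API vs. local generation costs.
# Every number here is an illustrative assumption, not a real price.
API_PRICE_PER_IMAGE = 0.04   # assumed $/image for a hosted model
IMAGES_PER_DAY = 500         # assumed volume for heavy experimentation
DAYS = 30

api_monthly = API_PRICE_PER_IMAGE * IMAGES_PER_DAY * DAYS

GPU_COST = 1600              # assumed one-time price of a capable GPU
GPU_LIFETIME_MONTHS = 24     # assumed amortization window
POWER_PER_MONTH = 25         # assumed electricity cost, $

local_monthly = GPU_COST / GPU_LIFETIME_MONTHS + POWER_PER_MONTH

print(f"API:   ${api_monthly:,.2f}/month")
print(f"Local: ${local_monthly:,.2f}/month")
```

Under these assumptions the API bill is several times the amortized local cost, and unlike the GPU, it scales linearly with every extra experiment.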
FireRed: Image Editing Specialist
FireRed-Image-Edit-1.0 comes from a team with minimal public presence. Unlike the first wave of text-to-image models, FireRed targets a specific problem: editing existing images with high quality and consistency.
Addy tested it with his standard benchmark, a busy 1970s New York street, and found it lands somewhere between Z Image and Nano Banana Pro in quality. "It's like maybe 75% there," he said, noting issues with text rendering, car coherence, and overall detail. For an open-source model, it's a meaningful step forward.
What stood out: FireRed can correct errors in images using world knowledge. Given a blue pencil with a red line and the prompt "correct the errors," it changes the line to blue. A tricycle with triangle wheels becomes a tricycle with round wheels. This suggests the model combines a visual language model to understand the image, an LLM to reason about corrections, and a diffusion model to regenerate. It's not just pixel-space editing — there's logic underneath.
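That three-stage structure can be sketched as a pipeline. Everything below is a hypothetical stand-in, not FireRed's actual API; the stubs just make the division of labor concrete for the pencil example:

```python
# Sketch of the three-stage "correct the errors" pipeline described above.
# All functions are hypothetical stubs, not FireRed's real interfaces.

def describe_image(image):
    # Stage 1 (visual language model): turn pixels into a structured
    # description. Stubbed with the blue-pencil example.
    return {"object": "pencil", "body_color": "blue", "line_color": "red"}

def reason_about_errors(description):
    # Stage 2 (LLM): apply world knowledge to spot inconsistencies.
    # Here: a pencil should draw a line in its own color.
    edits = []
    if description["line_color"] != description["body_color"]:
        edits.append({"set": "line_color", "to": description["body_color"]})
    return edits

def regenerate(image, edits):
    # Stage 3 (diffusion): re-render the image with corrections applied.
    # Stubbed: return the edit plan instead of actual pixels.
    return edits

def correct_errors(image):
    return regenerate(image, reason_about_errors(describe_image(image)))

print(correct_errors(None))
# → [{'set': 'line_color', 'to': 'blue'}]
```

The point of the sketch is the hand-off: the reasoning happens on a symbolic description of the scene, not on pixels, which is why the model can fix a red pencil line or triangle wheels rather than just restyling them.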
The architecture question: Addy suspects FireRed may be built on frameworks shared by other Chinese companies. The model credits Qwen Image in its acknowledgments, suggesting possible shared architecture or team overlap. "Most of the chassis and the engine are already there," he said. "You're just building on top of that."
Recraft V4: The Instagram Model with SVG Output
Recraft V4 landed in ComfyUI and immediately impressed us. "It's the most Instagrammable model to date," Addy said. "Everything just comes out cool."
The model excels at what it was designed for: professional, polished imagery. Sharp text, consistent humans, vibrant colors. Joey called it "the Hype model" — everything looks like it belongs in a high-end marketing campaign.
But the real innovation is the SVG output. Recraft V4 can generate vector graphics directly, not just raster images. Joey flagged this immediately: "I've not seen anyone. It's really hard to find a model that will just do a transparent PNG output, let alone I have not seen anything that could do a vector output."
Addy noted that most models convert from pixel space to vector space as a post-processing step. If Recraft has figured out native latent-to-vector generation, that's genuinely novel. For designers needing scalable assets — logos, icons, brand elements — this workflow eliminates a manual step.
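To see why the post-processing route is the easy path, here is a deliberately naive vectorizer: it walks a bitmap and emits one SVG rectangle per dark pixel. Real tracers fit curves instead of per-pixel shapes, so this is only a sketch of the raster-to-vector direction, not how any production tool works:

```python
# Naive raster-to-SVG conversion: one <rect> per dark pixel.
# Illustrates the post-processing step most models rely on today.

def bitmap_to_svg(bitmap):
    h, w = len(bitmap), len(bitmap[0])
    shapes = [
        f'<rect x="{x}" y="{y}" width="1" height="1"/>'
        for y, row in enumerate(bitmap)
        for x, px in enumerate(row)
        if px  # 1 = dark pixel, 0 = background
    ]
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {w} {h}">' + "".join(shapes) + "</svg>")

# A 3x3 bitmap containing an X shape:
svg = bitmap_to_svg([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 1]])
print(svg)
```

The output is valid SVG but bloated and jagged, which is exactly why tracing is a lossy afterthought. A model that generates vector primitives natively from latent space would skip this conversion entirely.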
The enterprise angle: Recraft V4 is positioned as enterprise-grade, which means higher cost per generation but more customization. It's purpose-built for professional design work, not general-purpose image generation.
ByteDance's BitDance: Text Rendering as a Dead Giveaway
ByteDance dropped a new open-source image model called BitDance — separate from their proprietary Seed models. The model generates coherent images with readable text, but Addy spotted the problem immediately: the text is "too texty."
"It looks like you took an image of a whiteboard and went into Word and then typed in a marker font," Joey said. "It doesn't look like someone actually wrote that with whiteboard." A farmer's market chalkboard sign looks like a chalk font, not actual chalk. The flowers conform too perfectly to the lines.
These are the dead giveaways. "A normal eye can pick these up," Addy said. "Yeah, something's off there."
The broader pattern: As image models improve, the tells shift. Early models failed at hands and faces. Now they nail anatomy but stumble on text rendering and material authenticity. Each generation reveals what the next generation needs to solve.
What We Debated: Foundry Acquires Griptape
Foundry acquired Griptape, an enterprise-grade AI orchestration platform, and the strategic logic is straightforward.
Nuke is already node-based. VFX artists think in nodes. The natural extension is AI nodes — nodes that generate backgrounds, fill elements, or handle other discrete tasks within the existing Nuke workflow. Griptape provides that orchestration layer, allowing studios to call different models and agents from within Nuke itself.
Why this matters: Griptape is built on Python with scripting and custom node support, designed for the kind of granularity and control that VFX studios demand. It's not a consumer tool — it's enterprise infrastructure. Foundry's integration means AI becomes another tool in the compositor's arsenal, not a separate workflow.
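The pattern is easy to picture if you reduce a node graph to its essentials. The toy below is not Griptape's or Foundry's actual API; it's a sketch of how an "AI node" slots into a node-based pipeline alongside conventional read and merge nodes, with the model call stubbed out:

```python
# Toy node graph: an AI node participates in the same pull-based
# evaluation as conventional nodes. Not any real Nuke/Griptape API.

class Node:
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, inputs

    def render(self):
        # Resolve upstream nodes first, then apply this node's operation.
        return self.fn(*(n.render() for n in self.inputs))

# Conventional node: reads the live-action plate.
read_plate = Node("Read", lambda: "live-action plate")

# AI node: calls a generative model instead of a pixel operation (stubbed).
def generate_background():
    return "generated 1970s street background"

ai_bg = Node("AIGenerate", generate_background)

# Merge node: composites the plate over the generated background.
comp = Node("Merge", lambda bg, fg: f"{fg} over {bg}",
            inputs=(ai_bg, read_plate))
print(comp.render())
# → live-action plate over generated 1970s street background
```

Because the AI node obeys the same evaluation contract as every other node, the compositor's mental model and the studio's existing graph tooling carry over unchanged; that interchangeability is what makes the orchestration layer valuable.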
"It's a good move," Addy said. We had predicted Comfy UI might get acquired, but Foundry's play is smarter. They're not buying a community tool — they're buying orchestration infrastructure to embed AI into their flagship product.
The three lanes of AI filmmaking: The acquisition prompted Addy to articulate an emerging pattern in AI-assisted production:
Performance-driven hybrid: Real actors and cameras, but backgrounds and environments generated or replaced with AI. The most practical near-term approach.
Fully synthetic: Everything is art-directed with AI. People, backgrounds, all generated. Still experimental but improving rapidly.
AI-assisted animation: Keyframes and pose control remain manual, but in-between frames are generated. Combines human creative control with AI efficiency.
Foundry's move supports all three lanes by making AI orchestration a native part of the VFX pipeline.
What We Tested: Google Lyria 3 and the Creepy Context Inference
Google Lyria 3 generates 30-second music tracks from text, images, slide decks, or video inputs. It's fast — "much quicker than Suno or ElevenLabs," Addy said — and the quality is solid for non-music professionals.
The standout feature isn't the music generation itself. It's the input flexibility. Most music generators take text prompts. Lyria 3 also accepts images, slides, and videos as inspiration. Joey tested it with our podcast cover art. Addy tested it with a slide deck from a NotebookLM session about diffusion models.
But the moment that stuck with us came when Addy generated a song with minimal context. "I just said make a song about Joey and Addy doing a podcast," he explained. "I didn't say anything about filmmaking or AI. Literally nothing else."
The result: a song about us on the airwaves, talking about the future, AI making movies, figuring it all out. The model inferred our entire context from almost nothing.
"It is sentient after all," Addy joked. But the implication was real: Gemini's world models are sophisticated enough to fill in gaps with remarkable accuracy. Joey wondered if Gemini's memory had previous context saved. Addy confirmed it was a brand new session.
The commercial question: Lyria 3 is embedded in Gemini, not a standalone product. Users won't know they're using "Lyria" — they'll just ask for music and get it. It's the same strategy Google used with Nano Banana and other models: hide the product name, make the capability invisible.
The bigger question is rights and ownership. "Is it commercially safe if you generate the track, do you own it?" Addy asked. Google has embedded SynthID watermarking in all generated tracks for verification, but the legal framework for commercial use remains unclear.
We previously covered Lyria 3's launch with more detail on its capabilities and competitive positioning.
Bottom Line: Open Source Matters, Nuke Gets Smarter, Music Gets Weird
Three different approaches to AI in creative work, each at a different stage of maturity. Open-source models like FireRed, Recraft V4, and BitDance matter not because they beat API models, but because they let the community learn, experiment, and build. Foundry's acquisition of Griptape signals that AI is moving from experimental add-on to core infrastructure in professional VFX pipelines. And Lyria 3 demonstrated that AI's ability to infer context from minimal input is advancing faster than most people realize.
Key takeaways:
Open-source image models are becoming specialized. FireRed targets editing, Recraft targets professional design (with SVG output), BitDance targets general-purpose generation. The "best model at everything" era is ending.
Nuke's AI integration through Griptape makes sense architecturally. Node-based tools get AI nodes. VFX studios keep their existing workflows while adding AI capabilities.
Lyria 3's context inference is the real story. The music quality is fine, but the model's ability to understand what you're doing from almost no input is the capability worth watching.
Token costs matter. Open-source models running locally eliminate API friction, which changes the economics of AI-assisted workflows.
Links from This Episode
Tools & Models:
FireRed-Image-Edit-1.0 — Open-source image editing model
Recraft V4 in ComfyUI — Professional image generation with SVG output
Google Lyria 3 — AI music generation in Gemini
News & Acquisitions:
Foundry acquires Griptape — AI orchestration platform integrating with Nuke