In this episode of Denoised, hosts Addy and Joey break down Coca-Cola’s latest holiday ad, how Secret Level stitched together an AI-first pipeline using tools like Comfy and Veo 3, and why this spot matters for filmmakers and production teams.
What the ad is and why people are talking
The commercial leans into the classic Coca-Cola palette and holiday tropes: trucks rolling through varied environments and a parade of furry animals looking on. At first glance, it reads like a modern CG spot, but the production credits and behind-the-scenes snippets point to a heavy AI-driven workflow. That combination—an iconic brand experimenting with generative tools—sparked online debate about quality, authorship, and jobs.
Behind the scenes: the pipeline and the United Nations of models
Secret Level, the studio credited with the spot, assembled multiple tools across the pipeline rather than relying on a single model. The credits and BTS frames mention Comfy, Veo 3, Runway-style toolsets, Sora, and several upscalers. The studio reportedly started with character sketches and artist-driven first frames, then used image-to-video engines to add motion and compositing passes.
This hybrid approach—artist-led concept art feeding generative tools—was visible in the BTS. Frames showed inpainting passes, depth extraction, manual logo compositing, and high-quality upscaling. The result is a spot that looks familiar to traditional post houses but was executed with a much smaller crew than a comparable CG production.
Key tools and roles observed
Image-to-video models for turning static character art into short animated clips.
Inpainting and compositing to refine frames and protect brand elements like logos and typography.
Upscalers to move from draft resolutions to broadcast-ready frames.
Node-based orchestration to coordinate generators, iterations, and selection workflows.
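The stages above form a fairly conventional generate-cull-refine loop. As a minimal sketch (not Secret Level's actual pipeline; every function here is a hypothetical stand-in for a real image, image-to-video, or upscaler model), the orchestration might look like this:

```python
# A minimal sketch of the staged, node-style workflow described above.
# All stage functions are hypothetical stand-ins, not the studio's tools.

from dataclasses import dataclass

@dataclass
class Clip:
    shot: str
    seed: int
    score: float  # e.g. an aesthetic or brand-safety score

def generate_first_frames(shot: str, n_seeds: int) -> list[Clip]:
    # Stand-in for sweeping seeds/styles with an image model.
    return [Clip(shot, seed, score=seed % 7 / 7) for seed in range(n_seeds)]

def image_to_video(frames: list[Clip]) -> list[Clip]:
    # Stand-in for an image-to-video pass (Veo- or Runway-style).
    return frames

def upscale(clips: list[Clip]) -> list[Clip]:
    # Stand-in for a broadcast-resolution upscale pass.
    return clips

def select_best(clips: list[Clip], keep: int) -> list[Clip]:
    # Curation step: humans or a scoring model pick finalists.
    return sorted(clips, key=lambda c: c.score, reverse=True)[:keep]

def run_shot(shot: str, n_seeds: int = 50, keep: int = 3) -> list[Clip]:
    frames = generate_first_frames(shot, n_seeds)
    frames = select_best(frames, keep=10)  # cull before the expensive pass
    clips = image_to_video(frames)
    clips = upscale(clips)
    return select_best(clips, keep=keep)

finalists = run_shot("truck_through_snow")
```

The key design point is culling cheap still frames before the costly motion pass, which is one plausible reason generation counts skew so heavily toward first frames.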
Visible wins and technical friction
The ad demonstrates several technical strengths. Fur rendering and close-up detail improved compared to earlier AI attempts. Many shots have convincing depth, consistent lighting, and pleasing bokeh, which makes the animal characters feel tactile and photoreal enough to work in a glossy commercial environment.
At the same time, there are telltale weaknesses that underscore where human craft still matters. Stylized cursive text—specifically the Coca-Cola logo—remains challenging for video generators. The logo sometimes appears distorted or applied as an afterthought, suggesting manual compositing or corrective layers were used rather than trusting raw model output.
Another conscious design choice: there are no speaking characters. Dialogue remains a tough problem for image-to-video models because of lip sync and natural speech performance. The spot relies on narration, music, and a jingle, keeping character animation safely nonverbal and minimizing uncanny-valley moments.
The 70,000-clip claim and what that might actually mean
One of the most-discussed figures was the claim that the team generated 70,000 clips using Comfy. That number raised eyebrows, and for good reason: for a roughly 20-shot spot, traditional iteration counts would point to a few hundred or a few thousand outputs, not tens of thousands.
There are possible explanations:
Massive first-frame image generation to sweep styles and character variations, followed by fewer image-to-video passes.
An agentic or automated pipeline that brute-forced many generations and then selected the best results.
Counting every micro-variation, crop, and version as a separate clip for audit or approval tracking.
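The explanations above share one mechanism: per-stage choices multiply. A back-of-envelope sketch, in which every stage count is an illustrative assumption rather than a reported figure, shows how an unremarkable-looking sweep reaches 70,000 logged outputs:

```python
# Back-of-envelope arithmetic for how a clip count balloons when every
# variation is logged. All per-stage figures are illustrative assumptions,
# not numbers reported by the studio.

shots = 20              # rough shot count for the spot
prompt_variants = 25    # style/character prompt sweeps per shot
seeds_per_variant = 14  # seeds rendered per prompt variant
logged_versions = 10    # crops, re-rolls, and versioned exports per render

total_logged = shots * prompt_variants * seeds_per_variant * logged_versions
print(total_logged)  # 70000
```

No single stage looks extreme, yet the product lands in the tens of thousands, which is why the headline number may say more about logging granularity than about waste.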
Regardless of the exact number, the broader point stands: generative tools change how teams allocate time. Where a traditional CG spot might require a 150–200 person pipeline across environments, characters, and animation, this project reportedly ran with a far smaller crew; estimates circulating in public discussion put the core team at around 20 people.
Comparing budgets and team dynamics
Traditional environment and character builds scale with headcount: environment artists, character modelers, riggers, animators, texture painters, and a full VFX and editorial post group. Those resources add up quickly—often to multi-million-dollar budgets for high-end spots.
The AI workflow used here reallocated work: conceptual art and compositing remained human-led, while bulk motion and environment generation leaned on machine outputs. That model reduces some fixed costs while increasing iteration speed, enabling multiple cuts or alternate creatives for the same overall spend.
Broader implications for the industry
Large brands are experimenting with generative tools not just to save money but to scale creative output. The ability to produce multiple versions of a holiday universe—alternate endings, regional variants, social-friendly cuts—shifts how campaigns can be conceived and distributed.
"We need to keep moving forward and pushing the envelope. The genie is out of the bottle and you're not going to put it back in."
That quote from a Coca-Cola generative AI lead captures the tone inside many agencies: the technology is here, practitioners are learning where it helps most, and the next few years will see hybrid pipelines become mainstream.
Final takeaways
The Coca-Cola holiday spot is a snapshot of a transitional moment. It shows how generative models, when combined with traditional craft, can deliver broadcast-quality work with a smaller core team. It also highlights persistent technical limits—logo fidelity, small-text readability, and believable speech animation—that still demand manual intervention.
For filmmakers and studio leaders, the takeaway is pragmatic: evaluate where AI can reduce friction without compromising brand standards. Keep artists involved at the conceptual and finishing stages. Build tooling that turns massive model outputs into curated creative choices. And prepare to produce more creative variations with the same budget, because that capability is what will reshape production strategies in the coming seasons.