
Welcome to VP Land! RIP AOL dial-up 🪦 The company is discontinuing the service on September 30th. Here's a quick chart explaining what those iconic connection sounds actually meant.
SIGGRAPH kicked off this weekend. When asked about AI, Pixar co-founder Ed Catmull said, "We don't know where it will go. But it's not going away. So artists need to engage with the technology." We'll have more SIGGRAPH updates later this week.
Last week, we asked which tool updates you’re most excited about, and the results came back evenly split between Lightcraft Spark and OpenAI GPT-5.
In today's edition:
Pika launches audio-driven AI character performances
NVIDIA unveils physics-aware AI models for video generation
Disguise's AI assistant now executes virtual production tasks
MetaPuppet details multi-model AI film workflow

Pika's Fast Audio-Driven Character Performance

Pika announced a new audio-driven performance model that generates HD video with hyper-real facial expressions in about six seconds, with no limit on video duration. The company claims it's 20x faster and cheaper than previous approaches.
We’re excited to share our groundbreaking new audio-driven performance model, featuring hyper-real expressions in near real-time.
Any length video, in any style, is ready in 6 seconds or less—in HD. And we’ve managed to make it 20x faster and cheaper 💅
It’s all part of our…
— Pika (@pika_labs)
4:00 PM • Aug 11, 2025
The model maps your voice to facial expressions and generates any-length video in roughly 6 seconds, potentially eliminating the linear wait times that plague dialogue-heavy content.
Pika positions this as a live-leaning tool that works without specialized motion capture rigs, targeting creators who need rapid expressive shots for TikTok, Reels, or narrative beats.
The speed and cost improvements build on Pika's existing Turbo tier, which already offered 3x faster and 7x cheaper generation compared to standard modes.
You can combine the audio-driven performance with Pika's video-to-video tools like Pikaswaps and Pikadditions to voice-drive expressive faces and then apply stylization or scene edits in the same workflow.
The model extends Pika's roadmap beyond stylized video generation into real-time performance capture, competing directly with avatar-focused tools.
SPONSOR MESSAGE
Fine-tuning your AI models? Use data you can trust.
Training AI? Let’s talk. If you're actively building or fine-tuning AI models, Shutterstock offers enterprise-grade training data across images, video, 3D, audio, and more—fully rights-cleared and enriched with 20+ years of human-reviewed metadata.
Our 600M+ asset library powers faster iteration, simplified procurement, and improved model accuracy.
We’re offering a $100 Amazon gift card to qualified AI decision-makers who join a 30-minute discovery call with our team. We’ll walk you through how our multimodal datasets, flexible licensing, and premium metadata can help de-risk development and accelerate time to deployment.
For complete terms and conditions, see the offer page.

Disguise AI Assistant Now Automates Complex Tasks

Disguise upgraded its AI assistant, Ask AId3n, to execute tasks instead of just answering questions. The new Gemini-powered tool can now sequence timelines, configure LED screens, and build custom automation tools from simple text commands.
You can tell Ask AId3n to set up screens for specific venues and it pulls the right resolution and dimensions from its knowledge base automatically.
The AI creates custom tools when you give it complex instructions, then stores them in your Disguise Cloud account for future projects.
It handles repetitive tasks like creating video layers for awards ceremonies, setting up keyframes for every musical beat, or configuring networked machines.
When Ask AId3n makes mistakes, you can either edit the Python code it generated or just tell it to "fix it" and let the AI correct itself.
The assistant works as a Designer Pro plugin and knows the complete Disguise User Guide for instant troubleshooting.
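For a sense of what those generated tools can look like, here's a minimal, hypothetical Python sketch of the beat-keyframe task described above. This is our illustration, not Disguise's actual output, and the Disguise-side API calls are omitted since they aren't public; only the beat math is shown.

```python
# Hypothetical sketch of the kind of small tool Ask AId3n might generate:
# given a track's tempo, compute a keyframe for every musical beat.
# (Disguise-side API calls omitted; only the beat math is shown.)

from dataclasses import dataclass

@dataclass
class Keyframe:
    time_s: float  # timeline position in seconds
    value: float   # e.g. layer opacity or scale at this beat

def beat_keyframes(bpm: float, duration_s: float,
                   pulse: float = 1.0, rest: float = 0.5) -> list[Keyframe]:
    """One keyframe per beat, alternating a pulse value and a rest value."""
    beat_interval = 60.0 / bpm  # seconds between beats
    n_beats = int(duration_s / beat_interval) + 1
    return [
        Keyframe(time_s=round(i * beat_interval, 4),
                 value=pulse if i % 2 == 0 else rest)
        for i in range(n_beats)
    ]

if __name__ == "__main__":
    # 120 BPM over a 10-second cue -> a keyframe every 0.5 s
    for kf in beat_keyframes(bpm=120, duration_s=10):
        print(f"{kf.time_s:5.2f}s -> {kf.value}")
```

The "fix it" loop Disguise describes operates on exactly this kind of small, readable script, which is why hand-editing the generated code remains a viable fallback.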
NVIDIA Cosmos World Models

NVIDIA unveiled new Cosmos world models designed for physical AI applications, including Cosmos Reason, a 7-billion-parameter vision language model that understands physics and spatial relationships. The platform can generate up to 30 seconds of continuous, physics-aware video from multimodal inputs.
Cosmos Reason acts as a reasoning model for robots and AI agents, allowing them to understand physical environments and plan next steps through memory and physics understanding.
Cosmos Transfer-2 accelerates synthetic data generation from 3D simulation scenes, converting structured inputs like depth maps and lidar scans into photorealistic video sequences.
The models integrate with Omniverse and the open-source simulator CARLA, creating data pipelines that bridge simulation environments and real-world video for training perception systems.
NVIDIA trained these models on approximately 20 million hours of video across human interactions, environments, and robotics domains, providing broad physics priors for generalization.
Early adopters include robotics companies like Agility Robotics and Figure AI, plus autonomous vehicle developers, signaling industry movement toward world models for synthetic data generation.
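NVIDIA didn't publish code with this announcement, so here's a rough, hypothetical Python mock of the pipeline shape a Transfer-2-style step implies: structured simulator outputs go in, photorealistic frames come out. The generate_video function and its signature are invented stand-ins for illustration, not NVIDIA's API.

```python
# Hypothetical mock of a Transfer-2-style synthetic-data step:
# structured simulator outputs in, photorealistic frames out.
# generate_video is an invented stand-in, NOT NVIDIA's actual API.

import numpy as np

def generate_video(conditioning: dict[str, np.ndarray],
                   prompt: str, num_frames: int) -> np.ndarray:
    """Stand-in for a physics-aware video model: returns random frames
    with the expected (frames, height, width, 3) output shape."""
    h, w = conditioning["depth"].shape
    return np.random.rand(num_frames, h, w, 3).astype(np.float32)

# Structured inputs a 3D simulator (Omniverse, CARLA) can export per frame
depth = np.random.rand(270, 480).astype(np.float32)  # metric depth map
lidar = np.random.rand(270, 480).astype(np.float32)  # rasterized lidar returns

frames = generate_video(
    conditioning={"depth": depth, "lidar": lidar},
    prompt="rainy urban intersection at dusk, wet asphalt reflections",
    num_frames=24,
)
print(frames.shape)  # (24, 270, 480, 3): frames for training perception models
```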

MetaPuppet's Multi-Model Production Workflow

MetaPuppet's latest AI short Eve & Adam demonstrates a systematic multi-model workflow that combines structured prompting, multiple AI video generators, and traditional editing to create cinematic narrative content. The creator shared a detailed step-by-step process.
Step 1. Structured prompting starts the workflow by describing each shot to Gemini, which generates JSON-formatted prompts with organized categories for lighting, composition, and subject details, making them easier to revise than freeform natural-language text (see the sketch after this list).
Step 2. The iterative generation process involves running initial Veo 3 generations as placeholders, then going back shot by shot to regenerate and refine each clip hundreds of times until the output matches the vision.
Step 3. Multi-model coordination uses Runway Aleph to fix character inconsistencies and change lighting conditions while Veo 3 handles the primary cinematic shots, with both integrated into a Premiere timeline for pacing and flow.
Step 4. Audio workflow integration splits Veo 3's built-in 8-second audio tracks using LALAL, then uploads clips to Suno as inspiration seeds for longer, consistent soundtracks that bridge between shots.
Step 5. Creative asset recycling extends the pipeline by uploading original music to Suno and generating stylistic variations that serve as character themes throughout the narrative.
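To make Step 1 concrete, here's a hypothetical example of a JSON-structured shot prompt; the schema is ours, not MetaPuppet's actual format. The point is that each category, like lighting, lives in its own field, so you can revise one without rewriting the whole prompt.

```python
# Illustrative only: not MetaPuppet's actual schema. A JSON shot prompt
# with separate fields for subject, composition, and lighting, so one
# category can be revised without rewriting the whole prompt.

import json

shot_03 = {
    "shot_id": "03",
    "subject": {
        "character": "Eve",
        "action": "turns from the window, eyes narrowing",
        "wardrobe": "charcoal trench coat",
    },
    "composition": {
        "framing": "medium close-up",
        "lens": "50mm",
        "camera_move": "slow push-in",
    },
    "lighting": {
        "key": "cool window light from camera left",
        "mood": "low-key, high contrast",
    },
    "style": "cinematic, shallow depth of field, film grain",
}

# Revise a single category and regenerate; the rest of the shot stays intact
shot_03["lighting"]["mood"] = "warm practicals, soft contrast"
print(json.dumps(shot_03, indent=2))
```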

Cinematographer Matt Ryan shows how to film a jungle car scene with virtual production and image-based lighting at Aputure's APEX event.

Stories, projects, and links that caught our attention from around the web:
📣 Runway’s CEO hinted at an adaptive story system coming by the end of the year.
⚙️ Custom workstation builder Puget Systems returns to SIGGRAPH with live demos showcasing AI, VFX, and 3D workflows optimized for production professionals.
🎥 ARRI is potentially exploring a sale.
🎬 Runway’s 48-hour Gen:48 contest returns August 23–25 as the Aleph Edition.
🎨 ElevenLabs launched the Chroma Awards, positioning it as the “Olympics” of AI film, music, and games to spotlight creative work made with AI tools.
📝 Anthropic’s Claude update adds conversation memory, allowing the model to recall earlier discussions for smoother follow-ups and context retention.
🧰 Replicate introduced a remote MCP server, allowing access to all of its API models through a simple chat interface in Claude.

Roundup of all the AI and filmmaking updates from last week.
Read the show notes or watch the full episode.
Watch/Listen & Subscribe

👔 Open Job Posts
🆕 USA Marketing Manager · AV Stumpfl / Pixera · Alpharetta, GA
Architect (Rhino/Grasshopper/Revit/Blender) · Runway · Remote
VFX Artist (Runway/Flame/Nuke) · Runway · UK
Virtual Production Intern · Orbital Studios · Los Angeles, CA

📆 Upcoming Events
August 10 to 14 · SIGGRAPH 2025 · Vancouver, Canada
🆕 August 23 to 25 · Runway’s Gen:48 Aleph Edition · Remote
September 23 to 24 · CFX 2025 · Chattanooga, TN
October 3 to 4 · Cine Gear Atlanta Expo 2025 · Atlanta, GA
View the full event calendar and submit your own events here.


Thanks for reading VP Land!
Have a link to share or a story idea? Send it here.
Interested in reaching media industry professionals? Advertise with us.