In this episode of Denoised, hosts Addy and Joey break down how Wan 2.5 and Wan 2.2 Animate pair with Comfy Cloud and local ComfyUI to turn still frames into moving characters, and what that means for filmmakers, VFX artists, and content creators.
Wan 2.5 on Comfy Cloud: audio-driven animation
Wan 2.5 is available as an API node on Comfy Cloud. The workflow is straightforward: supply a reference image, optionally attach an audio file, set duration and resolution, and run. Because the model runs in the cloud, it is accessible to producers who do not have Nvidia GPUs locally.
Key characteristics:
Audio-first — motion is driven by the audio file rather than a mocap stream.
Minimal node tree — the Comfy Cloud template for 2.5 is compact and easy to configure.
Resolution and duration limits — common options include 480p, 720p, and 1080p with short durations (the demo used 5 seconds).
The practical result is surprisingly natural performances for short clips: lip sync, head movement, and hand gestures often feel coherent with the audio. Wan 2.5 leans toward realistic deliveries rather than the exaggerated UGC-style motion some other audio-driven services produce.
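For orientation, here is a minimal sketch of queueing such a run programmatically, assuming a reachable ComfyUI instance and a Wan 2.5 template exported in API format. The endpoint is ComfyUI's standard /prompt route; the filename, node IDs, and input field names below are placeholders to replace with values from your own export (Comfy Cloud's hosted setup may differ).

```python
import json
import urllib.request

# Minimal sketch: queue a Wan 2.5 run through ComfyUI's /prompt endpoint.
# "wan25_audio_api.json" and the node IDs/field names below are
# placeholders; export your actual template in API format and use its ids.
COMFY_URL = "http://127.0.0.1:8188/prompt"

with open("wan25_audio_api.json") as f:
    workflow = json.load(f)

workflow["3"]["inputs"]["image"] = "reference.png"    # reference still
workflow["5"]["inputs"]["audio"] = "dialogue.wav"     # optional audio
workflow["7"]["inputs"]["duration"] = 5               # seconds
workflow["7"]["inputs"]["resolution"] = "720p"        # 480p/720p/1080p

req = urllib.request.Request(
    COMFY_URL,
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id to poll for results
```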

This screenshot shows Space Cat used as a reference image for a Wan 2.5 run.
What Wan 2.5 cannot do (and when to avoid it)
Because Wan 2.5 is audio driven, it cannot accept a separate motion capture file as a one-to-one performance input. Prompting can nudge movement (e.g., "exaggerated gestures"), but controlling frame-by-frame physical performance via freeform prompts is unreliable. For granular, timed gestures, prompt scripting or a JSON timeline might be attempted, but that quickly becomes cumbersome compared to delivering a driving video.
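To see why, here is a purely hypothetical timeline flattened into a prompt. Wan 2.5 documents no structured timeline input, so everything collapses into freeform text the model may or may not honor:

```python
# Hypothetical gesture timeline flattened into a text prompt. Nothing
# here is a documented Wan 2.5 input; it illustrates why prompt-scripted
# timing stays unreliable next to a real driving video.
timeline = [
    {"t": 0.0, "action": "look at camera, neutral expression"},
    {"t": 1.5, "action": "raise right hand, open palm"},
    {"t": 3.0, "action": "lean forward with an emphatic nod"},
]
prompt = " ".join(f"at {e['t']}s: {e['action']};" for e in timeline)
print(prompt)
```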
Wan 2.2 Animate: pose transfer in ComfyUI (cloud and local)
Wan 2.2 Animate is a more complex, local-friendly solution for pose transfer and full-frame character animation. It excels at mapping a driving video to a new character while offering fine-grained control over which body parts to track.

Pose estimator output that feeds body, hand, and face poses into the animate node.
Core features to know:
Two modes — mix mode (retain background and replace character) and move mode (replace the entire frame). For full-frame character replacement, use move mode.
Pose estimator — extracts body, hand, and facial keypoints to drive the generation node, effectively performing mocap-like tracking without suits.
Local or cloud — some Wan models are cloud-only, but the Wan 2.2 template can be run locally if the machine meets the VRAM and compute requirements.
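The Animate template ships its own pose estimator node, so none of this needs to be hand-written. Purely as a conceptual stand-in, the sketch below uses MediaPipe to pull per-frame body keypoints from a driving clip, which is the kind of data the node feeds downstream (the filename is a placeholder, and hands and face are omitted):

```python
import cv2
import mediapipe as mp

# Conceptual stand-in for the pose estimator node: extract per-frame body
# keypoints from a driving clip ("performance.mp4" is a placeholder).
cap = cv2.VideoCapture("performance.mp4")
keypoints_per_frame = []
with mp.solutions.pose.Pose(static_image_mode=False) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks:
            # Normalized (x, y) per landmark; the real node also tracks
            # hand and face keypoints, which this sketch omits.
            keypoints_per_frame.append(
                [(lm.x, lm.y) for lm in result.pose_landmarks.landmark]
            )
        else:
            keypoints_per_frame.append(None)
cap.release()
print(f"tracked {len(keypoints_per_frame)} frames")
```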
Practical workflow choices
Two real-world approaches emerged from the testing:
Use the Wan 2.5 template on Comfy Cloud for quick, audio-driven, short-form animations when you have an image and dialogue but no performance capture.
Use Wan 2.2 Animate in ComfyUI for pose-driven replacements—ideal when you have a performer video and want to map that performance onto a stylized character or modified live-action reference image.
Preparing assets: reference images and performance clips
Good inputs make all the difference.
Reference image — clean, well-lit stills work best. When doing costume or background replacement, pre-edit the reference with tools like Nano Banana or Freepik's image generation to remove distracting props and insert the desired costume and set elements.
Performance video — shoot at the intended final frame rate and perspective. Wan 2.2 works better when the driving video and reference image share a similar camera angle. Vertical video works, but consistent framing helps.
Frame rate and resolution — match the source clip frame rate and resolution. Wan 2.2 workflows can default to 16 fps, which produces choppier motion; it is better to feed the original 30 fps or whatever the source uses.
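A quick probe of the source clip makes matching easy; this assumes OpenCV is installed and uses a placeholder filename:

```python
import cv2

# Probe the driving clip so workflow settings can match the source
# ("performance.mp4" is a placeholder path).
cap = cv2.VideoCapture("performance.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()
print(f"{width}x{height} @ {fps:.2f} fps, {frames} frames")
```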

Small physics details, like a dog tag swinging, can appear automatically when the pose estimator maps motion to a generated image.
Performance, hardware, and runtime expectations
Running Wan 2.2 Animate locally requires a capable GPU. In testing, an RTX 3090 handled a short 2-3 second clip in roughly 5-7 minutes at 1280x720. Cloud runs for Wan 2.5 completed within a few minutes for short segments, with costs tied to API executions.
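If you are unsure whether a machine clears the bar, a quick check with PyTorch (already present in any ComfyUI install) reports the GPU and its VRAM:

```python
import torch

# Report the GPU and how much VRAM it exposes before attempting a
# local Wan 2.2 Animate run.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
else:
    print("No CUDA GPU detected; consider a cloud workflow instead.")
```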
Other operational notes:
Precision compatibility — pick matching model and clip encoder precisions to avoid runtime mismatches.
Maximum frame window — some workflows are limited to ~81 frames per inference. For longer clips, chain inference nodes to concatenate segments while passing the last frame forward.
GPU load — expect VRAM-heavy operations; node-based UIs show active nodes as they run, which is useful for performance debugging.
The workflow exposes a global resolution parameter that propagates through the node tree.

Video extend nodes allow chaining multiple inference runs for clips longer than the model’s frame limit.
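The chaining logic itself is easy to reason about. As an illustrative sketch (the 81-frame ceiling comes from the note above; the helper name is made up), each window reuses the previous window's last frame as its anchor:

```python
# Illustrative helper: split a long clip into windows the model can
# handle, overlapping each pair by one frame so the carried-forward
# last frame anchors the next inference run.
MAX_FRAMES = 81  # per-inference limit noted above

def plan_windows(total_frames: int, max_frames: int = MAX_FRAMES):
    windows, start = [], 0
    while True:
        end = min(start + max_frames, total_frames)
        windows.append((start, end))
        if end == total_frames:
            return windows
        start = end - 1  # overlap by one frame

print(plan_windows(200))  # [(0, 81), (80, 161), (160, 200)]
```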
Common pitfalls and how to address them
Proportion mismatches — when mapping human motion to stylized or nonhuman characters, proportions may stretch incorrectly. Solutions include using a reference performer with closer proportions or iterating the reference image to better match the driving performer.
Face fidelity — face details can get distorted; refine the reference image and experiment with face-tracking options in the pose estimator.
Switching modes — some ComfyUI templates require deleting node connections to switch from mask (mix) to full-frame (move) mode; a simpler template built by community members can make this switch less error-prone.
Use cases and where these tools fit into production
These workflows are not yet a drop-in replacement for feature film VFX, but they are highly useful for:
Short films and proof-of-concept sequences where rapid iteration is valuable.
YouTube channels and short-form content that benefit from stylized characters with low production overhead.
Virtual production experiments and previs where quick character animation drives blocking, timing, and compositing tests.
Final takeaways
Wan 2.5 and Wan 2.2 Animate each solve different parts of the character animation puzzle. Wan 2.5 gives producers a fast route to audio-driven animation in the cloud. Wan 2.2 Animate brings pose transfer and local flexibility for mapping real performance onto new characters and frames. Together, they expand what small teams and individual creators can prototype without large mocap rigs or expensive render farms.
For filmmakers and VFX artists, the practical approach is clear: choose the model based on whether audio or motion should be the primary driver, prepare clean reference assets, and match frame rate and resolution to maintain VFX pipeline compatibility.