Meta AI, in partnership with Tel Aviv University, has introduced VideoJAM, a new framework designed to enhance motion generation in video models.
This approach aims to address a common challenge in video generation: the tendency of models to prioritize visual appearance over realistic motion and dynamics.
VideoJAM is our new framework for improved motion generation from @AIatMeta
We show that video generators struggle with motion because the training objective favors appearance over dynamics.
VideoJAM directly addresses this **without any extra data or scaling**
– Hila Chefer (@hila_chefer)
2:57 PM · Feb 4, 2025
VideoJAM encourages video generators to learn a joint appearance-motion representation, improving motion coherence without requiring additional data or model scaling.
The framework introduces two key components: an extended training objective and an "Inner-Guidance" mechanism for inference.
During training, the model is tasked with predicting both generated pixels and their corresponding motion from a single learned representation.
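A minimal sketch of what such a joint objective could look like, assuming a PyTorch-style setup; the function name, the model's two-headed interface, and the weighting `lambda_motion` are illustrative assumptions rather than the paper's exact formulation:

```python
import torch.nn.functional as F

def joint_appearance_motion_loss(model, noisy_video, noise_level,
                                 pixel_target, flow_target,
                                 lambda_motion=1.0):
    """Illustrative joint objective: a single forward pass yields both an
    appearance (pixel) prediction and a motion prediction (e.g. optical
    flow) from the same learned representation."""
    # Hypothetical model interface: two heads over one shared representation.
    pixel_pred, motion_pred = model(noisy_video, noise_level)

    appearance_loss = F.mse_loss(pixel_pred, pixel_target)
    motion_loss = F.mse_loss(motion_pred, flow_target)

    # Optimizing both terms jointly keeps the representation from favoring
    # per-frame appearance at the expense of dynamics.
    return appearance_loss + lambda_motion * motion_loss
```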
The Inner-Guidance feature steers generation towards coherent motion by using the model's evolving motion prediction as a dynamic guidance signal during inference.
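In spirit this resembles classifier-free-guidance-style sampling, with the model's own motion estimate acting as an additional signal at each denoising step. The sketch below is a hedged illustration; the conditioning interface, scale values, and exact combination rule are assumptions:

```python
def inner_guidance_step(model, noisy_video, noise_level, text_cond,
                        text_scale=7.5, motion_scale=2.0):
    """Illustrative denoising step: the model's own motion prediction at the
    current step serves as an extra, dynamically updated guidance signal."""
    # Unconditional and text-conditioned predictions, as in classifier-free guidance.
    uncond_pred, _ = model(noisy_video, noise_level, text=None, motion=None)
    text_pred, motion_pred = model(noisy_video, noise_level, text=text_cond, motion=None)

    # Re-run with the model's current motion estimate as an explicit condition.
    motion_guided_pred, _ = model(noisy_video, noise_level, text=text_cond, motion=motion_pred)

    # Blend the signals: pull toward the prompt and toward coherent motion.
    return (uncond_pred
            + text_scale * (text_pred - uncond_pred)
            + motion_scale * (motion_guided_pred - text_pred))
```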
VideoJAM can be applied to existing video models with minimal adaptations, making it a versatile solution for improving motion generation.
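One plausible reading of "minimal adaptations" is that only small input and output projections are added so a pretrained backbone can consume and emit both appearance and motion channels; the class, layer names, and dimensions below are assumptions for illustration, not the published architecture:

```python
import torch
import torch.nn as nn

class JointAppearanceMotionWrapper(nn.Module):
    """Illustrative adaptation of a pretrained video backbone: only two small
    linear projections are added so the backbone can take appearance and
    motion latents in and emit predictions for both."""
    def __init__(self, backbone, latent_dim=16, hidden_dim=3072):
        super().__init__()
        self.backbone = backbone  # pretrained video generator, reused as-is
        self.proj_in = nn.Linear(2 * latent_dim, hidden_dim)   # appearance + motion in
        self.proj_out = nn.Linear(hidden_dim, 2 * latent_dim)  # appearance + motion out

    def forward(self, appearance_latent, motion_latent, noise_level):
        x = torch.cat([appearance_latent, motion_latent], dim=-1)
        h = self.backbone(self.proj_in(x), noise_level)  # hypothetical backbone call
        pixel_pred, motion_pred = self.proj_out(h).chunk(2, dim=-1)
        return pixel_pred, motion_pred
```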
The framework has demonstrated state-of-the-art performance in motion coherence, surpassing competitive proprietary models while also enhancing the perceived visual quality of generated videos.
VideoJAM's approach to balancing appearance and motion in video generation represents a significant step forward in creating more realistic and coherent video content.
This development could have implications for film production, visual effects, and content creation, giving filmmakers and producers more capable tools for generating high-quality video assets.