Stability AI has announced the release of Stable Video Diffusion, its first foundational model for generative video. For now, Stable Video Diffusion is available only as a research preview and is not yet intended for real-world or commercial applications. Stability will use feedback on the preview to refine the model and ensure its safety and quality before a full release. The code for Stable Video Diffusion is available in Stability's GitHub repository, and the weights can be found on its Hugging Face page. The model's technical capabilities are detailed in the accompanying research paper.
According to the announcement, the video model can be adapted to several tasks, including multi-view synthesis from a single image once the model is fine-tuned on multi-view datasets. Stability plans to launch a series of models that build on and extend the foundational model's capabilities, with the aim of fostering an ecosystem similar to the one that developed around the foundational image generation models in the Stable Diffusion family. Stable Video Diffusion is released in the form of two image-to-video models, which generate 14 and 25 frames respectively at customizable frame rates between 3 and 30 frames per second. Moreover, Stability reports that, in external user preference studies, Stable Video Diffusion already surpasses the leading closed models.
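To put those figures in perspective, the short Python sketch below works out how long a generated clip would last for the frame counts and frame rates quoted in the announcement. It assumes each generated frame is shown exactly once at the chosen rate; the function and constant names are illustrative and are not part of Stability's code.

```python
# Illustrative arithmetic only: names here are hypothetical, not Stability's API.
# Assumption: every generated frame is displayed once at the chosen frame rate.

FRAME_COUNTS = (14, 25)   # frames produced by the two released models
MIN_FPS, MAX_FPS = 3, 30  # frame-rate range quoted in the announcement


def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length in seconds of num_frames shown at fps."""
    if not MIN_FPS <= fps <= MAX_FPS:
        raise ValueError(f"fps must be between {MIN_FPS} and {MAX_FPS}")
    return num_frames / fps


def duration_range(num_frames: int) -> tuple[float, float]:
    """Return the (shortest, longest) clip length for a given frame count."""
    return (
        clip_duration_seconds(num_frames, MAX_FPS),
        clip_duration_seconds(num_frames, MIN_FPS),
    )


if __name__ == "__main__":
    for frames in FRAME_COUNTS:
        shortest, longest = duration_range(frames)
        print(f"{frames} frames: {shortest:.2f} s to {longest:.2f} s")
```

Under that assumption, the clips are short: the 25-frame model tops out at a little over eight seconds at the slowest frame rate and drops below one second at the fastest.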
The company also announced that it is opening a waitlist for an upcoming web-based text-to-video experience that will showcase the video models' capabilities. Those interested can sign up for the waitlist.