Stable Video 4D showcases Stability AI's research into multi-angle video generation

Stable Video 4D is built on Stable Video Diffusion, Stability AI's flagship image-to-video generation model, and generates eight novel-view videos of a subject from a single uploaded video. Together, these videos provide a comprehensive, multi-angle view of the original video's subject. After uploading an input video, users specify their preferred 3D camera poses, and the model generates eight videos showing the subject from the corresponding perspectives. These videos can then be used to optimize a dynamic 3D representation of the subject.

According to Stability AI, the model can generate 5 frames for each of the 8 novel views in approximately 40 seconds, while the full 4D optimization takes around 20 to 25 minutes. Potential applications include game development, video editing, and virtual reality. Unlike previous approaches, which often require sampling from a combination of image, video, and multi-view diffusion models, Stable Video 4D works from a single input video and generates all eight novel-view videos simultaneously, which Stability AI says improves consistency across both space and time. This simplified process also allows for a lighter 4D optimization framework.

Stable Video 4D, the company's first video-to-video generation model, is available on Hugging Face. Because the model is still in its research phase, Stability AI has released it alongside a technical report and plans to keep refining it, in particular to extend the range of videos it can handle.