Stability AI is growing its generative AI model portfolio today with the release of Stable Video 3D (SV3D).
As the name implies, the new model is a gen AI video tool for rendering 3D video. Stability AI has been developing video capabilities with its Stable Video technology, which enables users to generate short videos from an image or text prompt. SV3D builds upon Stability AI’s previous Stable Video Diffusion model, adapting it for the task of novel view synthesis and 3D generation.
With SV3D, Stability AI is adding new depth to its video generation model with the ability to create and transform multi-view 3D meshes from a single input image.
SV3D is now available for commercial use with a Stability AI Professional Membership ($20 per month for creators and developers with less than $1 million in annual revenue). For non-commercial purposes, users can download the model weights from Hugging Face.
Here’s an example video I generated quickly. As you’ll see, despite some slight distortions, the forms of all the objects in the video remain markedly coherent and solid even as the camera rotates around them.
Game creation, e-commerce cited as target use cases :
“By adapting our Stable Video Diffusion image-to-video diffusion model with the addition of camera path conditioning, Stable Video 3D is able to generate multi-view videos of an object,” the company wrote in a blog post detailing the new model.
“Stable Video 3D is a valuable tool for generating 3D assets, especially within the gaming sector,” Varun Jampani, lead researcher at Stability AI, told VentureBeat. “Additionally, it enables the production of 360-degree orbital videos, which are useful in e-commerce, providing a more immersive and interactive shopping experience.”
From Stable Zero123 to SV3D :
Stability AI is perhaps best known for its Stable Diffusion text-to-image gen AI models, which include SDXL and Stable Diffusion 3.0, the latter still in early research preview. Stable Diffusion 1.5 is an open-source image generation model that forms the basis of many other AI image generation and video products, including Runway and Leonardo AI.
Back in December 2023, the Stable Zero123 model was released, offering new capabilities for building 3D images. At the time, Emad Mostaque, founder and CEO of Stability AI, told VentureBeat that Stable Zero123 would be the first of a series of 3D models.
The SV3D technology is taking a different approach to 3D generation than Stable Zero123.
“Stable Video 3D can be seen as a successor and as an improvement to our previous offering Stable Zero123,” Jampani said. “Stable Video 3D is a novel view synthesis network that takes a single image as input, and outputs novel view images.”
Jampani explained that Stable Zero123 is based on Stable Diffusion and outputs one image at a time. Stable Video 3D is based on Stable Video Diffusion models and outputs multiple novel views simultaneously. Stable Video 3D provides much better quality novel views, and thus can help in generating better 3D meshes from a single image.
Coherent views from any given angle :
In a research paper, Stability AI researchers detail some of the techniques used to enable 3D from a single image using latent video diffusion.
“Recent work on 3D generation proposes techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization,” the report stated. “However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation.”
One of the key strengths of SV3D lies in its ability to generate consistent novel multi-view images of an object. According to Stability AI, SV3D delivers coherent views from any given angle.
The research paper on SV3D highlights this advancement, noting that “…unlike previous approaches that often grapple with limited perspectives and inconsistencies in outputs, Stable Video 3D is able to deliver coherent views from any given angle with proficient generalization.”
In addition to its novel view synthesis capabilities, SV3D also takes aim at optimizing 3D meshes. By leveraging its multi-view consistency, SV3D can generate high-quality 3D meshes directly from the novel views it produces.
“Stable Video 3D leverages its multi-view consistency to optimize 3D Neural Radiance Fields (NeRF) and mesh representations to improve the quality of 3D meshes generated directly from novel views,” Stability AI wrote in its announcement post.
Two Powerful Variants: SV3D_u and SV3D_p
SV3D comes in two variants, each designed for specific use cases.
SV3D_u generates orbital videos based on single image inputs without the need for camera conditioning. Camera conditioning in generative AI refers to a technique where an additional input, often in the form of an image or a set of parameters related to camera perspectives or positions, is used to guide the generation process of new images or content.
On the other hand, SV3D_p extends this capability by accommodating both single images and orbital views, allowing users to create 3D video along specified camera paths.
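To make the idea of an orbital camera path more concrete, here is a minimal Python sketch that computes evenly spaced camera positions on a circle around an object. It is not Stability AI's code: the function name `orbital_camera_path`, its parameters, and the choice to describe each camera by an azimuth angle at a fixed elevation and radius are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def orbital_camera_path(
    num_frames: int,
    elevation_deg: float = 10.0,
    radius: float = 2.0,
) -> List[Tuple[float, float, float]]:
    """Return (x, y, z) camera positions evenly spaced on one orbit.

    Hypothetical helper for illustration only. The object is assumed to
    sit at the origin, with z pointing up.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")

    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_frames):
        # Spread the azimuth angles evenly over a full 360-degree turn.
        azimuth = 2.0 * math.pi * i / num_frames
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.cos(elevation) * math.sin(azimuth)
        z = radius * math.sin(elevation)
        positions.append((x, y, z))
    return positions


# Example: eight viewpoints on a level orbit around the object.
path = orbital_camera_path(8, elevation_deg=0.0)
```

In this sketch, a model with camera conditioning would receive one such pose per output frame, while a model without it would simply assume a fixed orbit of this kind.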