OpenAI becomes the center of attention with the early announcement of Sora
OpenAI has spurred a frenzy of reactions after sharing its research advances on Sora, a state-of-the-art diffusion model for video generation that can produce up to 60 seconds of realistic or animated video. Sora is currently available to red teamers and invited creatives only.
The diffusion model can create videos up to 60 seconds long from scratch or extend existing generated videos to make them longer. Like the GPT models, Sora uses a transformer architecture: videos and images are broken down into collections of smaller data units called patches, comparable to GPT's tokens. Because patches unify disparate image and video data into a common representation, the model can be trained on sources of diverse durations, resolutions, and aspect ratios. Sora can also perform image-to-video generation, animating an image's content with remarkable attention to detail. OpenAI expects Sora to become the starting point for "models that can understand and simulate the real world," which it considers a necessary step towards artificial general intelligence.
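The patch idea can be sketched conceptually: a video tensor is carved into small spacetime blocks, each flattened into a vector that plays the role of a token. The patch sizes and the helper below are illustrative assumptions, not details OpenAI has published.

```python
import numpy as np

def video_to_patches(video, patch_t=2, patch_h=16, patch_w=16):
    """Split a video of shape (frames, height, width, channels) into
    flattened spacetime patches, analogous to tokens in a GPT model.
    Patch sizes here are illustrative, not Sora's actual values."""
    t, h, w, c = video.shape
    # Truncate so each dimension divides evenly into whole patches.
    t, h, w = t - t % patch_t, h - h % patch_h, w - w % patch_w
    video = video[:t, :h, :w]
    patches = (
        video.reshape(t // patch_t, patch_t,
                      h // patch_h, patch_h,
                      w // patch_w, patch_w, c)
             # Group the three patch-grid axes together, then the
             # three within-patch axes, so each patch is contiguous.
             .transpose(0, 2, 4, 1, 3, 5, 6)
             .reshape(-1, patch_t * patch_h * patch_w * c)
    )
    return patches  # shape: (num_patches, patch_dim)

# A 16-frame, 64x64 RGB clip becomes a sequence of 128 patch "tokens",
# each a flat vector of 2 * 16 * 16 * 3 = 1536 values.
clip = np.random.rand(16, 64, 64, 3)
tokens = video_to_patches(clip)
```

Because every clip, regardless of its resolution or length, reduces to a variable-length sequence of fixed-size vectors, the same transformer can consume heterogeneous training data.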
Sora is currently available to red teamers for adversarial testing and to invited visual artists, designers, and filmmakers for feedback. OpenAI acknowledges that the generated videos still have weaknesses, such as physically implausible motion or the spontaneous appearance of objects. It has also stated that it is still working on new safety measures, including a detection classifier, and plans to subject Sora to some of the same safety measures and standards already in place for DALL-E, such as C2PA metadata. These measures are intended to make an eventual public release of Sora as safe as possible.
Sora's early announcement sparked many reactions on social media, most of them praising OpenAI's latest achievement. For instance, some users have taken the prompts behind Sora's generated videos and tried them out on other video-generation platforms.
Yet others have remained more critical of OpenAI's announcement, remarking that Sora will probably excel at drone and cat footage because of the sheer availability of scrapeable examples, an essential point even before getting into the obvious questions about copyright and data ownership.
Finally, there are those who have commented on a pervasive feature of AI-generated synthetic media: it just looks, well, empty and lifeless. For some, AI-generated media is becoming a mockery of real life and of the meaning we attach to things as simple as shaky, low-quality footage of our pets. Many of the questions raised by the Sora announcement are not unique to this model; they apply to the current state of AI-powered media generation as a whole. But revisiting the conversation is a healthy thing to do, and it by no means diminishes the importance of a groundbreaking achievement such as Sora.