

OpenAI becomes the center of attention with the Sora early announcement

OpenAI has spurred a frenzy of reactions after sharing its research advances on Sora, a state-of-the-art diffusion model that can generate up to 60 seconds of realistic or animated video. Sora is currently available to red teamers and invited creatives only.

by Ellie Ramirez-Camara
Screenshot from a Sora-generated video depicting a wintery Tokyo scene. | Credit: OpenAI

The diffusion model can create videos up to 60 seconds long from scratch or extend generated videos to make them longer. Like the GPT models, Sora has a transformer architecture; it processes videos and images as collections of data units called patches, comparable to GPT's tokens. Because patches provide a unified representation of otherwise heterogeneous image and video data, the model can be trained on sources with diverse durations, resolutions, and aspect ratios. Additionally, Sora can perform image-to-video generation, animating an image's content accurately and with remarkable attention to detail. OpenAI expects Sora to become the starting point for "models that can understand and simulate the real world," which it considers a necessary step towards artificial general intelligence.
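To make the patch idea more concrete, here is a minimal, illustrative sketch of how a video clip could be cut into fixed-size spacetime patches and flattened into a token-like sequence. This is not OpenAI's implementation (those details are unpublished); the PyTorch layout, the patch size of (2, 16, 16), and the patchify_video helper are assumptions made purely for the example.

```python
import torch

def patchify_video(video, patch_size=(2, 16, 16)):
    """Split a video tensor into flattened spacetime patches.

    video: (T, C, H, W) tensor; T, H, and W must be divisible by the
    corresponding patch dimensions (an assumption made for simplicity).
    Returns a (num_patches, patch_dim) tensor, one row per patch.
    """
    t, c, h, w = video.shape
    pt, ph, pw = patch_size
    # Carve the clip into a grid of (pt x ph x pw) blocks, then flatten each block.
    patches = (
        video
        .reshape(t // pt, pt, c, h // ph, ph, w // pw, pw)
        .permute(0, 3, 5, 2, 1, 4, 6)   # (T/pt, H/ph, W/pw, C, pt, ph, pw)
        .reshape(-1, c * pt * ph * pw)  # (num_patches, patch_dim)
    )
    return patches

# Example: a 16-frame, 3-channel, 256x256 clip becomes 2048 patch "tokens"
clip = torch.randn(16, 3, 256, 256)
tokens = patchify_video(clip)
print(tokens.shape)  # torch.Size([2048, 1536])
```

In a Sora-style model, each flattened patch would then be linearly embedded and processed by the transformer much as token embeddings are in GPT, which is what allows a single model to consume clips of varying duration, resolution, and aspect ratio.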

Sora is currently available to red teamers for adversarial testing and to invited visual artists, designers, and filmmakers for feedback. OpenAI acknowledges that the generated videos still have weaknesses, such as physically implausible motion or the spontaneous appearance of entities. Moreover, it has stated that it is still working on new safety measures, including a detection classifier. OpenAI also plans to subject Sora to some of the same safety measures and standards already in place for DALL-E, such as C2PA metadata. These measures are intended to let OpenAI release Sora to the general public as safely as possible.

Sora's early announcement sparked many reactions on social media, most of them rightly praising OpenAI's latest achievement. For instance, some users have taken the prompts behind Sora's generated videos and tried them out on other video-generation platforms.

Yet others have been more critical of OpenAI's announcement, remarking that Sora will probably excel at drone and cat footage simply because of the sheer volume of scrapable examples, an essential point to make even before getting into the obvious questions about copyright and data ownership.

Finally, there are those who have commented on a pervasive feature of AI-generated synthetic media: it just looks, well, empty and lifeless. For some, AI-generated media is becoming a mockery of real life and of the meaning we attach to things as simple as shaky, low-quality footage of our pets. Many of the questions raised by the Sora announcement are not unique to this model; they apply to the current state of AI-powered media generation as a whole. But revisiting the conversation over and over is healthy, and it by no means diminishes the importance of a groundbreaking achievement such as Sora.



