After stunning the world with its sensational AI chatbot ChatGPT, OpenAI is back with yet another creation. The Sam Altman-led AI start-up has introduced new software that can create hyper-realistic one-minute videos from text prompts. Called Sora, the software is currently in the red-teaming phase, in which the company works to identify flaws in the system. OpenAI is also reportedly working with visual artists, designers, and filmmakers to gather feedback on the model.
Sam Altman, the CEO of OpenAI, took to his X account to introduce Sora, the company's video generation model. Altman went on to share a host of videos on his profile to showcase the capabilities and visual quality of the new AI model. While the model is in red teaming, OpenAI has not shared any information regarding a wider launch.
According to OpenAI, Sora is a text-to-video model that generates one-minute-long videos while "maintaining the visual quality and adherence to the user's prompt." OpenAI claims that Sora can generate complex scenes with numerous characters, specific types of motion, and accurate details of the subject and background. According to the company, the model not only understands what the user asks for in the prompt but also how those things exist in the physical world.
Following the introduction of the model, Altman shared creations of Sora based on prompts requested by his followers. From cycling dolphins to a squirrel riding a dragon, here are some sample videos that showcase the versatility of Sora.
Sora is essentially a diffusion model that can generate entire videos all at once or extend generated videos to make them longer. The model uses a transformer architecture, which unlocks superior scaling performance, similar to GPT models. The AI model represents videos and images as collections of smaller units of data known as patches. Each of these patches is analogous to a token in GPT. OpenAI stated that Sora is built upon past research conducted for DALL-E and GPT models. It borrows the recaptioning technique from DALL-E 3, which involves generating descriptive captions for visual training data.
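The patch idea can be pictured with a small sketch. This is an illustration only, not OpenAI's actual code: a video, stored as a frames-by-height-by-width-by-channels array, is cut into spacetime "patches", each of which plays the role a token plays in GPT. The patch sizes here are arbitrary choices for the example.

```python
import numpy as np

def video_to_patches(video, pt=2, ph=4, pw=4):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch spans pt frames and a ph x pw pixel region; every patch
    becomes one row, analogous to one token in a GPT-style transformer.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    patches = (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)   # gather each patch's voxels together
             .reshape(-1, pt * ph * pw * C)    # one row per patch ("token")
    )
    return patches

video = np.random.rand(8, 16, 16, 3)           # 8 frames of 16x16 RGB
tokens = video_to_patches(video)
print(tokens.shape)                            # (64, 96): 64 patches of 96 values each
```

Treating video as a flat sequence of such patches is what lets a transformer scale to visual data the same way it scales to text.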
Apart from generating videos from prompts in natural language, the model can take an existing image and generate a video from it. According to OpenAI, it will animate the image's components accurately. It can also extend existing videos by filling in missing frames.
Capabilities and limitations
OpenAI claims that Sora has a deep understanding of language, which allows it to interpret prompts accurately and create characters that express vibrant emotions. Interestingly, Sora can also create multiple shots within a single generated video while keeping the visual style and characters consistent.
The company also highlighted that Sora has its own set of limitations. At present, the model may struggle to simulate the "physics of a complex scene" accurately. It may also fail to grasp specific instances of cause and effect. The company illustrated this with a scenario in which a person takes a bite out of a cookie, yet the cookie shows no bite mark afterwards. Similarly, Sora may confuse spatial details in a prompt, such as left and right, and may struggle with precise descriptions of events that unfold over time.
Is Sora safe?
On its official website, OpenAI states that it is taking several safety measures before making Sora available in its products. The company asserts that it is working with a team of domain experts in misinformation, hateful content, and bias, who will be adversarially testing Sora. The company is also building tools such as a detection classifier that can flag misleading content and tell whether a video was generated by Sora.
“We’ll be engaging policymakers, educators, and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time,” reads the official website.
OpenAI also said that it will include C2PA metadata if it deploys the model in an OpenAI product in the future. In simple terms, C2PA is an open technical standard that allows publishers, companies, and others to embed metadata in media to verify its origin and related information. The company has also said that it is leveraging the existing safety measures built into its products that use DALL-E 3.
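The core idea behind provenance metadata can be sketched in a few lines. This is a simplified illustration, not the real C2PA manifest format: a manifest records who generated the media and a cryptographic hash of its bytes, so any later edit to the media breaks verification.

```python
import hashlib

def make_manifest(media_bytes, generator="example-video-generator"):
    """Build a toy provenance manifest (illustrative, not C2PA-compliant)."""
    return {
        "claim_generator": generator,
        "content_hash": hashlib.sha256(media_bytes).hexdigest(),
    }

def verify(media_bytes, manifest):
    """Check that the media still matches the hash recorded in the manifest."""
    return hashlib.sha256(media_bytes).hexdigest() == manifest["content_hash"]

video = b"\x00\x01fake-video-bytes"
manifest = make_manifest(video)
print(verify(video, manifest))            # True: media untampered
print(verify(video + b"edit", manifest))  # False: media was altered
```

The real standard adds digital signatures and a chain of edit claims on top of this binding, but the hash link between manifest and media is the essential mechanism.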
In addition, the text classifier deployed by OpenAI will check and reject prompts that violate the company's usage policy, which covers requests for extreme violence, sexual content, hateful imagery, celebrity likenesses, or the intellectual property of others. The company also has robust image classifiers that review the frames of every video to ensure they align with the usage policy.
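OpenAI's actual classifiers are machine-learned models, but the gatekeeping logic they implement can be sketched with a toy rule-based filter. Everything here is hypothetical: the blocked-term list and function names are invented for illustration.

```python
# Hypothetical policy terms -- a real system would use a trained classifier,
# not simple substring matching.
BLOCKED_TERMS = {"extreme violence", "hateful imagery"}

def check_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the (toy) usage-policy check."""
    text = prompt.lower()
    return not any(term in text for term in BLOCKED_TERMS)

print(check_prompt("a dog surfing a wave"))         # True: allowed
print(check_prompt("a scene of extreme violence"))  # False: rejected
```

The same pass/reject decision happens on the output side too, where frame-level image classifiers screen the generated video before it reaches the user.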
OpenAI's Sora comes at a time when text-to-video models from the likes of Stability AI have shown the astounding capabilities of AI video generation. The Sam Altman-led company has its eyes set on Artificial General Intelligence and sees Sora as a step in that direction. From what we have seen, Sora is clearly miles ahead of existing generative AI video models. Google introduced a similar model, Imagen Video, in October 2022, but the tech giant has not released any such model to the public. Google has also worked on Phenaki, its text-to-video model, and Meta has had its own stint with the Make-A-Video tool. However, OpenAI appears to have surpassed them all.
Bijin Jose, an Assistant Editor at Indian Express Online in New Delhi, is a technology journalist with a portfolio spanning various prestigious publications. Starting as a citizen journalist with The Times of India in 2013, he transitioned through roles at India Today Digital and The Economic Times before finding his niche at The Indian Express. With a BA in English from Maharaja Sayajirao University, Vadodara, and an MA in English Literature, Bijin's expertise extends from crime reporting to cultural features. With a keen interest in closely covering developments in artificial intelligence, Bijin provides nuanced perspectives on its implications for society and beyond.