OpenAI unveils o1, a new AI model trained for ‘reasoning’: Here’s what it can do better

OpenAI’s newly launched o1 model can do a better job of writing code and solving maths problems, but what makes it special?

The o1 and o1-mini models can be accessed by ChatGPT Plus and Team users starting today. (Image Source: OpenAI)

OpenAI has released a new AI model called o1 that is capable of solving complex problems faster than a human can, owing to reasoning capabilities that are better than those of other AI models previously developed by the Microsoft-backed firm.

It has also released a smaller, cheaper version of the o1 called o1-mini.

“We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognise their mistakes,” OpenAI said in its blog post published on Thursday, September 12.

Is o1 part of the much-hyped ‘Project Strawberry’?

Yes, it is. In November 2023, Reuters reported that OpenAI was working on a project called Q* that specifically sought to improve the reasoning capabilities of its AI models. Internally, this reasoning project began to be referred to as ‘Strawberry’ some time around July this year. Soon, rumours swirled that the secretive AI model in the works would be dramatically smarter and capable of “deep research”.

But when the time came, OpenAI decided to name this widely anticipated AI model o1, reportedly to indicate “resetting the counter back to 1.”

OpenAI researcher Noam Brown confirmed that o1 is a part of Project Strawberry in a post on X. “I’m excited to share with you all the fruit of our effort at OpenAI to create AI models capable of truly general reasoning,” Brown wrote.

What can o1 do and how good is it?

According to OpenAI, the o1 model showcases a brand-new class of capabilities. Thanks to improved reasoning, it can solve more challenging science and math problems. It is also better at coding.

The o1 model scored 83 per cent on a qualifying exam for the International Mathematical Olympiad, OpenAI said in its blog post. This is a vast improvement over the 13 per cent scored by its predecessor, GPT-4o.

With an 83 per cent score, o1 places among the top 500 students in the US in the math olympiad qualifier known as AIME.

To evaluate its coding abilities, OpenAI entered the o1 model in simulated contests modelled on Codeforces, an online competitive programming platform, where it ranked in the 89th percentile.


OpenAI further claimed that the next update of this model will perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry and biology.”

Is o1 smarter than someone with a PhD?

To compare the model against human experts, OpenAI said that it recruited experts with PhDs and had them answer questions from GPQA diamond, a difficult benchmark that tests expertise in physics, chemistry, and biology.

“We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark,” OpenAI said.

“These results do not imply that o1 is more capable than a PhD in all respects — only that the model is more proficient in solving some problems that a PhD would be expected to solve,” it added.

What makes the o1 model special?

The o1 models were able to achieve these scores because they were trained using two techniques: reinforcement learning and chain-of-thought reasoning.

OpenAI’s previous GPT models were simply taught to provide answers by detecting patterns in training data. But, in the case of o1, researchers at the organisation first taught the model using a system of rewards and penalties. Then, o1 was taught to process user queries by breaking them down and going through them one step at a time.


With this new training methodology, the o1 model’s approach to solving a problem is reportedly closer to how humans would go about it.

“I think you’ll see there are lots of ways where it feels kind of alien, but there are also ways where it feels surprisingly human […] The model is given a limited amount of time to process queries, so it might say something like, ‘Oh, I’m running out of time, let me get to an answer quickly.’ Early on, during its chain of thought, it may also seem like it’s brainstorming and say something like, ‘I could do this or that, what should I do?’” Bob McGrew, the chief research officer at OpenAI, was quoted as saying by The Verge.

Currently, o1 can neither browse the internet nor process files and images. It also lacks factual information about recent world events.

Though the o1 model is said to provide more accurate answers, McGrew admitted that the problem of hallucination persists. As for other such risks, OpenAI claimed that the o1 model is safer because it is better able to reason about the company’s safety rules.

“On one of our hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0-100) while our o1-preview model scored 84,” the AI firm said, adding that it recently entered into agreements with US and UK regulators for pre-testing AI models before they are deployed.

How to access o1-preview and o1-mini?

The o1-preview and o1-mini models can be accessed by ChatGPT Plus and Team users starting today, by selecting the preferred model using the model picker on the platform. At launch, OpenAI said that the weekly rate limits will be 30 messages for o1-preview and 50 messages for o1-mini. The company is currently working on increasing these limits and enabling ChatGPT to automatically choose the right model for a given prompt or query. Meanwhile, ChatGPT Enterprise and Edu users will receive access to the models next week.

For developers, API access to the o1-preview model costs $15 (Rs 1,200 approx) per 1 million input tokens and $60 (Rs 5,000 approx) per 1 million output tokens.
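
To give a sense of what these rates mean for a single request, here is a minimal sketch in Python that applies the two per-million-token prices quoted above. The function name and the token counts in the example are hypothetical and chosen only for illustration; actual bills depend on how OpenAI counts tokens for each request.

```python
# Prices quoted above for o1-preview, in US dollars per 1 million tokens.
INPUT_PRICE_PER_MILLION = 15.0
OUTPUT_PRICE_PER_MILLION = 60.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge for one request from its token counts.

    Hypothetical helper for illustration; not part of any OpenAI library.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 2,000 input tokens and 10,000 output tokens.
print(f"${estimate_cost_usd(2_000, 10_000):.2f}")  # prints "$0.63"
```

Because output tokens cost four times as much as input tokens, a long answer to a short prompt is dominated by the output charge, as in the example above.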
