
DeepSeek unveils DeepSeek-R1, a reasoning model that beats OpenAI-o1

The latest LLM from Chinese AI lab DeepSeek has surpassed OpenAI-o1 on several benchmarks.

The new AI model from DeepSeek is a state-of-the-art reasoning model designed to enhance problem solving. (Express Image/FreePik).

Chinese AI lab DeepSeek, which recently launched DeepSeek-V3, is back with a powerful new reasoning-focused large language model named DeepSeek-R1. The new model uses the same mixture-of-experts architecture as V3 and matches the performance of OpenAI's frontier model o1 on tasks such as math, coding and general knowledge. DeepSeek-R1 is reportedly 90 to 95 per cent cheaper to use than o1.

DeepSeek-R1 can be thought of as an AI that does not simply answer your questions but works through problems step by step, somewhat as a human would. The new open-source reasoning model comes from Chinese AI startup DeepSeek, which made waves earlier this month with DeepSeek-V3, a powerful, free and open-source model that outperformed models from Meta and OpenAI on several benchmarks while being developed at a fraction of their cost.

What is DeepSeek-R1?

The new AI model from DeepSeek is a state-of-the-art reasoning model designed to enhance the problem-solving and analytical capabilities of AI systems. According to DeepSeek's research paper, the release comprises two core models: DeepSeek-R1-Zero and DeepSeek-R1.



DeepSeek-R1-Zero is trained entirely through reinforcement learning (RL), without any supervised fine-tuning. DeepSeek-R1 builds on the foundation laid by R1-Zero: it adds a cold-start phase that uses carefully curated data, followed by multi-stage RL, which improves both its reasoning and the readability of its answers.
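The article does not spell out the RL recipe. DeepSeek's R1 paper describes R1-Zero's rewards as rule-based: an accuracy reward for a correct final answer and a format reward for keeping the reasoning between `<think>` and `</think>` tags, with each answer scored relative to a group of answers sampled for the same prompt. The Python sketch below illustrates that idea only. The reward weights, the exact-string answer check and the function names `rule_based_reward` and `group_relative_advantages` are hypothetical choices made for illustration, not DeepSeek's code.

```python
import re
from statistics import mean, pstdev

# Hypothetical format rule: reasoning inside <think>...</think>, then the final answer.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*(?P<answer>.+)$", re.DOTALL)


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score one completion with two simple rules (weights are assumptions).

    - format reward: 0.5 if the reasoning sits inside <think>...</think>
    - accuracy reward: +1.0 if the text after </think> equals the reference answer
    """
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns nothing
    reward = 0.5
    if match.group("answer").strip() == reference_answer.strip():
        reward += 1.0
    return reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalise each reward against its own sampling group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no learning signal
    return [(r - mu) / sigma for r in rewards]


# Hypothetical group of three sampled answers to "What is 2 + 2?"
samples = ["<think>2 + 2 is 4</think> 4", "<think>guessing</think> 5", "4"]
rewards = [rule_based_reward(s, "4") for s in samples]  # [1.5, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
```

Because rewards are compared only within a group, answers that beat their siblings get a positive advantage and weaker ones a negative one, which is what the RL update would then push towards or away from.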

How does the model perform?

DeepSeek-R1 has shown remarkable performance across benchmarks. In mathematics, the model scored 79.8 per cent (Pass@1) on AIME 2024, comparable to OpenAI's o1. On MATH-500, another mathematics benchmark, DeepSeek-R1 achieved 97.3 per cent accuracy, ahead of most of the models it was compared against.
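Pass@1 is the probability that a single answer sampled from the model is correct. A widely used way to estimate it is to sample several answers per problem and apply the unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), which for k = 1 reduces to the fraction of correct samples c/n; the per-problem scores are then averaged over the benchmark. The sketch below assumes that method. The function names and the sample counts in the usage lines are hypothetical, and it is not a claim about DeepSeek's exact evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n sampled answers, c of them correct."""
    if n - c < k:
        return 1.0  # any k answers drawn must include a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over a benchmark and report it as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical run: three problems, four sampled answers each.
results = [(4, 4), (4, 2), (4, 0)]
print(benchmark_pass_at_k(results))  # 50.0
```

Averaging over several samples, rather than grading one greedy answer, makes the score less sensitive to a single lucky or unlucky generation.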

DeepSeek-R1 benchmarks.

On Codeforces, a competitive-programming platform used as a coding benchmark, the model ranked in the 96.3rd percentile of human participants, which points to expert-level coding ability. On general-knowledge benchmarks such as MMLU and GPQA Diamond, DeepSeek-R1 scored 90.8 per cent and 71.5 per cent accuracy respectively. On AlpacaEval 2.0, a benchmark that tests AI models' writing and question answering, DeepSeek-R1 secured an 87.6 per cent win rate.
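The article does not give the methodology behind the percentile and win-rate figures. The Python sketch below assumes the simplest definitions: a percentile rank is the share of participants with a strictly lower rating, and a win rate is the share of head-to-head comparisons the model wins against a reference, with ties counted as half a win (an assumption). AlpacaEval 2.0's headline figure is commonly a length-controlled win rate, which this sketch does not attempt. All numbers in it are made up for illustration.

```python
from bisect import bisect_left


def percentile_rank(score: float, population: list[float]) -> float:
    """Percentage of the population whose score is strictly below `score`."""
    ordered = sorted(population)
    below = bisect_left(ordered, score)  # count of values < score
    return 100.0 * below / len(ordered)


def win_rate(outcomes: list[str]) -> float:
    """Share of head-to-head comparisons won, with ties counted as half a win."""
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return 100.0 * sum(points[o] for o in outcomes) / len(outcomes)


# Hypothetical numbers, not DeepSeek's data.
print(percentile_rank(2000, [1200, 1500, 1800, 2100]))  # 75.0
print(win_rate(["win", "win", "tie", "loss"]))  # 62.5
```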

What are its use cases?

Since DeepSeek-R1 can solve complex reasoning and mathematical problems, it could be a useful tool for advanced education and tutoring systems. Given its strong coding results, it could also be used in software development for code generation and debugging. Its capabilities in long-context understanding and question answering could make it valuable for research as well.

Technology on smartphone reviews, in-depth reports on privacy and security, AI, and more. We aim to simplify the most complex developments and make them succinct and accessible for tech enthusiasts and all readers. Stay updated with our daily news stories, monthly gadget roundups, and special reports and features that explore the vast possibilities of AI, consumer tech, quantum computing, etc.
