
How DeepSeek’s origins explain its AI model overtaking US rivals like ChatGPT

DeepSeek’s success has wobbled the widely held belief that pouring billions of dollars into AI chip investments guarantees dominance.

DeepSeek logo and the Chinese flag as seen in this illustration taken January 21, 2025. (Image: Reuters)

A little-known AI research lab backed by a Chinese hedge fund has caught global attention and sent shockwaves through Silicon Valley.

In what some are calling a “Sputnik moment”, DeepSeek has seemingly leapfrogged the likes of OpenAI, Google, and Meta in the high-stakes AI arms race. The lab’s recently unveiled open-source reasoning model, DeepSeek-R1, has been said to outperform the tech industry’s leading AI models, such as OpenAI’s o1, on key math and reasoning benchmarks.

On Monday, January 27, the Chinese lab’s AI mobile app (powered by its DeepSeek-V3 model) overtook rival ChatGPT to become the No. 1 free app on Apple’s App Store in the US.


DeepSeek’s AI models have not only given Western AI giants a run for their money but also sparked fears that the US may struggle to maintain its AI primacy in the face of a brewing tech cold war with China.

Its success has wobbled the widely held belief that pouring billions of dollars into chip investments guarantees dominance, causing technology shares to tumble after US and European markets opened on Monday.

So, what is the story behind DeepSeek, and does it explain why the lab has emerged as a disruptive force in the AI landscape?

What is DeepSeek, and how did it start?

In 2015, Liang Wenfeng founded a Chinese quantitative hedge fund called High-Flyer. Quantitative or ‘quant’ hedge funds rely on trading algorithms and statistical models to find patterns in the market and automatically buy or sell stocks, according to a report by the Wall Street Journal.


To analyse troves of financial data and support complex trading operations, Liang established a deep-learning research branch under High-Flyer called Fire-Flyer and stockpiled Graphics Processing Units (GPUs) to build supercomputers.

Then, in 2023, Liang decided to redirect the fund’s resources into a new company called DeepSeek, with the goal of developing foundational AI models and eventually cracking artificial general intelligence (AGI).

Instead of hiring experienced engineers who knew how to build consumer-facing AI products, Liang tapped PhD students from China’s top universities for DeepSeek’s research team, even though they lacked industry experience, according to a report by Chinese tech news site QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr, another Chinese news outlet.


Liang’s approach to building a team that focused on high-investment, low-profit research is believed to have contributed to DeepSeek’s success.

“The whole team shares a collaborative culture and dedication to hardcore research,” Zihan Wang, a former DeepSeek employee, was quoted as saying by MIT Technology Review.

How is DeepSeek different from other AI players?

Despite achieving significant milestones in a short span of time, DeepSeek is reportedly focused on AI research and has no immediate plans to commercialise its AI models.

“I wouldn’t be able to find a commercial reason [for founding DeepSeek] even if you ask me to,” Liang was quoted as saying by 36Kr. “Basic science research has a very low return-on-investment ratio. When OpenAI’s early investors gave it money, they sure weren’t thinking about how much return they would get. Rather, it was that they really wanted to do this thing,” he said.


DeepSeek does not rely on funding from tech giants like Baidu, Alibaba, and ByteDance. It is solely backed by High-Flyer. It has a partnership with chipmaker AMD that allows its models, like DeepSeek-V3, to be powered by AMD Instinct GPUs and ROCm software, according to a report by Forbes.

DeepSeek is also one of the leading AI firms in China to embrace open-source principles.

Though their definition has been debated, open-source AI models are made available for anyone to download, modify, and reuse. Besides earning the goodwill of the research community, releasing AI models and training datasets under open-source licences can attract more users and developers, helping the models grow more advanced.

However, open-source AI models also come with certain safety risks as they can be misused to create AI-generated, non-consensual sexual imagery and child sexual abuse material (CSAM) by simply removing in-built safeguards.


What AI models has DeepSeek released so far?

DeepSeek’s AI models have reportedly been optimised by incorporating a Mixture-of-Experts (MoE) architecture and Multi-Head Latent Attention as well as employing advanced machine-learning techniques such as reinforcement learning and distillation. Here are a few open-source AI models developed by DeepSeek:

– DeepSeek Coder: An open-source AI model designed for coding-related tasks.
– DeepSeek LLM: A 67-billion-parameter model built to rival other large language models (LLMs).
– DeepSeek-V2: A low-cost AI model that boasts strong performance.
– DeepSeek-Coder-V2: A 236-billion-parameter model designed for complex coding challenges.
– DeepSeek-V3: A 671-billion-parameter model that can handle a range of tasks such as coding, translating, and writing essays and emails.
– DeepSeek-R1: An AI model designed for reasoning tasks, with capabilities that challenge OpenAI’s marquee o1 model.
– DeepSeek-R1-Distill: A smaller model fine-tuned on synthetic data generated by DeepSeek-R1.
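For readers curious about the Mixture-of-Experts idea mentioned above, the sketch below shows the core trick in miniature: a gating network scores each token and only the top-k “experts” actually run, so most of the model’s parameters sit idle on any given token. This is a toy illustration with made-up sizes, not DeepSeek’s actual implementation.

```python
# Toy Mixture-of-Experts (MoE) routing sketch; sizes and names are
# illustrative assumptions, not DeepSeek's real architecture.
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, expert count, experts used per token

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D, N_EXPERTS)) * 0.1  # gating network weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through its top-k experts only."""
    logits = x @ gate_w                    # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]      # keep the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only the selected experts compute; the rest are skipped entirely.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The efficiency claim follows directly: with 2 of 4 experts active per token, only half the expert parameters are exercised on each forward pass, which is the kind of saving that matters when GPUs are scarce.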

What lies ahead for DeepSeek?

The innovation behind DeepSeek’s AI models is driven by scarcity. Since 2022, the US government has announced export controls that have restricted Chinese AI companies from accessing GPUs such as Nvidia’s H100. While DeepSeek had stockpiled over 10,000 H100 GPUs prior to the restrictions, its limited resources meant that it had to use them more efficiently.

The AI research lab reworked its training process to reduce the strain on its GPUs, former DeepSeek employee Wang told MIT Technology Review.


Although DeepSeek has been able to develop and deploy powerful AI models without access to the latest hardware, it may need to bridge the compute gap at some point in order to more effectively compete against US companies with access to abundant computing resources.

Several users on social media have also pointed out that DeepSeek’s AI chatbot has been modified to censor answers to sensitive questions about China and its government. The chatbot’s purported censorship restrictions could pose a challenge to its widespread global adoption.

