
DeepSeek: How open-source AI is disrupting Big Tech's monopoly

Chinese AI company DeepSeek has been creating ripples across the world. Here’s a look at what’s making it the most cost-efficient alternative to AI giants from the US.

The logo of DeepSeek is displayed alongside its AI assistant app on a mobile phone, in this illustration picture taken January 28, 2025. (REUTERS/Florence Lo/Illustration)

DeepSeek AI Chinese Startup: On Monday, the stock market opened with a massive dip; the tech-heavy Nasdaq fell by about 3 per cent, its worst performance in two years. The drop has been attributed to the meteoric rise of Chinese AI startup DeepSeek, which has grabbed global attention over the last few weeks after unveiling its AI models: DeepSeek-V3 and DeepSeek-R1, a reasoning model.

The AI models from the Chinese startup went on to gain widespread acceptance, with DeepSeek's assistant eventually surpassing ChatGPT as the most downloaded free app on Apple's App Store. DeepSeek-V3 and DeepSeek-R1 rival OpenAI's cutting-edge o1 and o3 models, a feat the Chinese lab achieved with only a fraction of OpenAI's investment.

What is DeepSeek?

DeepSeek is a Chinese AI company based in Hangzhou, founded by entrepreneur Liang Wenfeng, who is also the CEO of the quantitative hedge fund High-Flyer. Liang reportedly began working on AI in 2019 with High-Flyer AI, a unit dedicated to research in this domain. Liang is DeepSeek's controlling shareholder, and according to a Reuters report, High-Flyer owns patents related to the chip clusters used for training AI models.


What sets DeepSeek's models apart is their performance and their open-weights release, which essentially allows anyone to build on top of them. DeepSeek-V3 was reportedly trained for about $5.6 million, a fraction of the hundreds of millions that OpenAI, Meta, Google and others have pumped into their frontier models.

What is different about DeepSeek AI models?

Owing to its optimal use of scarce resources, DeepSeek has been pitted against OpenAI, the US AI powerhouse widely known for building large language models. DeepSeek-V3, one of the first models the company unveiled, surpassed GPT-4o and Claude 3.5 Sonnet on numerous benchmarks earlier this month.

DeepSeek-V3 stands out because of its architecture, known as Mixture-of-Experts (MoE). An MoE model works like a team of specialists answering a question together, instead of a single big model handling everything: a router sends each input only to the experts best suited to it, so just a fraction of the model's parameters are active at any time (see the sketch below). DeepSeek-V3 was trained on 14.8 trillion tokens of large, high-quality datasets that give the model a broad understanding of language and task-specific capabilities. Additionally, the model uses a technique known as Multi-head Latent Attention (MLA), which shrinks the memory the attention mechanism needs, to enhance efficiency and cut the costs of training and deployment, allowing it to compete with some of the most advanced models of the day.
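To make the routing idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in PyTorch. The layer sizes, number of experts, and top-k value are illustrative assumptions, not DeepSeek-V3's actual configuration, which uses far more experts and a more elaborate router.

```python
# Minimal sketch of Mixture-of-Experts routing. Illustrative only:
# sizes and top_k are assumptions, not DeepSeek-V3's real settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward network of its own.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                       # x: (num_tokens, d_model)
        scores = self.router(x)                 # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # weights over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; this is why MoE
        # models activate just a fraction of their total parameters.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                    # 10 fake token embeddings
print(MoELayer()(tokens).shape)                 # torch.Size([10, 64])
```

The key property is visible in the inner loop: each token touches only its chosen experts, so compute grows with the number of active experts rather than with the model's total size.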

Even as the AI community was marvelling at DeepSeek-V3, the Chinese company launched its new model, DeepSeek-R1. The new model can "think" before answering, spending extra computation at inference time, a capability often called test-time compute. The R1 model has the same MoE architecture, and it matches, and often surpasses, the performance of OpenAI's frontier o1 model on tasks like math, coding, and general knowledge. R1 is reportedly 90-95 per cent cheaper to use than OpenAI's o1.


R1 is open-weight, powerful, and free to use. While o1 is also a thinking model that takes time to mull over prompts before producing a response, it keeps that deliberation hidden; with R1, one can see the thinking in action, because the model shows its chain of thought alongside the output to the prompt (a snippet for separating the two follows below).
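For illustration, here is a small Python snippet for splitting that visible reasoning from the final answer. It assumes R1-style output wraps the chain of thought in <think> tags, as widely reported for the open weights; the sample completion below is invented.

```python
# Sketch: separating DeepSeek-R1 style chain-of-thought from the
# final answer. Assumes reasoning arrives inside <think> tags, an
# assumption based on community reports about the open R1 weights.
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from a raw completion."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()          # no visible reasoning found
    thought = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after the closing tag
    return thought, answer

raw_output = "<think>2 groups of 3 plus 2 means 6 + 2.</think>The answer is 8."
thought, answer = split_reasoning(raw_output)
print("Reasoning:", thought)
print("Answer:", answer)
```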

R1 arrives at a time when industry giants are pumping billions into AI infrastructure. DeepSeek has essentially delivered a state-of-the-art model that is competitive with far costlier systems, and by making it open, the company has invited others to replicate its work. The release of R1 raises serious questions about whether such massive expenditures are necessary and has led to intense scrutiny of the industry's current approach.

How is it cheaper than its US peers?

It is commonly known that training AI models requires massive investment. But DeepSeek found a way to sidestep much of the infrastructure and hardware cost. The company dramatically reduced the cost of building its AI models by using the NVIDIA H800, a pared-down variant of the H100 made for export to China. While American AI giants trained on the advanced NVIDIA H100, DeepSeek relied on the watered-down H800, which reportedly has much lower chip-to-chip bandwidth.

In 2022, US regulators put in place rules that prevented NVIDIA from selling two advanced chips, the A100 and H100, to China, citing national security concerns. These chips are essential for training models like the ones behind ChatGPT. To comply with the rules, NVIDIA designed export-legal variants: first the A800, which reduced some capabilities of the A100, and later the H800 used by DeepSeek. DeepSeek engineers reportedly relied on low-level code optimisations to make memory usage more efficient, and this reportedly ensured that performance was not badly affected by the chip limitations. In simple words, they made the most of their existing resources.


Another key aspect of building AI models is training, which consumes massive resources. According to the company's research paper, DeepSeek trains only the necessary parts of the model for each input, as any MoE does, and keeps the workload spread evenly across its experts with a technique called auxiliary-loss-free load balancing (sketched below).
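Here is a rough sketch of how such balancing can work, following the bias-adjustment idea described in the DeepSeek-V3 paper: each expert carries a routing bias that is nudged after every step so overloaded experts attract less future traffic, with no auxiliary loss term needed. The tensor sizes and the update step gamma are illustrative assumptions.

```python
# Sketch of auxiliary-loss-free load balancing via per-expert bias,
# after the idea described for DeepSeek-V3. Sizes and gamma are
# illustrative assumptions, not the paper's actual hyperparameters.
import torch

num_experts, top_k, gamma = 8, 2, 0.001
bias = torch.zeros(num_experts)            # one routing bias per expert

def route(scores: torch.Tensor) -> torch.Tensor:
    """Pick each token's top-k experts using bias-adjusted scores.
    The bias influences selection only, not the gating weights."""
    _, idx = (scores + bias).topk(top_k, dim=-1)
    return idx                             # (num_tokens, top_k)

def update_bias(idx: torch.Tensor) -> None:
    """Lower the bias of overloaded experts and raise it for
    underloaded ones, steering the next batch toward balance."""
    global bias
    load = torch.bincount(idx.flatten(), minlength=num_experts).float()
    bias = bias - gamma * torch.sign(load - load.mean())

scores = torch.randn(32, num_experts)      # fake router scores for 32 tokens
update_bias(route(scores))
print(bias)                                # biases already nudged slightly
```

Because the bias only affects which experts are selected and never the gating weights applied to their outputs, the trick balances load without adding a loss term that could distort the model's training objective.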

Bijin Jose, an Assistant Editor at Indian Express Online in New Delhi, is a technology journalist with a portfolio spanning various prestigious publications. Starting as a citizen journalist with The Times of India in 2013, he transitioned through roles at India Today Digital and The Economic Times, before finding his niche at The Indian Express. With a BA in English from Maharaja Sayajirao University, Vadodara, and an MA in English Literature, Bijin's expertise extends from crime reporting to cultural features. With a keen interest in closely covering developments in artificial intelligence, Bijin provides nuanced perspectives on its implications for society and beyond.
