January 27, 2025
4 min read
Why DeepSeek’s AI Model Just Became the Top-Rated App in the U.S.
A Chinese start-up has stunned the technology industry—and financial markets—with a cheaper, lower-tech AI assistant that matches the state of the art
DeepSeek’s artificial intelligence assistant made big waves on Monday, becoming the top-rated app in Apple’s App Store and sending tech stocks into a downward tumble. What’s all the fuss about?
DeepSeek, a Chinese start-up, surprised the tech industry with a new model that rivals the abilities of OpenAI’s most recent one—with far less investment and reduced-capacity chips. The U.S. bans exports of state-of-the-art computer chips to China and limits sales of chip-making equipment. DeepSeek, based in the eastern Chinese city of Hangzhou, reportedly had a stockpile of high-performance Nvidia A100 chips that it had acquired prior to the ban—so its engineers could have used those chips to develop the model. But in a key breakthrough, the start-up says it instead used much lower-powered Nvidia H800 chips to train the new model, dubbed DeepSeek-R1.
“We’ve seen, up to now, that the success of large tech companies working in AI was measured in how much money they raised, not necessarily in what the technology actually was,” says Ashlesha Nesarikar, CEO of the AI company Plano Intelligence. “I think we’ll be paying a lot more attention to what tech is underpinning these companies’ different products.”
On common AI tests in mathematics and coding, DeepSeek-R1 matched the scores of OpenAI’s o1 model, according to VentureBeat. U.S. companies don’t disclose the cost of training their own large language models (LLMs), the systems that undergird popular chatbots such as ChatGPT. But OpenAI CEO Sam Altman told an audience at the Massachusetts Institute of Technology in 2023 that training the company’s LLM GPT-4 cost more than $100 million. In contrast, DeepSeek says it made its new model for less than $6 million. DeepSeek-R1 is free for users to download, while the comparable version of ChatGPT costs $200 a month.
DeepSeek’s $6-million number doesn’t necessarily reflect how much money would have been needed to build such an LLM from scratch, Nesarikar says. The reported cost of DeepSeek-R1 may represent a fine-tuning of its latest version. Nevertheless, she says, the model’s improved energy efficiency would make AI more accessible to more people in more industries. The increase in efficiency could be good news when it comes to AI’s environmental impact because the computational cost of generating new text with an LLM is four to five times higher than that of a typical search engine query.
Because it requires less computational power, the cost of running DeepSeek-R1 is a tenth of that of similar competitors, says Hancheng Cao, an incoming assistant professor of information systems and operations management at Emory University. “For academic researchers or start-ups, this difference in the cost really means a lot,” Cao says.
DeepSeek achieved its model’s efficiency in several ways, says Anil Ananthaswamy, author of Why Machines Learn: The Elegant Math behind Modern AI. DeepSeek-R1 has about 670 billion parameters, or variables it learns from during training, making it the largest open-source LLM yet, Ananthaswamy explains. But the model uses an architecture called “mixture of experts” so that only a relevant fraction of these parameters—tens of billions instead of hundreds of billions—are activated for any given query. This cuts down on computing costs. The DeepSeek LLM also uses a method called multihead latent attention to boost the efficiency of its inferences. And instead of predicting an answer word by word, it generates multiple words at once.
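The mixture-of-experts idea described above can be illustrated with a toy sketch. The dimensions, the router, and the expert matrices here are all hypothetical stand-ins, not DeepSeek-R1’s actual architecture; the point is only that a router picks a few experts per token, so most of the model’s parameters sit idle on any given query:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes -- not DeepSeek-R1's real configuration.
num_experts = 8      # total expert sub-networks in the layer
top_k = 2            # experts actually activated per token
d_model = 16         # hidden vector size

# Each "expert" here is just a small weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))  # scores a token against each expert

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ router                  # one score per expert
    top = np.argsort(scores)[-top_k:]    # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()             # softmax over only the chosen experts
    # Only top_k of the num_experts matrices are ever multiplied, so the
    # compute cost scales with top_k, not with the total parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # a d_model-sized output, produced by touching only 2 of 8 experts
```

Scaled up, the same principle lets a model hold hundreds of billions of parameters while activating only tens of billions per query.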
The model further differs from others such as o1 in how it uses reinforcement learning during training. While many LLMs rely on an external “critic” model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules internal to the training process to teach the model which of the possible answers it generates is best. “DeepSeek has streamlined that process,” Ananthaswamy says.
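A rule-based reward of this kind can be sketched as a simple scoring function. The rules below (check the final answer, check that reasoning is shown) are invented for illustration and are not DeepSeek’s actual training rules; they only show how fixed checks can stand in for a separate learned critic network:

```python
import re

def rule_based_reward(candidate: str, reference: str) -> float:
    """Score a candidate answer with fixed rules instead of a learned critic."""
    reward = 0.0
    # Rule 1 (correctness): does the stated final answer match the known result?
    match = re.search(r"answer:\s*(\S+)", candidate.lower())
    if match and match.group(1) == reference:
        reward += 1.0
    # Rule 2 (format): did the model show some reasoning before answering?
    if "because" in candidate.lower():
        reward += 0.5
    return reward

candidates = [
    "Answer: 42",
    "It is 42 because 6 times 7 is 42. Answer: 42",
    "Answer: 41",
]
# During training, the answer scoring highest under the rules is reinforced.
best = max(candidates, key=lambda c: rule_based_reward(c, "42"))
print(best)  # the candidate that is both correct and shows its reasoning
```

Because the rules are cheap to evaluate, no second model has to run alongside the LLM during training, which is part of the streamlining Ananthaswamy describes.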
Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open-source, Ananthaswamy says. (The training data remain proprietary.) This means that the company’s claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so more quickly and cheaply. It will also enable more research into the inner workings of LLMs themselves.
“One of the big things has been this divide that has opened up between academia and industry because academia has been unable to work with these really large models or do research in any meaningful way,” Ananthaswamy says. “But something like this, it’s within the reach of academia now, because you have the code.”
Editor’s Note (1/28/25): This article was edited after posting to correct Hancheng Cao’s given name and the name of Apple’s App Store and to better clarify the figures for DeepSeek-R1’s reported cost and total parameters.