How DeepSeek Built One of the World’s Most Powerful AI Systems with Fewer Computer Chips
How are A.I. technologies built?
The leading A.I. technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data. The most powerful systems spend months analyzing just about all the English text on the internet as well as many images, sounds, and other multimedia. That requires enormous amounts of computing power.
About 15 years ago, A.I. researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.
How was DeepSeek able to reduce costs?
It did many things. Most notably, it embraced a method called "mixture of experts." Companies usually create a single giant neural network that learns all the patterns in all the data on the internet. That is expensive, because it requires enormous amounts of data to travel between GPU chips.
With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics, and so on. There might be 100 of these smaller "expert" systems. Each expert could concentrate on its particular field.
How is this efficient?
Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller "expert" systems with a "generalist" system. The experts still needed to trade some information with one another, and the generalist – which had a decent but not detailed understanding of each subject – could help coordinate interactions between the experts.
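The pairing described above can be sketched in a few lines of Python. Everything here is a toy stand-in: the sizes, the `router` that scores each token, and the pick-one-expert routing are illustrative assumptions, not DeepSeek's actual design, which is far larger and more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # width of each token vector (made-up size)
N_EXPERTS = 4   # tiny stand-in for the ~100 experts described above

# Each "expert" and the shared "generalist" are just small weight matrices here.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
generalist = rng.standard_normal((D, D)) / np.sqrt(D)
router = rng.standard_normal((D, N_EXPERTS))  # scores each token per expert

def moe_layer(tokens):
    """Send each token to its single best-scoring expert, then add the
    generalist's output, which every token receives regardless of routing."""
    scores = tokens @ router           # (n_tokens, N_EXPERTS)
    choice = scores.argmax(axis=1)     # best expert for each token
    out = np.empty_like(tokens)
    for e in range(N_EXPERTS):
        mask = choice == e
        if mask.any():
            # Only the chosen expert's weights touch these tokens, so most
            # expert parameters sit idle for any one token.
            out[mask] = tokens[mask] @ experts[e]
    return out + tokens @ generalist   # the generalist sees every token

tokens = rng.standard_normal((8, D))
y = moe_layer(tokens)
print(y.shape)  # (8, 16)
```

The saving comes from the routing: each token activates one small expert plus the generalist instead of one enormous network, so far fewer numbers move between chips.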
And that is more efficient?
Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers their elementary school math classes can understand.
There is math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 …. You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimation of a circle’s circumference.
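The circle example above, in runnable form:

```python
import math

radius = 10.0
exact = 2 * math.pi * radius   # circumference using pi at full precision
rough = 2 * 3.14 * radius      # circumference using pi cut to two decimals

print(exact)                       # about 62.83
print(rough)                       # 62.8
print(abs(exact - rough) / exact)  # error of roughly 0.0005, i.e. 0.05%
```

Dropping all but two decimals changes the answer by only about one part in two thousand.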
DeepSeek did something similar – but on a much larger scale – in training its A.I. technology. The math that allows a neural network to identify patterns in text is really just multiplication – lots and lots and lots of multiplication. We’re talking months of multiplication across thousands of computer chips. Rather than storing each number across 16 or 32 bits of computer memory, as is typical, DeepSeek squeezed each number into just 8 bits – the equivalent of keeping fewer decimals. Each number was less precise, but the chips could store, move, and multiply the numbers much faster.
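The "multiplication" here is matrix multiplication: multiplying a grid of inputs by a grid of learned weights. A toy version in Python, with sizes made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# One neural-network layer boils down to a single matrix multiplication.
# Real models chain thousands of such layers, with billions of weights.
inputs = rng.standard_normal((32, 512))    # 32 tokens, 512 numbers each
weights = rng.standard_normal((512, 512))  # one layer's learned numbers

outputs = inputs @ weights  # "@" is matrix multiplication in Python
print(outputs.shape)        # (32, 512)
```

GPUs exist to do exactly this operation, billions of times per second.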
That’s it?
Well, they added another trick. After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem – making a key calculation that would help decide how the neural network would operate – it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
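A sketch of that idea in Python. NumPy has no 8-bit floats, so 16-bit numbers stand in for DeepSeek's 8-bit ones, but the principle is the same: low-precision inputs, with the running total kept in 32 bits.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Low-precision inputs (16-bit stand-ins for DeepSeek's 8-bit numbers).
a = rng.standard_normal(n).astype(np.float16)
b = rng.standard_normal(n).astype(np.float16)

def dot(x, y, acc_dtype):
    """Multiply pairs of numbers, keeping the running total
    (the 'accumulator') in the given precision."""
    total = acc_dtype(0.0)
    for i in range(len(x)):
        total = acc_dtype(total + acc_dtype(x[i]) * acc_dtype(y[i]))
    return float(total)

# Reference answer computed in 64-bit precision.
exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))

low = dot(a, b, np.float16)   # 16-bit running total: rounding errors pile up
high = dot(a, b, np.float32)  # 32-bit running total: far more precise

print(f"16-bit total, error: {abs(low - exact):.4f}")
print(f"32-bit total, error: {abs(high - exact):.6f}")
```

Keeping only the accumulator wide buys back most of the precision while the inputs stay small and cheap to move around.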
So any high school student could have done this?
Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.
Then why didn’t other A.I. labs do this already?
Some A.I. labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors. But others were clearly surprised by DeepSeek’s work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars – if not billions – in electrical power.
Conclusion
DeepSeek’s approach has the potential to significantly reduce the cost of building A.I. technologies. By embracing the mixture of experts method and simple numerical tricks like lower-precision arithmetic, the start-up showed that powerful A.I. systems can be built with fewer computer chips. That could have far-reaching implications for how A.I. is developed.
FAQs
Q: Can any high school student do this?
A: No. Beyond the numerical tricks, the DeepSeek engineers also had to write the very complicated computer code that tells GPUs what to do, squeezing extra efficiency out of the chips.
Q: Why didn’t other A.I. labs do this already?
A: The experimentation needed to find a breakthrough like this costs millions of dollars – if not billions – in electrical power, and it requires taking on enormous risk, which can be a barrier to innovation.
Q: Will this breakthrough lead to cheaper A.I. systems?
A: Yes, DeepSeek’s innovative approach has the potential to significantly reduce the cost of building A.I. systems.

