DeepSeek’s Model Rivals OpenAI

A Young Group of Geniuses Eager to Prove Themselves

According to DeepSeek founder Liang Wenfeng, when he put together the company’s research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in top journals and won awards at international academic conferences but lacked industry experience, according to the Chinese tech publication QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr in 2023. The hiring strategy helped create a collaborative company culture where people were free to use ample computing resources to pursue unorthodox research projects. It’s a starkly different way of operating from established internet companies in China, where teams are often competing for resources.

The Advantage of Youth

Liang said that students can be a better fit for high-investment, low-profit research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires is that DeepSeek was created to “solve the hardest questions in the world.”

The fact that these young researchers are almost entirely educated in China adds to their drive, experts say. “This younger generation also embodies a sense of patriotism, particularly as they navigate US restrictions and choke points in critical hardware and software technologies,” explains Marina Zhang, an associate professor at the University of Technology Sydney. “Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to advancing China’s position as a global innovation leader.”

Innovation Born out of a Crisis

In October 2022, the US government started putting together export controls that severely restricted Chinese AI companies from accessing cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The firm had started out with a stockpile of 10,000 A100s, an earlier generation of Nvidia chip, but it needed more to compete with firms like OpenAI and Meta.

DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies.
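To give a rough sense of why "reducing the size of fields" saves memory, the sketch below computes how much memory a model's weights occupy at different numeric precisions. The parameter count and the list of formats are illustrative assumptions; the quote above does not say which formats DeepSeek used or where it applied them.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Common storage widths, listed here as an assumption for illustration.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Hypothetical model size, not a figure taken from the article.
HYPOTHETICAL_PARAMS = 100_000_000_000

for name, width in BYTES_PER_PARAM.items():
    gib = weight_memory_gib(HYPOTHETICAL_PARAMS, width)
    print(f"{name}: {gib:.1f} GiB")
```

Halving the width of each stored number halves the memory for weights, which in turn lets the same model fit on fewer chips. Real training also stores gradients, optimizer state, and activations, which this sketch leaves out.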

DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required one-tenth the computing power of Meta’s comparable Llama 3.1 model to train, according to the research institution Epoch AI.
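The general idea behind Mixture-of-Experts can be shown in a few lines: a small router scores every expert for a given input, and only the top few experts actually run. The sketch below is a generic top-k router written for illustration, not DeepSeek's implementation; the function names, the toy experts, and the choice of two active experts are all assumptions.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs."""
    # One score per expert: dot product of the input with that expert's router row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    # Pick the top_k experts; the rest are skipped, which is where compute is saved.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm  # renormalize gates over the chosen experts
        expert_out = experts[i](x)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out, chosen


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
output, used = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
print("experts used:", used)
```

With four experts and two active per input, only half of the expert computation runs for any given input, even though the model as a whole holds all four sets of weights.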

The Impact of Open Source Models

DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open source models is the only way to play catch-up with their Western counterparts, because it attracts more users and contributors, which in turn help the models grow.

“They’ve now demonstrated that cutting-edge models can be built using less, though still a lot of, money and that the current norms of model-building leave plenty of room for optimization,” Chang says. “We are sure to see a lot more attempts in this direction going forward.”

Conclusion

DeepSeek’s research approach and its commitment to open source models have earned it goodwill across the global AI research community. Despite US export controls, the company has demonstrated that cutting-edge models can be built with fewer resources, and it is likely to remain a prominent player as other labs pursue the same kind of efficiency.

FAQs

Q: What is DeepSeek’s approach to AI research?
A: DeepSeek hires recent graduates and PhD students from China’s top universities and gives them ample computing resources to pursue unorthodox research projects.

Q: How has DeepSeek overcome the challenges posed by US export controls?
A: By training more efficiently. It optimized its model architecture with custom communication schemes between chips, memory-saving measures, and designs such as Multi-head Latent Attention (MLA) and Mixture-of-Experts.

Q: What is the impact of DeepSeek’s open source models?
A: They have earned the company considerable goodwill within the global AI research community and shown that cutting-edge models can be built using fewer resources.

Q: What is the future outlook for DeepSeek?
A: Analysts expect many more attempts to build models with fewer resources, and DeepSeek is likely to remain a prominent player in that push.
