AI Pioneer Cerebras Crushed with Demand for New Large Language Model
When you are 50 or 70 times faster than the competition, you can do things they can’t do at all. – Cerebras CEO Andrew Feldman
AI computer pioneer Cerebras Systems has been "crushed" with demand to run DeepSeek’s R1 large language model, says company co-founder and CEO Andrew Feldman.
The Impact of DeepSeek on AI Economics
The impact of DeepSeek on the economics of AI is significant, Feldman indicated. But the more profound result is that it will spur even larger AI systems.
Cerebras’s Edge in Speed
Cerebras’s edge is speed. According to Feldman, the company’s CS-3 computers run inference on the model 57 times faster than other DeepSeek service providers.
The Challenge for Hosting DeepSeek
The challenge for anyone hosting DeepSeek is that R1, like other so-called reasoning models, such as OpenAI’s o1, uses much more computing power when it produces output at inference time, making it harder to deliver results to the user in a timely fashion.
Cerebras’s Solution
Cerebras followed one standard procedure for companies wanting to run DeepSeek inference: download the R1 neural parameters, or weights, from Hugging Face, then use them to train a smaller open-source model, in this case Meta Platforms’ Llama 70B, creating a "distillation" of R1.
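The article does not detail Cerebras's training setup, but the general idea of distillation, training a small "student" model to mimic a large "teacher," can be sketched with toy models. The sketch below is a minimal illustration in PyTorch, assuming the common soft-label approach (matching temperature-softened output distributions via KL divergence); the tiny linear networks are stand-ins for R1 and Llama 70B, not the models themselves.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-ins: in the real setting the teacher is R1 and the
# student is Llama 70B; here both are small linear layers.
teacher = nn.Linear(8, 4)
student = nn.Linear(8, 4)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
T = 2.0  # softmax temperature, a common distillation knob

x = torch.randn(64, 8)  # stand-in for a batch of training inputs
with torch.no_grad():
    teacher_logits = teacher(x)  # teacher outputs are fixed targets

init_loss = None
for step in range(200):
    student_logits = student(x)
    # KL divergence between the softened teacher and student
    # distributions; the T*T factor rescales the gradients.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    if init_loss is None:
        init_loss = loss.item()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"distillation loss: {init_loss:.4f} -> {loss.item():.4f}")
```

After a few hundred steps the student's output distribution closely tracks the teacher's on the training inputs, which is the mechanism by which a distilled Llama can inherit much of R1's behavior.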
The Results
"We were able to do that extremely quickly, and we were able to produce results that are just plain faster than everybody else — not by a little bit, by a lot," said Feldman.
Conclusion
The breakthrough has several implications. One, it’s a big victory for open-source AI, Feldman indicated, by which he means AI models that post their neural parameters for download. Many of a new AI model’s advances can be replicated by researchers when they have access to the weights, even without having access to the source code. Private models such as GPT-4 do not disclose their weights.

