Cerebras Makes Breakthrough in AI with Chain of Thought
Cerebras Systems Announces Major Advancement in Generative AI
Cerebras Systems made a significant announcement at the annual NeurIPS AI conference, demonstrating that a smaller AI model can equal or better a larger one using the increasingly popular "chain of thought" approach to generative AI. The company achieved this by training Meta’s open-source Llama 3.1 model, which has only 70 billion parameters, to match the accuracy of the much larger 405-billion-parameter version of Llama.
Chain of Thought for Explainable AI
The chain-of-thought approach makes the AI model spell out the sequence of steps it performs to arrive at a final answer, with the aim of achieving "explainable" AI. By revealing the basis for its answers, this approach could increase human confidence in AI’s predictions. OpenAI popularized chain of thought with its recently released "o1" large language model.
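To make the idea concrete, here is a minimal, purely illustrative sketch of how a chain-of-thought prompt differs from a direct prompt. The wording is hypothetical and does not reflect Cerebras’s or OpenAI’s actual prompting:

```python
# Hypothetical illustration of chain-of-thought prompting.
# A direct prompt asks only for the answer; a chain-of-thought prompt
# asks the model to show its intermediate reasoning as well.

def direct_prompt(question: str) -> str:
    return f"Question: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step. Write out each intermediate calculation, "
        "then state the final answer on its own line."
    )

q = "A train travels 120 km in 1.5 hours. What is its average speed?"
print(chain_of_thought_prompt(q))
```

The point is that the same underlying model produces a visible reasoning trace when prompted this way, which is what makes the output inspectable.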
Cerebras’s Solution: CePO
Cerebras’s response to o1, dubbed "Cerebras Planning and Optimization" (CePO), works by requiring Llama to produce a step-by-step plan for solving the given problem, execute the plan repeatedly, analyze the response from each execution, and then select a "best of" answer. Unlike a traditional LLM, CePO also inspects its own generated code, checking for syntax errors, logical flaws, and whether it accomplishes what the user asked for, then runs this loop of plan execution and cross-checking multiple times.
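Cerebras has not published CePO’s internals, but the plan/execute/analyze/select loop described above can be sketched roughly as follows. The `llm` function and the scoring heuristic here are stand-ins, not Cerebras’s actual implementation:

```python
import random

def llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g. Llama 3.1 70B on a CS-3)."""
    return f"response({random.randint(0, 9)}) to: {prompt[:30]}"

def cepo_style_answer(problem: str, n_executions: int = 4) -> str:
    # 1. Ask the model to produce a step-by-step plan.
    plan = llm(f"Write a step-by-step plan to solve: {problem}")

    # 2. Execute the plan several times, collecting candidate answers.
    candidates = [llm(f"Execute this plan:\n{plan}\nProblem: {problem}")
                  for _ in range(n_executions)]

    # 3. Analyze each candidate by asking the model to cross-check it.
    scored = [(llm(f"Check this answer for errors: {c}"), c)
              for c in candidates]

    # 4. Select a "best of" answer. A real system would parse the critiques;
    #    this toy version just picks the candidate with the longest critique.
    best = max(scored, key=lambda pair: len(pair[0]))[1]
    return best
```

The design choice worth noting is that every stage (planning, execution, and checking) is itself a model call, so the extra accuracy is bought with extra inference, which is why inference speed matters so much for this approach.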
Breakthrough Results
The company was able to match or exceed the 405B version of Llama 3.1 on various benchmarks, including the CRUX test of complex reasoning tasks and the LiveCodeBench code-generation challenges. Cerebras was also able to take the latest Llama version, 3.3, and make it perform at the level of frontier large language models such as Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4 Turbo.
Advantages of Cerebras’s Approach
The chain-of-thought version of Llama 3.1 70B is, Cerebras says, the only reasoning model that runs in real time, on its CS-3 machines; OpenAI’s o1, by contrast, can take minutes to respond. Cerebras claims the CS-3 is 16 times faster than the fastest GPU-based service, processing 2,100 tokens per second.
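Taking the quoted figures at face value, a back-of-the-envelope comparison for a hypothetical 1,000-token reasoning trace shows why this gap matters for an approach that generates many tokens per answer:

```python
# Arithmetic based on Cerebras's quoted figures; the 1,000-token trace
# length is an assumed example, not a published number.
cs3_tokens_per_s = 2100                      # claimed CS-3 throughput
gpu_tokens_per_s = cs3_tokens_per_s / 16     # implied fastest GPU service, ~131 tok/s

trace_len = 1000                             # hypothetical chain-of-thought length
cs3_time = trace_len / cs3_tokens_per_s      # ~0.48 s
gpu_time = trace_len / gpu_tokens_per_s      # ~7.6 s

print(f"CS-3: {cs3_time:.2f} s, GPU service: {gpu_time:.1f} s")
```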
Training a Trillion-Parameter Model
Cerebras also announced that it has demonstrated "initial" training of a one-trillion-parameter large language model in a research project with Sandia National Laboratories. The work was done on a single CS-3 machine combined with Cerebras’s purpose-built memory appliance, MemoryX, which was expanded to 55 terabytes of memory to hold the model’s parameter weights. The weights were then streamed to the CS-3 over Cerebras’s dedicated networking appliance, SwarmX.
Programming and Memory Efficiency
The CS-3 system, Cerebras claims, replaces the 287 top-of-the-line Nvidia "Grace Blackwell" GB200 combined CPU-GPU chips that would be needed to reach equivalent memory capacity. The one CS-3 plus MemoryX occupies two standard telco racks, using less than one percent of the space and power of the equivalent GPU arrangement. MemoryX uses commodity DDR5 DRAM, in contrast to the more expensive high-bandwidth memory (HBM) on GPU cards.
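The 287-chip figure is consistent with simple arithmetic, assuming roughly 192 GB of HBM per Blackwell GPU (the published B200 capacity) and decimal terabytes; both assumptions are mine, not Cerebras's stated basis:

```python
import math

# Assumed inputs: 55 TB memory appliance capacity (cited above) and
# ~192 GB of HBM per Blackwell GPU (published B200 spec).
memory_capacity_tb = 55
hbm_per_gpu_gb = 192

# 55,000 GB / 192 GB per GPU, rounded up to whole GPUs.
gpus_needed = math.ceil(memory_capacity_tb * 1000 / hbm_per_gpu_gb)
print(gpus_needed)  # 287
```

At one trillion parameters, 55 TB also works out to about 55 bytes per parameter, which is in the right range for full training state (weights, gradients, and optimizer values) rather than weights alone.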
Conclusion
Cerebras’s chain-of-thought results demonstrate that smaller AI models can equal or surpass larger ones, while also showcasing the company’s approach to memory efficiency at scale. The implications are significant: more transparent reasoning could increase human confidence in AI’s predictions while reducing the need for the largest, most expensive models.
FAQs
Q: What is chain of thought in AI?
A: Chain of thought is an approach in generative AI that makes the AI model detail the sequence of calculations performed to arrive at a final answer, aiming to achieve "explainable" AI.
Q: What is Cerebras’s response to OpenAI’s o1?
A: Cerebras’s response is called "Cerebras Planning and Optimization" (CePO), which operates by requiring Llama to produce a plan to solve the given problem step-by-step, execute the plan repeatedly, analyze the responses to each execution, and then select a "best of" answer.
Q: What are the advantages of Cerebras’s approach?
A: Cerebras’s chain-of-thought version of 3.1 70B is the only reasoning model that runs in real-time on Cerebras’s CS-3 machines, and the company claims that its CS-3 machines are 16 times faster than the fastest service on GPU chips.

