Cerebras Makes Breakthrough in AI with Chain of Thought
Cerebras Systems Announces Major Advancement in Generative AI
Cerebras Systems made a significant announcement at the annual NeurIPS AI conference, demonstrating that a smaller AI model can equal or better a larger one using the increasingly popular "chain of thought" approach to generative AI. The company achieved this by training Meta’s open-source Llama 3.1 model, which has only 70 billion parameters, to match the accuracy of the much larger 405-billion-parameter version of Llama.
Chain of Thought for Explainable AI
The chain-of-thought approach makes the AI model spell out the sequence of steps it performs to arrive at a final answer, with the aim of achieving "explainable" AI. By revealing the basis for its answers, this approach could increase human confidence in AI’s predictions. OpenAI popularized chain of thought with its recently released "o1" large language model.
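To make the idea concrete, here is a minimal, purely illustrative sketch of how a chain-of-thought prompt differs from a direct prompt. The wording is hypothetical and does not reflect Cerebras’s or OpenAI’s actual prompting:

```python
# Hypothetical illustration of chain-of-thought prompting.
# A direct prompt asks only for the answer; a chain-of-thought prompt
# asks the model to show its intermediate reasoning as well.

def direct_prompt(question: str) -> str:
    return f"Question: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step. Write out each intermediate calculation, "
        "then state the final answer on its own line."
    )

q = "A train travels 120 km in 1.5 hours. What is its average speed?"
print(chain_of_thought_prompt(q))
```

The point is that the same underlying model produces a visible reasoning trace when prompted this way, which is what makes the output inspectable.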
Cerebras’s Solution: CePO
Cerebras’s response to o1, dubbed "Cerebras Planning and Optimization" (CePO), works by requiring Llama to produce a step-by-step plan for solving the given problem, execute the plan repeatedly, analyze the response from each execution, and then select a "best of" answer. Unlike a traditional LLM, CePO also inspects its own generated code, checking for syntax errors, logical flaws, and whether it accomplishes what the user asked for, then runs this loop of plan execution and cross-checking multiple times.
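Cerebras has not published CePO’s internals, but the plan/execute/analyze/select loop described above can be sketched roughly as follows. The `llm` function and the scoring heuristic here are stand-ins, not Cerebras’s actual implementation:

```python
import random

def llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g. Llama 3.1 70B on a CS-3)."""
    return f"response({random.randint(0, 9)}) to: {prompt[:30]}"

def cepo_style_answer(problem: str, n_executions: int = 4) -> str:
    # 1. Ask the model to produce a step-by-step plan.
    plan = llm(f"Write a step-by-step plan to solve: {problem}")

    # 2. Execute the plan several times, collecting candidate answers.
    candidates = [llm(f"Execute this plan:\n{plan}\nProblem: {problem}")
                  for _ in range(n_executions)]

    # 3. Analyze each candidate by asking the model to cross-check it.
    scored = [(llm(f"Check this answer for errors: {c}"), c)
              for c in candidates]

    # 4. Select a "best of" answer. A real system would parse the critiques;
    #    this toy version just picks the candidate with the longest critique.
    best = max(scored, key=lambda pair: len(pair[0]))[1]
    return best
```

The design choice worth noting is that every stage (planning, execution, and checking) is itself a model call, so the extra accuracy is bought with extra inference, which is why inference speed matters so much for this approach.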
Breakthrough Results
The company was able to match or exceed the 405B version of Llama 3.1 on various benchmarks, including the CRUX test of complex reasoning tasks and the LiveCodeBench code-generation challenges. Cerebras was also able to take the latest Llama version, 3.3, and make it perform at the level of frontier large language models such as Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4 Turbo.
Advantages of Cerebras’s Approach
The chain-of-thought version of Llama 3.1 70B is, Cerebras says, the only reasoning model that runs in real time, on its CS-3 machines; OpenAI’s o1, by contrast, can take minutes to respond. Cerebras claims the CS-3 is 16 times faster than the fastest GPU-based service, processing 2,100 tokens per second.
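Taking the quoted figures at face value, a back-of-the-envelope comparison for a hypothetical 1,000-token reasoning trace shows why this gap matters for an approach that generates many tokens per answer:

```python
# Arithmetic based on Cerebras's quoted figures; the 1,000-token trace
# length is an assumed example, not a published number.
cs3_tokens_per_s = 2100                      # claimed CS-3 throughput
gpu_tokens_per_s = cs3_tokens_per_s / 16     # implied fastest GPU service, ~131 tok/s

trace_len = 1000                             # hypothetical chain-of-thought length
cs3_time = trace_len / cs3_tokens_per_s      # ~0.48 s
gpu_time = trace_len / gpu_tokens_per_s      # ~7.6 s

print(f"CS-3: {cs3_time:.2f} s, GPU service: {gpu_time:.1f} s")
```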
Training a Trillion-Parameter Model
Cerebras also announced that it has demonstrated "initial" training of a one-trillion-parameter large language model in a research project with Sandia National Laboratories. The work was done on a single CS-3 machine combined with Cerebras’s purpose-built memory appliance, MemoryX, which was expanded to 55 terabytes of memory to hold the model’s parameter weights. The weights were then streamed to the CS-3 over Cerebras’s dedicated networking appliance, SwarmX.
Programming and Memory Efficiency
The CS-3 system, Cerebras claims, replaces the 287 top-of-the-line Nvidia "Grace Blackwell" GB200 combined CPU-GPU chips that would be needed to reach equivalent memory capacity. The one CS-3 plus MemoryX occupies two standard telco racks, using less than one percent of the space and power of the equivalent GPU arrangement. MemoryX uses commodity DDR5 DRAM, in contrast to the more expensive high-bandwidth memory (HBM) on GPU cards.
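The 287-chip figure is consistent with simple arithmetic, assuming roughly 192 GB of HBM per Blackwell GPU (the published B200 capacity) and decimal terabytes; both assumptions are mine, not Cerebras's stated basis:

```python
import math

# Assumed inputs: 55 TB memory appliance capacity (cited above) and
# ~192 GB of HBM per Blackwell GPU (published B200 spec).
memory_capacity_tb = 55
hbm_per_gpu_gb = 192

# 55,000 GB / 192 GB per GPU, rounded up to whole GPUs.
gpus_needed = math.ceil(memory_capacity_tb * 1000 / hbm_per_gpu_gb)
print(gpus_needed)  # 287
```

At one trillion parameters, 55 TB also works out to about 55 bytes per parameter, which is in the right range for full training state (weights, gradients, and optimizer values) rather than weights alone.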
Conclusion
Cerebras’s chain-of-thought results demonstrate that smaller AI models can equal or surpass larger ones, while also showcasing the company’s approach to memory efficiency at scale. The implications are significant: more transparent reasoning could increase human confidence in AI’s predictions while reducing the need for the largest, most expensive models.
FAQs
Q: What is chain of thought in AI?
A: Chain of thought is an approach in generative AI that makes the AI model detail the sequence of calculations performed to arrive at a final answer, aiming to achieve "explainable" AI.
Q: What is Cerebras’s response to OpenAI’s o1?
A: Cerebras’s response is called "Cerebras Planning and Optimization" (CePO), which operates by requiring Llama to produce a plan to solve the given problem step-by-step, execute the plan repeatedly, analyze the responses to each execution, and then select a "best of" answer.
Q: What are the advantages of Cerebras’s approach?
A: Cerebras’s chain-of-thought version of 3.1 70B is the only reasoning model that runs in real-time on Cerebras’s CS-3 machines, and the company claims that its CS-3 machines are 16 times faster than the fastest service on GPU chips.

