The Economics of Artificial Intelligence: Google’s Gemma 3

A New Open-Source Large Language Model

Google claims Gemma 3 reaches 98% of DeepSeek’s accuracy

The economics of artificial intelligence have been a hot topic of late, with startup DeepSeek AI claiming eye-opening economies of scale in deploying GPU chips. Two can play that game: on Wednesday, Google announced that its latest open-source large language model, Gemma 3, comes close to the accuracy of DeepSeek’s R1 with a fraction of the estimated computing power.

Gemma 3: A Sweet Spot

Google’s balance of compute and Elo score is a "sweet spot," the company claims. Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on LMArena’s leaderboard. Google says this helps developers create engaging user experiences that fit on a single GPU or TPU host.

Gemma 3 vs. R1: A Comparison

Google’s model also tops the Elo score of Meta’s Llama 3, which Google estimates would require 16 GPUs to serve. Note that the H100 chip counts attributed to the competition are Google’s estimates; DeepSeek AI has disclosed only one example, using 1,814 of Nvidia’s less-powerful H800 GPUs to serve answers with R1.

The Gemma 3 Models

Gemma 3 models, intended for on-device usage rather than data centers, have a vastly smaller number of parameters, or neural "weights," than R1 and other open-source models. Generally speaking, the greater the number of parameters, the more computing power is required.

Distillation and Quality Control

The main enhancement making such efficiency possible is a widely used AI technique called distillation, whereby the knowledge captured in a larger trained model’s weights is transferred into a smaller model, such as Gemma 3, to give it enhanced capabilities. The distilled model is then run through three quality-control measures: Reinforcement Learning from Human Feedback (RLHF), the technique used to shape the output of GPT and other large language models to be inoffensive and helpful; Reinforcement Learning from Machine Feedback (RLMF); and Reinforcement Learning from Execution Feedback (RLEF). Google says the latter two improve the model’s math and coding capabilities, respectively.
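To make the distillation idea concrete, here is a minimal sketch of the standard distillation loss: the smaller student model is trained to match the larger teacher’s full output distribution, softened by a temperature. This is a generic illustration of the technique, not Google’s actual training code; all function names are ours.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence between teacher and student output distributions.

    Minimizing this trains the (smaller) student to mimic the teacher's
    soft predictions, not just the hard labels.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl))

# A student that matches the teacher exactly incurs zero loss;
# one that disagrees incurs a positive loss.
teacher = np.array([[4.0, 1.0, 0.5]])
student_bad = np.array([[0.5, 1.0, 4.0]])
assert distillation_loss(teacher, teacher) < 1e-9
assert distillation_loss(student_bad, teacher) > 0.0
```

In practice this loss is computed over every token position and combined with an ordinary cross-entropy term, but the core idea is exactly this distribution-matching step.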

Optimization Techniques

To optimize the smallest version, the 1-billion-parameter model, for mobile devices, Google uses four common AI engineering techniques: quantization, updated "key-value" cache layouts, improved loading time for certain variables, and GPU weight sharing.
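Of those four techniques, quantization is the most widely understood: model weights stored as 32-bit floats are mapped to small integers, cutting memory use roughly fourfold at the cost of a bounded rounding error. The sketch below shows symmetric int8 quantization in its simplest form, as a generic illustration rather than Google’s specific scheme.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric int8 quantization: map float weights onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.8, -0.3, 0.05, -0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops from 4 bytes to 1 byte per weight, and the
# reconstruction error is bounded by one quantization step.
assert q.dtype == np.int8
assert np.max(np.abs(w - w_hat)) < scale
```

Production systems refine this with per-channel scales and quantization-aware training, but the storage-versus-precision trade-off is the same.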

Comparison with Gemini Models

Gemma 3 generally falls below the accuracy of Gemini 1.5 and Gemini 2.0, but Google calls the results noteworthy, stating that Gemma 3 is "showing competitive performance compared to closed Gemini models." The main advance of Gemma 3 over Gemma 2 is a longer "context window," the number of input tokens that can be held in memory for the model to work on at any given time.

A Longer Context Window

Gemma 2’s context window was only 8,000 tokens, whereas Gemma 3’s is 128,000, which counts as a "long" context window better suited to working on whole papers or books. (Gemini and other closed-source models are still far more capacious, with a context window of 2 million tokens for Gemini 2.0 Pro.)
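A quick back-of-the-envelope check shows what that jump means in practice. Assuming roughly four characters per token, a common rule of thumb for English text (actual tokenizer output varies), a short book overflows Gemma 2’s window but fits easily in Gemma 3’s:

```python
# Rough capacity comparison; 4 chars/token is a rule-of-thumb assumption.
CHARS_PER_TOKEN = 4

def approx_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits(text: str, context_window: int) -> bool:
    return approx_tokens(text) <= context_window

GEMMA_2_WINDOW = 8_000
GEMMA_3_WINDOW = 128_000

# A ~200,000-character document (roughly a short book):
doc = "x" * 200_000
assert not fits(doc, GEMMA_2_WINDOW)  # ~50,000 tokens overflows 8k
assert fits(doc, GEMMA_3_WINDOW)      # but fits comfortably in 128k
```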

Multi-Modal Capabilities and Language Support

Gemma 3 is also multi-modal, which Gemma 2 was not: it can handle image inputs alongside text to answer queries such as "What is in this photo?" Gemma 3 also supports over 140 languages, compared with Gemma 2’s English-only support.
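Multi-modal input typically means an image and a text question travel together in a single user turn. The sketch below shows one common way such a payload is structured; the field names are illustrative only and are not Google’s actual API schema.

```python
import base64

def image_query(image_bytes: bytes, question: str) -> dict:
    """Bundle an image and a text question into one chat-style message.

    Interleaving image and text parts in a single user turn is the usual
    pattern for multi-modal models; this schema is hypothetical.
    """
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "data": base64.b64encode(image_bytes).decode("ascii")},
            {"type": "text", "text": question},
        ],
    }

msg = image_query(b"\x89PNG...", "What is in this photo?")
assert msg["content"][0]["type"] == "image"
assert msg["content"][1]["text"] == "What is in this photo?"
```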

Conclusion

Gemma 3 is a significant step forward in the development of open-source large language models, offering a balance of compute and Elo score that is hard to match. It is suitable for on-device usage and supports a wide range of languages. With its multi-modal capabilities and long context window, Gemma 3 is an exciting development in the field of AI.

FAQs

Q: What is Gemma 3?
A: Gemma 3 is an open-source large language model developed by Google.

Q: How does Gemma 3 compare to R1?
A: Gemma 3 comes close to achieving the accuracy of R1 with a fraction of the estimated computing power.

Q: What is the main enhancement of Gemma 3?
A: The main enhancement is distillation, a widely used AI technique that transfers knowledge from a larger trained model into a smaller one, making Gemma 3’s efficiency possible.

Q: How does Gemma 3 compare to Gemini models?
A: Gemma 3 generally falls below the accuracy of Gemini 1.5 and Gemini 2.0, but is showing competitive performance compared to closed Gemini models.
