Building AI Systems with Foundation Models: A Delicate Balance of Resources
Building AI systems with foundation models requires a delicate balance of resources: memory, latency, storage, compute, and more. One size does not fit all for developers managing cost and user experience as they bring generative AI capabilities to the rapidly growing ecosystem of AI-powered applications.
You Need Options for High-Quality, Customizable Models
To meet those constraints, you need high-quality, customizable models that can support large-scale services hosted and deployed across different computing environments, from data centers to edge and on-device use cases.
Google DeepMind’s Gemma 3: A New Range of Multimodal and Multilingual Open Models
Google DeepMind just announced Gemma 3, a new range of multimodal and multilingual open models. Gemma 3 consists of a 1B text-only small language model (SLM) and three image-text models in sizes 4B, 12B, and 27B. You can download the models from Hugging Face and demo the 1B model in the NVIDIA API Catalog.
Experiment and Prototype with Optimized Gemma 3 Models
Explore the model in the NVIDIA API Catalog, where you can experiment with your own data and configure parameters such as the maximum token count and the temperature and top-p sampling values. The preview also generates the Python, Node.js, and Bash code you need to integrate the model into your program or workflow. If you use LangChain to build agents, connect external data, or chain actions, you can use the reusable client provided by the NVIDIA LangChain integration.
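As a rough sketch of what the generated Python integration looks like, the snippet below assembles an OpenAI-style chat request with the sampling parameters mentioned above. The endpoint URL and model ID here are assumptions based on the API Catalog's conventions; use the exact values the preview generates for you.

```python
import os

# Hypothetical endpoint and model ID -- check the code the API Catalog
# preview generates for the exact values to use.
INVOKE_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "google/gemma-3-1b-it"

def build_request(prompt, max_tokens=512, temperature=0.7, top_p=0.95):
    """Assemble an OpenAI-style chat payload with the max tokens,
    temperature, and top-p parameters exposed in the preview."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }

# Only call the live endpoint when an API key is actually configured.
if __name__ == "__main__" and os.environ.get("NVIDIA_API_KEY"):
    import requests  # pip install requests
    headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}
    resp = requests.post(INVOKE_URL, headers=headers,
                         json=build_request("Summarize Gemma 3 in one sentence."))
    print(resp.json()["choices"][0]["message"]["content"])
```

The same payload shape carries over to the Node.js and Bash snippets the preview produces; only the HTTP client changes.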
Next-Level AI for Next-Gen Robotics and Edge Solutions
Each Gemma 3 model can be deployed to the NVIDIA Jetson family of embedded computing boards used for robotics and edge AI applications. The smaller 1B and 4B variants run on devices as small as the Jetson Nano, while the 27B model, built for high-demand applications, can be served on the Jetson AGX Orin, which delivers up to 275 TOPS. For more information, see the latest Jetson Orin Nano Developer Kit announcement.
Ongoing Collaboration of NVIDIA and Google
Google DeepMind and NVIDIA have collaborated on every Gemma release. NVIDIA has played a key role in optimizing the models for GPUs, contributing to JAX, the Python machine learning framework; Google's XLA compiler; OpenXLA; and many other projects.
Advancing Community Models and Collaboration
NVIDIA is an active contributor to the open-source ecosystem and has released several hundred projects under open-source licenses. NVIDIA is committed to open models such as Gemma that promote AI transparency and let users broadly share work in AI safety and resilience. Using the NVIDIA NeMo platform, these open models can be customized and tuned on proprietary data for AI workflows across any industry.
Get Started Today
Bring your data and try out Gemma on the NVIDIA-accelerated platform at Gemma models in the NVIDIA API Catalog.
Conclusion
Building AI systems with foundation models means balancing memory, latency, storage, and compute. Gemma 3 offers high-quality, customizable models in sizes that fit environments from the data center to the edge, and the ongoing NVIDIA and Google collaboration ensures they run efficiently on NVIDIA-accelerated platforms. Try them today in the NVIDIA API Catalog.
FAQs
Q: What is Gemma 3?
A: Gemma 3 is a new range of multimodal and multilingual open models announced by Google DeepMind.
Q: What are the different variants of Gemma 3?
A: Gemma 3 consists of a 1B text-only small language model (SLM) and three image-text models in sizes 4B, 12B, and 27B.
Q: Can I use Gemma 3 on edge devices?
A: Yes, each Gemma 3 model can be deployed to the NVIDIA Jetson family of embedded computing boards used for robotics and edge AI applications.
Q: How can I get started with Gemma 3?
A: Create a free account with the NVIDIA API Catalog, navigate to the Gemma 3 model card, choose "Build with this NIM," select "Generate API Key," and save the generated key as NVIDIA_API_KEY.
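Once the key is saved as NVIDIA_API_KEY, client code reads it from the environment rather than hard-coding it. The sketch below shows a minimal sanity check; the "nvapi-" prefix is an observed convention for API Catalog keys, not a documented guarantee.

```python
import os

def key_looks_valid(key: str) -> bool:
    """Lightweight sanity check for an API Catalog key. The 'nvapi-'
    prefix is an observed convention, not a documented guarantee."""
    return key.startswith("nvapi-") and len(key) > len("nvapi-")

# Read the key saved per the FAQ step above; never hard-code it.
api_key = os.environ.get("NVIDIA_API_KEY", "")
if not key_looks_valid(api_key):
    print("NVIDIA_API_KEY is missing or does not look like an API Catalog key.")
```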