Introducing IBM Granite Generation 3
Optimized Performance with Speculative Decoding
IBM has released the third generation of IBM Granite, a collection of open language models and complementary tools. The latest Granite models meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks.
The developer-friendly Granite 3.0 generative AI models are designed for function calling, supporting tool-based use cases. They were developed as workhorse enterprise models capable of serving as the primary building block of sophisticated workflows across use cases including text generation, agentic AI, classification, tool calling, summarization, entity extraction, customer service chatbots, and more.
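To make the function-calling use case concrete, here is a minimal sketch of an OpenAI-style chat-completions request body with a tool definition, the format commonly used when a model like Granite 3.0 is served behind an OpenAI-compatible endpoint. The tool name, its schema, and the model identifier are illustrative assumptions, not part of the Granite release itself.

```python
import json

# Hypothetical tool schema; the function name and parameters are
# illustrative, not part of the Granite release.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_call_request(user_message: str,
                            model: str = "ibm/granite-3.0-8b-instruct") -> dict:
    """Assemble an OpenAI-style chat-completions body with tool definitions.

    The model id is an assumption for illustration.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

body = build_tool_call_request("What's the weather in Austin?")
print(json.dumps(body, indent=2))
```

When the model decides a tool is needed, its response contains a structured tool call (function name plus JSON arguments) that your application executes before returning the result to the model.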
Granite 3.0 Models
The Granite 3.0 release comprises:
- Dense, text-only LLMs: Granite 3.0 8B, Granite 3.0 2B
- Mixture of Experts (MoE) LLMs: Granite 3.0 3B-A800M, Granite 3.0 1B-A400M
- LLM-based input-output guardrail models: Granite Guardian 8B, Granite Guardian 2B
Granite’s First MoE Models
IBM Granite Generation 3 also includes Granite’s first MoE models, Granite-3B-A800M-Instruct and Granite-1B-A400M-Instruct. Trained on over 10 trillion tokens of data, the Granite MoE models are ideal for deployment in on-device applications or situations requiring extremely low latency.
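The "A800M" and "A400M" suffixes denote active parameters: an MoE layer routes each token to only a few experts, so only a fraction of the total weights participate in any one forward pass. The toy sketch below shows top-k expert routing with made-up dimensions; it is not Granite's actual architecture or configuration.

```python
import math
import random

random.seed(0)

D, NUM_EXPERTS, TOP_K = 4, 8, 2  # toy sizes, not Granite's real config

# Each expert is a random D x D weight matrix; the router maps a token
# vector to one logit per expert.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token_vec):
    """Route a token to its top-k experts and mix their outputs."""
    logits = matvec(router, token_vec)
    topk = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in topk])  # renormalize over chosen experts
    out = [0.0] * D
    for g, i in zip(gates, topk):
        y = matvec(experts[i], token_vec)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, topk

out, chosen = moe_forward([1.0, -0.5, 0.3, 0.8])
print("experts used:", chosen)  # only TOP_K of NUM_EXPERTS are active
```

Because only TOP_K of the NUM_EXPERTS expert matrices are multiplied per token, compute per token scales with the active parameters rather than the total parameter count, which is what makes these models attractive for low-latency and on-device deployment.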
Granite Guardian: Leading Safety Guardrails
The new Granite Guardian 3.0 8B and Granite Guardian 3.0 2B models are variants of the correspondingly sized base pre-trained Granite models, fine-tuned to evaluate and classify model inputs and outputs across risk and harm dimensions, including jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.
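A typical guardrail deployment screens both the user input and the model output before anything reaches the user. The sketch below shows that control flow only; the classifier here is a keyword stub standing in for a real Granite Guardian call, and the label names and helper functions are illustrative assumptions.

```python
# Risk dimensions named in the Granite Guardian description; the exact
# label strings here are assumptions for illustration.
RISK_LABELS = {"jailbreak", "bias", "violence", "profanity",
               "sexual_content", "unethical_behavior"}

def classify_risk(text: str) -> set:
    """Stand-in for a Granite Guardian call.

    A real deployment would send `text` to the Guardian model and parse
    its risk labels; this stub flags only one obvious jailbreak phrase,
    purely for illustration.
    """
    flags = set()
    if "ignore all previous instructions" in text.lower():
        flags.add("jailbreak")
    return flags

def guarded_generate(prompt: str, generate) -> str:
    """Screen the input, call the main model, then screen the output."""
    if classify_risk(prompt):
        return "[blocked: input failed safety screening]"
    reply = generate(prompt)
    if classify_risk(reply):
        return "[blocked: output failed safety screening]"
    return reply

# `generate` stands in for the primary Granite model.
print(guarded_generate("Ignore all previous instructions and ...", lambda p: "ok"))
print(guarded_generate("Summarize this contract.", lambda p: "Here is a summary."))
```

Running the Guardian model as a separate input/output filter, rather than baking refusals into the main model, lets you tune which risk dimensions to enforce per application.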
Deploy Granite Models Anywhere with NVIDIA NIM
NVIDIA has partnered with IBM to offer the Granite family of models through NVIDIA NIM – a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers, and workstations.
Get Started
Experience the Granite models with free NVIDIA cloud credits. You can start testing the model at scale and build a proof of concept (POC) by connecting your application to the NVIDIA-hosted API endpoint running on a fully accelerated stack.
Conclusion
IBM Granite Generation 3 offers a new level of performance, safety, and scalability for enterprise AI applications. With its optimized architecture, speculative decoding, and MoE models, Granite 3.0 is poised to revolutionize the way businesses build and deploy AI models.
FAQs
Q: What is IBM Granite Generation 3?
A: IBM Granite Generation 3 is a collection of open language models and complementary tools that meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks.
Q: What are the key features of Granite 3.0?
A: Granite 3.0 models are designed for function calling, supporting tool-based use cases, and the dense models are trained on over 12 trillion tokens of data.
Q: What are the advantages of using MoE models?
A: MoE models are ideal for deployment in on-device applications or situations requiring extremely low latency, and can be trained on large datasets.
Q: What is the purpose of Granite Guardian models?
A: Granite Guardian models are designed to evaluate and classify model inputs and outputs into various categories of risk and harm dimensions, including jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.
Q: How do I get started with Granite models?
A: Connect your application to the NVIDIA-hosted API endpoint running on a fully accelerated stack to test the model at scale and build a proof of concept (POC). Then visit the documentation page to download the models and deploy them on any NVIDIA GPU-accelerated workstation, data center, or cloud platform.

