Introducing IBM Granite Generation 3
Optimized Performance with Speculative Decoding
IBM has released the third generation of IBM Granite, a collection of open language models and complementary tools. The latest Granite models meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks.
The developer-friendly Granite 3.0 generative AI models are designed for function calling, supporting tool-based use cases. They were developed as workhorse enterprise models capable of serving as the primary building block of sophisticated workflows across use cases including text generation, agentic AI, classification, tool calling, summarization, entity extraction, customer service chatbots, and more.
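To make the function-calling use case concrete, here is a minimal sketch of an OpenAI-style chat-completions request body with a tool definition, the format commonly used when a model like Granite 3.0 is served behind an OpenAI-compatible endpoint. The tool name, its schema, and the model identifier are illustrative assumptions, not part of the Granite release itself.

```python
import json

# Hypothetical tool schema; the function name and parameters are
# illustrative, not part of the Granite release.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_call_request(user_message: str,
                            model: str = "ibm/granite-3.0-8b-instruct") -> dict:
    """Assemble an OpenAI-style chat-completions body with tool definitions.

    The model id is an assumption for illustration.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

body = build_tool_call_request("What's the weather in Austin?")
print(json.dumps(body, indent=2))
```

When the model decides a tool is needed, its response contains a structured tool call (function name plus JSON arguments) that your application executes before returning the result to the model.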
Granite 3.0 Models
The Granite 3.0 release comprises:
- Dense, text-only LLMs: Granite 3.0 8B, Granite 3.0 2B
- Mixture of Experts (MoE) LLMs: Granite 3.0 3B-A800M, Granite 3.0 1B-A400M
- LLM-based input-output guardrail models: Granite Guardian 8B, Granite Guardian 2B
Granite’s First MoE Models
IBM Granite Generation 3 also includes Granite’s first MoE models, Granite-3B-A800M-Instruct and Granite-1B-A400M-Instruct. Trained on over 10 trillion tokens of data, the Granite MoE models are ideal for deployment in on-device applications or situations requiring extremely low latency.
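The "A800M" and "A400M" suffixes denote active parameters: an MoE layer routes each token to only a few experts, so only a fraction of the total weights participate in any one forward pass. The toy sketch below shows top-k expert routing with made-up dimensions; it is not Granite's actual architecture or configuration.

```python
import math
import random

random.seed(0)

D, NUM_EXPERTS, TOP_K = 4, 8, 2  # toy sizes, not Granite's real config

# Each expert is a random D x D weight matrix; the router maps a token
# vector to one logit per expert.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token_vec):
    """Route a token to its top-k experts and mix their outputs."""
    logits = matvec(router, token_vec)
    topk = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in topk])  # renormalize over chosen experts
    out = [0.0] * D
    for g, i in zip(gates, topk):
        y = matvec(experts[i], token_vec)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, topk

out, chosen = moe_forward([1.0, -0.5, 0.3, 0.8])
print("experts used:", chosen)  # only TOP_K of NUM_EXPERTS are active
```

Because only TOP_K of the NUM_EXPERTS expert matrices are multiplied per token, compute per token scales with the active parameters rather than the total parameter count, which is what makes these models attractive for low-latency and on-device deployment.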
Granite Guardian: Leading Safety Guardrails
The new Granite Guardian 3.0 8B and Granite Guardian 3.0 2B models are variants of the correspondingly sized base pre-trained Granite models, fine-tuned to evaluate and classify model inputs and outputs across risk and harm dimensions, including jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.
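A typical guardrail deployment screens both the user input and the model output before anything reaches the user. The sketch below shows that control flow only; the classifier here is a keyword stub standing in for a real Granite Guardian call, and the label names and helper functions are illustrative assumptions.

```python
# Risk dimensions named in the Granite Guardian description; the exact
# label strings here are assumptions for illustration.
RISK_LABELS = {"jailbreak", "bias", "violence", "profanity",
               "sexual_content", "unethical_behavior"}

def classify_risk(text: str) -> set:
    """Stand-in for a Granite Guardian call.

    A real deployment would send `text` to the Guardian model and parse
    its risk labels; this stub flags only one obvious jailbreak phrase,
    purely for illustration.
    """
    flags = set()
    if "ignore all previous instructions" in text.lower():
        flags.add("jailbreak")
    return flags

def guarded_generate(prompt: str, generate) -> str:
    """Screen the input, call the main model, then screen the output."""
    if classify_risk(prompt):
        return "[blocked: input failed safety screening]"
    reply = generate(prompt)
    if classify_risk(reply):
        return "[blocked: output failed safety screening]"
    return reply

# `generate` stands in for the primary Granite model.
print(guarded_generate("Ignore all previous instructions and ...", lambda p: "ok"))
print(guarded_generate("Summarize this contract.", lambda p: "Here is a summary."))
```

Running the Guardian model as a separate input/output filter, rather than baking refusals into the main model, lets you tune which risk dimensions to enforce per application.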
Deploy Granite Models Anywhere with NVIDIA NIM
NVIDIA has partnered with IBM to offer the Granite family of models through NVIDIA NIM – a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers, and workstations.
Get Started
Experience the Granite models with free NVIDIA cloud credits. You can start testing the model at scale and build a proof of concept (POC) by connecting your application to the NVIDIA-hosted API endpoint running on a fully accelerated stack.
Conclusion
IBM Granite Generation 3 offers a new level of performance, safety, and scalability for enterprise AI applications. With its optimized architecture, speculative decoding, and MoE models, Granite 3.0 is poised to revolutionize the way businesses build and deploy AI models.
FAQs
Q: What is IBM Granite Generation 3?
A: IBM Granite Generation 3 is a collection of open language models and complementary tools that meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks.
Q: What are the key features of Granite 3.0?
A: Granite 3.0 models are designed for function calling, supporting tool-based use cases, and the dense models are trained on over 12 trillion tokens of data.
Q: What are the advantages of using MoE models?
A: MoE models are ideal for deployment in on-device applications or situations requiring extremely low latency, and can be trained on large datasets.
Q: What is the purpose of Granite Guardian models?
A: Granite Guardian models are designed to evaluate and classify model inputs and outputs into various categories of risk and harm dimensions, including jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.
Q: How do I get started with Granite models?
A: Connect your application to the NVIDIA-hosted API endpoint running on a fully accelerated stack to test the model at scale and build a proof of concept (POC). Then visit the documentation page to download the models and deploy them on any NVIDIA GPU-accelerated workstation, data center, or cloud platform.

