In the latest MLPerf Inference V5.0 benchmarks, which reflect some of the most challenging inference scenarios, the NVIDIA Blackwell platform set records — and marked NVIDIA’s first MLPerf submission using the NVIDIA GB200 NVL72 system, a rack-scale solution designed for AI reasoning.
Delivering on the promise of cutting-edge AI takes a new kind of compute infrastructure: the AI factory. Unlike traditional data centers, AI factories do more than store and process data. They manufacture intelligence at scale by transforming raw data into real-time insights. The goal of an AI factory is simple: deliver accurate answers to queries quickly, at the lowest cost and to as many users as possible.
Pulling this off is a significant challenge, and the work happens behind the scenes. As AI models grow to billions, and even trillions, of parameters to deliver smarter replies, the compute required to generate each token increases. That added compute reduces the number of tokens an AI factory can generate in a given time and drives up the cost per token, as the rough model below illustrates. Keeping inference throughput high and cost per token low requires rapid innovation across every layer of the technology stack: silicon, network systems and software.
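To make the throughput-to-cost relationship concrete, here is a minimal back-of-the-envelope sketch. The operating cost and throughput figures are hypothetical placeholders chosen for illustration, not measured numbers from any benchmark:

```python
# Back-of-the-envelope cost-per-token model for an AI factory.
# All inputs below are hypothetical placeholders, not measured figures.

def cost_per_million_tokens(cost_per_hour_usd: float, tokens_per_second: float) -> float:
    """Operating cost divided by token throughput, expressed per million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour_usd / tokens_per_hour * 1_000_000

# At a fixed operating cost, doubling throughput halves cost per token.
for tps in (10_000, 20_000, 40_000):
    print(f"{tps:>6} tok/s -> ${cost_per_million_tokens(100.0, tps):.2f} per 1M tokens")
```

At a fixed operating cost, doubling token throughput halves the cost per token, which is why both per-GPU performance and efficient scale-out matter for AI factory economics.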
NVIDIA Blackwell Sets New Records
The GB200 NVL72 system — connecting 72 NVIDIA Blackwell GPUs to act as a single, massive GPU — delivered up to 30x higher throughput on the Llama 3.1 405B benchmark over the NVIDIA H200 NVL8 submission this round. This feat was achieved through more than triple the performance per GPU and a 9x larger NVIDIA NVLink interconnect domain.
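The headline figure lines up with the two factors cited above. Treating them as roughly multiplicative (an approximation, since real-world scaling also depends on software and parallelism efficiency):

```latex
\[
\underbrace{9\times}_{\text{larger NVLink domain (72 vs. 8 GPUs)}}
\;\times\;
\underbrace{>3\times}_{\text{per-GPU performance}}
\;\approx\; 30\times \text{ higher aggregate throughput}
\]
```

Since 9 x 3 = 27, the "more than triple" per-GPU gain accounts for the balance up to the 30x result.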
NVIDIA Hopper AI Factory Value Continues Increasing
The NVIDIA Hopper architecture, introduced in 2022, powers many of today’s AI inference factories and continues to power model training. Through ongoing software optimization, NVIDIA increases the throughput of Hopper-based AI factories, leading to greater value.
It Takes an Ecosystem
This MLPerf round, 15 partners submitted stellar results on the NVIDIA platform, including ASUS, Cisco, CoreWeave, Dell Technologies, Fujitsu, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Lambda, Lenovo, Oracle Cloud Infrastructure, Quanta Cloud Technology, Supermicro, Sustainable Metal Cloud and VMware.
Conclusion
The NVIDIA Blackwell platform set new records in the latest MLPerf Inference V5.0 benchmarks, delivering exceptional performance across the board. The Hopper architecture continues to power AI inference factories, with ongoing software optimization delivering greater value over time. The NVIDIA platform is available from major cloud service providers and server makers worldwide, reflecting the breadth of its reach.
FAQs
Q: What is the NVIDIA Blackwell platform?
A: The NVIDIA Blackwell platform is NVIDIA’s GPU architecture and accelerated computing platform that succeeds Hopper. Systems built on it, such as the rack-scale GB200 NVL72, are designed for AI reasoning.
Q: What is the NVIDIA Hopper architecture?
A: The NVIDIA Hopper architecture, introduced in 2022, powers many of today’s AI inference factories and continues to power model training.
Q: What is an AI factory?
A: An AI factory is a new kind of compute infrastructure that manufactures intelligence at scale by transforming raw data into real-time insights.
Q: What are the goals for AI factories?
A: The goal for AI factories is to deliver accurate answers to queries quickly, at the lowest cost and to as many users as possible.
Q: What is the NVIDIA GB200 NVL72 system?
A: The NVIDIA GB200 NVL72 system is a rack-scale solution that connects 72 NVIDIA Blackwell GPUs to act as a single, massive GPU.
Q: What is MLPerf Inference V5.0?
A: MLPerf Inference V5.0 is a peer-reviewed industry benchmark of inference performance that reflects some of the most challenging inference scenarios.

