NVIDIA Blackwell Achieves Massive MLPerf Inference v5.0 Performance Gains

The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates. Achieving high inference throughput and low inference latency on the latest models requires excellence across the entire technology stack – spanning silicon, network systems, and software.

MLPerf Inference v5.0

MLPerf Inference v5.0 is the latest in a long-running benchmark suite that measures inference throughput across a range of different models and use cases. First introduced in 2019, MLPerf Inference has been continually updated with new models and scenarios to ensure that it remains a useful tool for measuring the inference performance of AI computing platforms.

New Benchmarks

This round adds three new benchmarks:

  • Llama 3.1 405B: A 405-billion-parameter dense LLM. For the server scenario, the benchmark sets latency requirements of 6 seconds for time to first token (TTFT) and 175 ms for time per output token (TPOT).
  • Llama 2 70B Interactive: A 70-billion-parameter dense LLM. This workload is based on the same Llama 2 70B model that was first introduced in MLPerf Inference v4.0, but features more stringent latency constraints of 450 ms TTFT and 40 ms TPOT (25 tokens per second per user).
  • Relational Graph Attention Network (R-GAT): A graph neural network (GNN) benchmark. GNNs are applied in a wide range of domains, including social network analysis, drug discovery, fraud detection, and molecular chemistry.
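The TPOT limits above translate directly into a per-user decode rate: 40 ms per output token is 25 tokens per second per user, as the benchmark description notes. A minimal sketch of that conversion (the helper name is illustrative, not part of any MLPerf tooling):

```python
# Convert a time-per-output-token (TPOT) latency limit into the
# steady-state per-user token throughput it implies.

def tokens_per_second(tpot_ms: float) -> float:
    """Decode rate (tokens/s) implied by a TPOT limit in milliseconds."""
    return 1000.0 / tpot_ms

# Llama 2 70B Interactive: 40 ms TPOT -> 25 tokens/s per user
print(tokens_per_second(40))    # 25.0

# Llama 3.1 405B server scenario: 175 ms TPOT -> ~5.7 tokens/s per user
print(round(tokens_per_second(175), 2))
```

Note that TTFT constrains how quickly the first token appears, while TPOT governs the streaming rate thereafter, so the tighter 450 ms / 40 ms limits of the Interactive benchmark stress both prefill and decode performance.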

NVIDIA Performance

NVIDIA submitted results on every benchmark in the data center category, delivering outstanding performance across the board, including first results on the newly added Llama 3.1 405B, Llama 2 70B Interactive, and R-GAT benchmarks. This round, NVIDIA also submitted many results on the Blackwell architecture, using both the NVIDIA GB200 NVL72 and NVIDIA DGX B200 systems, which delivered substantial speedups over the prior-generation NVIDIA Hopper architecture.

Blackwell Sets the New Performance Standard in MLPerf

The NVIDIA Blackwell architecture, introduced at NVIDIA GTC 2024, is in full production, with availability from major cloud service providers and a broad number of server makers. Blackwell incorporates many architectural innovations – including second-generation Transformer Engine, fifth-generation NVLink, FP4 and FP6 precisions, and more – that enable dramatically higher performance for both training and inference.

Hopper Continues to Deliver Outstanding Performance

The Hopper platform, first introduced in March of 2022, continued to deliver outstanding inference performance on every benchmark in MLPerf Inference v5.0, including on the newly added Llama 3.1 405B and Llama 2 70B Interactive benchmarks.

Wrapping Up

The NVIDIA Hopper platform delivers leadership performance in both training, as shown in the most recent round of MLPerf Training, and inference, as these results show. Hopper remains an industry-leading platform three years after its launch, and with continued full-stack optimization it keeps delivering performance increases on existing AI use cases while supporting new ones, demonstrating its longevity.

Acknowledgments

The work of many NVIDIA employees made these outstanding results happen. We would like to acknowledge the tireless efforts of Kefeng Duan, Shengliang Xu, Yilin Zhang, Robert Overman, Shobhit Verma, Viraat Chandra, Zihao Kong, Tin-Yin Lai, and Alice Cheng, among many others.

FAQs

Q: What are the new benchmarks in MLPerf Inference v5.0?
A: The new benchmarks include Llama 3.1 405B, Llama 2 70B Interactive, and Relational Graph Attention Network (R-GAT).

Q: How does NVIDIA’s Blackwell architecture perform in MLPerf Inference v5.0?
A: NVIDIA’s Blackwell architecture delivers outstanding performance, setting a new standard for performance and energy efficiency.

Q: How does NVIDIA’s Hopper architecture perform in MLPerf Inference v5.0?
A: NVIDIA’s Hopper architecture continues to deliver outstanding performance, achieving leadership results on every benchmark in MLPerf Inference v5.0.

