NVIDIA Blackwell Achieves Next-Level MLPerf Training Performance

Generative AI applications that work with text, computer code, protein chains, summarization, video, and even 3D graphics require data-center-scale accelerated computing to efficiently train the large language models (LLMs) that power them.

Leaps and Bounds With Blackwell

The first Blackwell training submission to the MLCommons Consortium highlights how the architecture advances generative AI training performance. For instance, the architecture includes new kernels that make more efficient use of Tensor Cores. Kernels are optimized, purpose-built implementations of math operations, such as matrix multiplies, that sit at the heart of many deep learning algorithms.
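
To see why matrix-multiply kernels matter so much, consider that most of the compute in an LLM's layers reduces to matrix multiplies. The following NumPy sketch is purely illustrative (it is not NVIDIA kernel code, and the layer dimensions are toy-sized): it shows a simplified transformer feed-forward block whose cost is dominated by two matmuls, the operations that Tensor Core kernels accelerate.

```python
# Illustrative sketch: a simplified transformer feed-forward block.
# Its compute is dominated by two matrix multiplies, which is why
# optimized matmul kernels drive training throughput.
import numpy as np

def feed_forward(x, w1, w2):
    """Two matmuls around a GELU activation (tanh approximation)."""
    h = x @ w1                                                        # first matmul
    h = 0.5 * h * (1.0 + np.tanh(0.7978845608 * (h + 0.044715 * h**3)))  # GELU
    return h @ w2                                                     # second matmul

# Toy dimensions; real LLM layers are thousands of units wide.
batch, d_model, d_ff = 4, 8, 32
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_model))
w1 = rng.standard_normal((d_model, d_ff))
w2 = rng.standard_normal((d_ff, d_model))
print(feed_forward(x, w1, w2).shape)  # (4, 8)
```

In a production training stack these multiplies are dispatched to hand-tuned GPU kernels rather than NumPy, so kernel efficiency translates directly into end-to-end training speed.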

Blackwell’s higher per-GPU compute throughput and significantly larger and faster high-bandwidth memory allow it to run the GPT-3 175B benchmark on fewer GPUs while achieving excellent per-GPU performance. Taking advantage of larger, higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were able to run the GPT-3 LLM benchmark without compromising per-GPU performance. The same benchmark run using Hopper needed 256 GPUs, four times as many.

Relentless Optimization

NVIDIA platforms undergo continuous software development, racking up performance and feature improvements in training and inference across a wide variety of frameworks, models, and applications. In this round of MLPerf training submissions, Hopper delivered a 1.3x improvement in per-GPU GPT-3 175B training performance relative to its results when the benchmark was introduced.

NVIDIA also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with NVIDIA NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and NVIDIA Quantum-2 InfiniBand networking. NVIDIA Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26% using the same number of Hopper GPUs, reflecting continued software enhancements.
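
For readers unfamiliar with LoRA (low-rank adaptation), the core idea is to freeze the pretrained weight matrix and train only a small low-rank update alongside it. The NumPy sketch below is a minimal illustration of that idea under assumed toy dimensions; it is not the MLPerf benchmark's reference implementation.

```python
# Minimal sketch of the LoRA idea: the frozen pretrained weight W is
# augmented with a trainable low-rank update B @ A, so far fewer
# parameters need gradients during fine-tuning.
import numpy as np

d_out, d_in, rank, alpha = 64, 64, 8, 16  # toy sizes (assumptions)
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # pretrained weight, frozen
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    """Adapted layer output: frozen base path plus scaled low-rank path."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
print(lora_forward(x).shape)  # (2, 64)

# Trainable parameters: rank * (d_in + d_out) vs. d_in * d_out for full tuning.
print(rank * (d_in + d_out), "vs", d_in * d_out)  # 1024 vs 4096
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and only the small A and B matrices accumulate gradients, which is what makes LoRA fine-tuning so much cheaper than full-model training.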

Partnering Up

NVIDIA partners, including system makers and cloud service providers such as ASUSTek, Azure, Cisco, Dell, Fujitsu, Giga Computing, Lambda Labs, Lenovo, Oracle Cloud, Quanta Cloud Technology, and Supermicro, also submitted impressive results to MLPerf in this latest round. As a founding member of MLCommons, NVIDIA sees industry-standard benchmarks and benchmarking best practices as vital to AI computing.

Conclusion

The latest MLPerf results demonstrate the impressive performance and capabilities of NVIDIA’s Blackwell and Hopper platforms in training large language models. With continuous software development and optimization, NVIDIA’s platforms are well-positioned to meet the demands of the growing AI ecosystem.

FAQs

Q: What is the Blackwell platform?
A: Blackwell is NVIDIA’s latest GPU architecture, which advances generative AI training performance through features such as new Tensor Core kernels and larger, faster HBM3e memory.

Q: What is the GPT-3 175B benchmark?
A: The GPT-3 175B benchmark is an MLPerf Training test that measures how quickly a system can train a GPT-3 model with 175 billion parameters.

Q: What is the difference between Blackwell and Hopper platforms?
A: Blackwell has higher per-GPU compute throughput and larger and faster high-bandwidth memory compared to Hopper.

Q: What is the significance of the MLCommons Consortium?
A: MLCommons is an industry consortium that develops standardized, unbiased, and rigorously peer-reviewed benchmarks, such as MLPerf, for AI computing.
