MLCommons releases MLPerf Training 4.0 results and unveils two new benchmarks

MLCommons released MLPerf Training v4.0, showcasing major performance gains. Notably, NVIDIA set new records through software optimizations on its existing Hopper architecture. MLPerf Training v4.0 also introduces new benchmarks for LoRA-based fine-tuning and graph neural network training.

MLCommons has released the results of its latest MLPerf Training v4.0 benchmark suite, showcasing substantial performance gains across machine learning workloads from a range of participants. This edition of MLPerf Training drew over 200 performance results from 17 submitters, including newcomers Juniper Networks, Oracle, SMC, and tiny corp. Unsurprisingly, NVIDIA accounts for nearly all of the most notable results, interrupted only a couple of times by Google's TPU v5p and by single appearances from Intel's Gaudi 2 and AMD's Radeon RX 7900 XTX.

What makes NVIDIA's feat remarkable is that, in addition to claiming new generative AI training performance records, it achieved them on the Hopper architecture, the same core hardware platform it used for the MLPerf Training 3.1 results submitted last year. Perhaps unexpectedly, MLPerf Training became the ideal stage for NVIDIA to show that continued software innovation can extract more performance from existing hardware. The software improvements behind the latest results include optimized FP8 kernels, a new FP8-aware distributed optimizer, an optimized FlashAttention implementation in cuDNN, more effective overlapping of math operations with GPU-to-GPU communication, and intelligent power allocation within the H100 GPUs to maximize Tensor Core throughput.

The latest edition of MLPerf Training also adds two new benchmarks: one for low-rank adaptation (LoRA) fine-tuning and one for graph neural network (GNN) training. The LoRA benchmark uses the 70-billion-parameter Llama 2 model and the SCROLLS dataset to evaluate document summarization quality with the ROUGE metric. The GNN benchmark evaluates performance on multi-label node classification with nearly 3,000 classes. The MLPerf Training team submitted the R-GAT model underlying this benchmark to the Illinois Graph Benchmark (IGB) leaderboard, where it currently ranks first for accuracy at 72%.
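For readers unfamiliar with LoRA, the following minimal sketch illustrates the general idea in plain Python: the pretrained weight matrix W stays frozen, and only a low-rank update B·A, scaled by alpha/rank, is trained. It is an illustration under stated assumptions, not the MLPerf reference implementation. The names `LoRALinear`, `matvec`, and `lora_param_count`, as well as the rank, alpha, and initialization choices, are hypothetical; they loosely follow the common convention of zero-initializing B so that training starts from the unmodified model.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)

class LoRALinear:
    """Hypothetical LoRA layer: y = W x + (alpha / rank) * B (A x), with W frozen."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weights, d_out x d_in
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A (rank x d_in) gets small random values; B (d_out x rank) starts at zero,
        # so the low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        update = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return lora_param_count(self.d_out, self.d_in, self.rank)

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))         # [3.0, 7.0]
print(lora_param_count(8192, 8192, 16))  # 262144
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly. The parameter count suggests why LoRA is attractive for a 70B-parameter model: for a hypothetical 8192×8192 projection at rank 16, LoRA trains 262,144 parameters instead of 67,108,864, a 256× reduction for that layer.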
