Mar 18, 2024

NVIDIA Blackwell


NVIDIA Unveils New Blackwell GPU Architecture

At GTC 2024, NVIDIA announced its newest GPU architecture, NVIDIA Blackwell, which enables real-time generative AI on large language models with trillions of parameters at up to 25x lower cost and energy consumption than its predecessor.

Contact Us for more details on how the latest NVIDIA technology can transform your organization's AI and computing capabilities.
NVIDIA Blackwell Chip

Innovations Driving the Future of AI and Computing

The Blackwell platform encompasses several technological innovations:

  • World’s Most Powerful Chip: With 208 billion transistors, Blackwell GPUs are manufactured on a custom-built TSMC 4NP process. A 10 TB/s chip-to-chip link joins the two GPU dies so they operate as a single, unified GPU.
  • Second-Generation Transformer Engine: With micro-tensor scaling support and advanced dynamic-range management algorithms, Blackwell doubles compute throughput and supported model sizes and introduces new FP4 AI inference capabilities.
  • Fifth-Generation NVLink: The latest version delivers 1.8 TB/s of bidirectional throughput per GPU and enables communication among up to 576 GPUs for the most complex model computations.
  • RAS Engine: A dedicated engine for reliability, availability, and serviceability that applies AI-based preventative maintenance to maximize system uptime and reduce operating costs.
  • Secure AI: Advanced confidential computing safeguards AI models and customer data, with support for new encryption protocols for industries requiring strict privacy measures.
  • Decompression Engine: A dedicated engine accelerates database queries, boosting performance in data analytics and data science, where GPU-accelerated data processing is increasingly important.

The NVIDIA GB200 Grace Blackwell Superchip

The NVIDIA GB200 Grace Blackwell Superchip connects two B200 Tensor Core GPUs to the Grace CPU over a 900 GB/s NVLink Chip-to-Chip (C2C) interconnect. Paired with NVIDIA's Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms, it delivers class-leading AI performance.

Moreover, the GB200 is integral to the NVIDIA GB200 NVL72, a liquid-cooled, rack-scale system designed for the most compute-intensive workloads. For LLM inference workloads, it delivers up to a 30x performance increase over the same number of NVIDIA H100 GPUs, while significantly lowering cost and energy usage.

NVIDIA's HGX B200 server board, which links eight B200 GPUs, is optimized for x86-based generative AI platforms, supporting networking speeds of up to 400Gb/s. This innovation underscores NVIDIA's commitment to enhancing AI capabilities and efficiency across industries, promising a new era of computing.

The scalability of multi-GPU systems has received a significant boost with the fifth generation of NVLink. It allows a single NVIDIA Blackwell Tensor Core GPU to support up to 18 connections at 100 gigabytes per second each, culminating in a total bandwidth of 1.8 terabytes per second. This enhancement doubles the bandwidth available in the previous generation and surpasses PCIe Gen5 bandwidth by more than 14 times. Server platforms leveraging this technology, such as the GB200 NVL72, can now offer unprecedented scalability for complex large models.
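The per-GPU figure follows from simple arithmetic, which a short Python sketch can verify. The PCIe Gen5 value below is an assumption for comparison: an x16 link at roughly 64 GB/s per direction, about 128 GB/s bidirectional.

```python
# Back-of-envelope check of the NVLink numbers quoted above.
links_per_gpu = 18          # NVLink connections per Blackwell GPU
per_link_gb_s = 100         # GB/s per connection
total_gb_s = links_per_gpu * per_link_gb_s
print(total_gb_s)           # 1800 GB/s, i.e. 1.8 TB/s aggregate per GPU

# Assumption: PCIe Gen5 x16 at ~128 GB/s bidirectional.
pcie_gen5_x16_gb_s = 128
print(total_gb_s / pcie_gen5_x16_gb_s)  # just over 14x
```

The ratio comes out to roughly 14, matching the "more than 14 times" claim.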

The NVLink Switch chip extends these GPU-to-GPU connections with a 1.8 TB/s bidirectional, direct interconnect. It scales multi-GPU input and output within a server, and NVLink Switch chips link multiple NVLink domains, enabling all-to-all GPU communication at full NVLink speed both within and between racks. Each NVLink Switch also integrates engines for NVIDIA's Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), accelerating in-network reductions and multicast operations that are essential for high-speed collective communication.
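The benefit of in-network reduction can be pictured with a toy sketch. This is a conceptual illustration only, not real SHARP or NCCL code: each GPU sends its data once to the switch, which performs the reduction and multicasts the result, rather than circulating chunks GPU-to-GPU.

```python
# Conceptual sketch of switch-side reduce + multicast (not real SHARP code).
def sharp_style_allreduce(gradients):
    """Reduce all gradient vectors 'in the switch', then multicast the sum."""
    reduced = [sum(vals) for vals in zip(*gradients)]  # reduction in-network
    return [list(reduced) for _ in gradients]          # multicast to every GPU

# Two "GPUs" each contribute a gradient vector; all end up with the sum.
gpus = [[1.0, 2.0], [3.0, 4.0]]
print(sharp_style_allreduce(gpus))  # -> [[4.0, 6.0], [4.0, 6.0]]
```

Each GPU sends and receives one vector, instead of the multiple passes a ring-style all-reduce would need.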


Within the GB200 NVL72, each GB200 superchip connects two NVIDIA Blackwell Tensor Core GPUs to the NVIDIA Grace CPU over the NVLink Chip-to-Chip (C2C) interface, which delivers 900 GB/s of bidirectional bandwidth. With NVLink-C2C, applications have coherent access to a unified memory space.


The NVIDIA DGX GB200 NVL72 offers 36 Grace CPUs and 72 Blackwell GPUs within a single rack-scale design. The liquid-cooled, exaflop-per-rack solution delivers unprecedented real-time capabilities for trillion-parameter large language models (LLMs), setting a new benchmark in the industry.

Figure: NVIDIA DGX GB200 NVL72 vs. HGX H100, GPT-MoE-1.8T real-time throughput (Source: NVIDIA)

NVIDIA's GB200 NVL72 redefines what's possible with exascale computing, offering the largest NVLink® domain to date. This enables 130 terabytes per second (TB/s) of low-latency GPU communication, catering to the most demanding AI and high-performance computing (HPC) workloads.


Specification             GB200 NVL72                            GB200 Grace Blackwell Superchip
Configuration             36 Grace CPUs : 72 Blackwell GPUs      1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core           1,440 PFLOPS                           40 PFLOPS
FP8/FP6 Tensor Core       720 PFLOPS                             20 PFLOPS
INT8 Tensor Core          720 POPS                               20 POPS
FP16/BF16 Tensor Core     360 PFLOPS                             10 PFLOPS
TF32 Tensor Core          180 PFLOPS                             5 PFLOPS
FP64 Tensor Core          3,240 TFLOPS                           90 TFLOPS
GPU Memory | Bandwidth    Up to 13.5 TB HBM3e | 576 TB/s         Up to 384 GB HBM3e | 16 TB/s
NVLink Bandwidth          130 TB/s                               3.6 TB/s
CPU Core Count            2,592 Arm® Neoverse V2 cores           72 Arm Neoverse V2 cores
CPU Memory | Bandwidth    Up to 17 TB LPDDR5X | Up to 18.4 TB/s  Up to 480 GB LPDDR5X | Up to 512 GB/s
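Most rack-level figures in the table follow directly from the superchip column multiplied by the 36 superchips in the rack (36 Grace CPUs, 72 Blackwell GPUs). A quick sanity check in Python, with illustrative key names:

```python
# Per-superchip figures from the spec table above (key names are illustrative).
superchip = {
    "fp4_pflops": 40,
    "fp8_fp6_pflops": 20,
    "int8_pops": 20,
    "fp16_bf16_pflops": 10,
    "tf32_pflops": 5,
    "fp64_tflops": 90,
    "hbm3e_bandwidth_tb_s": 16,
    "cpu_cores": 72,
}
# A GB200 NVL72 rack integrates 36 superchips.
nvl72 = {key: 36 * value for key, value in superchip.items()}

assert nvl72["fp4_pflops"] == 1440           # 1,440 PFLOPS
assert nvl72["fp64_tflops"] == 3240          # 3,240 TFLOPS
assert nvl72["hbm3e_bandwidth_tb_s"] == 576  # 576 TB/s
assert nvl72["cpu_cores"] == 2592            # 2,592 Neoverse V2 cores
```

Note that aggregate memory capacity and NVLink bandwidth are quoted slightly differently by NVIDIA (13.5 TB rather than a strict 36 × 384 GB, and 130 TB/s for the NVLink domain), so those rows are not a pure 36× multiple.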
