NVIDIA DGX Systems

The World’s First Portfolio of Purpose-Built AI Supercomputers

Developed to meet the demands of AI and analytics, NVIDIA DGX™ Systems are built on NVIDIA GPU platforms: the NVIDIA Volta™ GPU platform and, in the DGX A100, the NVIDIA Ampere architecture. Combined with GPU-optimized software and simplified management tools, these fully integrated solutions deliver groundbreaking performance and results. NVIDIA DGX Systems are designed to give data scientists the most powerful tools for AI exploration, from your desk to the data center to the cloud.

NVIDIA DGX Station™

NVIDIA DGX Station™ is the world’s fastest workstation for data science teams doing leading-edge AI development. This fully integrated and optimized system lets your team get started faster and experiment with the power of a data center from your office.

NVIDIA DGX-1™

NVIDIA DGX-1 is designed to accelerate your data center and streamline your deep learning workflow. Experiment faster, train larger models, and get insights starting on day one. Now offered with NVIDIA Volta™, NVIDIA DGX-1 delivers industry-leading performance for AI and deep learning.

NVIDIA DGX A100™

With the fastest I/O architecture, NVIDIA DGX A100 is the universal system for all AI infrastructure, from analytics to training to inference. It sets a new bar for compute density, packing 5 petaFLOPS of AI performance with a single, unified system that can do it all.

Discover the Benefits

Powered by NVIDIA DGX Software Stack

  • Integrated suite of optimized deep learning software.
  • Simplified workload management.

Greater Productivity

  • Save hundreds of thousands of dollars in engineering effort.
  • Avoid months of lost productivity spent on IT.

Accelerated ROI for AI

  • Get started in a day instead of weeks or months.
  • 140X faster deep learning training.

Access to AI Expertise

  • Proven results faster.
  • Getting started made simpler.
  • Operational peace of mind.

The New NVIDIA DGX A100

NVIDIA DGX A100 delivers unprecedented compute density, performance, and flexibility in the world’s most advanced AI system. It features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure backed by integrated access to AI-fluent experts.

The NVIDIA A100 Tensor Core GPU

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes. Third-generation Tensor Cores accelerate every precision for diverse workloads, speeding time to insight and time to market.

Groundbreaking Innovations

NVIDIA AMPERE ARCHITECTURE

A100 accelerates workloads big and small. Whether using MIG to partition an A100 GPU into smaller instances, or NVLink to connect multiple GPUs to accelerate large-scale workloads, A100 can readily handle different-sized acceleration needs, from the smallest job to the biggest multi-node workload. A100’s versatility means IT managers can maximize the utility of every GPU in their data center around the clock.

THIRD-GENERATION TENSOR CORES

A100 delivers 312 teraFLOPS (TFLOPS) of deep learning performance. That’s 20X the Tensor FLOPS for deep learning training and 20X the Tensor TOPS for deep learning inference compared to NVIDIA Volta™ GPUs.

NEXT-GENERATION NVLINK

NVIDIA NVLink in A100 delivers 2X higher throughput compared to the previous generation. When combined with NVIDIA NVSwitch™, up to 16 A100 GPUs can be interconnected at up to 600 gigabytes per second (GB/sec) to unleash the highest application performance possible on a single server. NVLink is available in A100 SXM GPUs via HGX A100 server boards and in PCIe GPUs via an NVLink Bridge for up to 2 GPUs.

MULTI-INSTANCE GPU (MIG)

An A100 GPU can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. MIG gives developers access to breakthrough acceleration for all their applications, and IT administrators can offer right-sized GPU acceleration for every job, optimizing utilization and expanding access to every user and application.
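
The partitioning described above can be pictured as a slice budget. The following is a minimal sketch only: it assumes the A100 40 GB exposes 7 compute slices and 8 memory slices, and it uses the MIG profile names `1g.5gb` through `7g.40gb` with the slice counts shown. The helper `fits_on_one_gpu` is hypothetical, and real MIG placement rules are stricter than this simple budget check.

```python
# Assumed slice budget of one A100 40 GB GPU (not stated in the text above).
COMPUTE_SLICES = 7
MEMORY_SLICES = 8

# Assumed profiles: name -> (compute slices, memory slices).
PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}


def fits_on_one_gpu(requested):
    """Return True if the requested instances fit within the slice budget.

    Simplified: this ignores the placement rules that real MIG enforces.
    """
    compute = sum(PROFILES[name][0] for name in requested)
    memory = sum(PROFILES[name][1] for name in requested)
    return compute <= COMPUTE_SLICES and memory <= MEMORY_SLICES


print(fits_on_one_gpu(["1g.5gb"] * 7))                    # True
print(fits_on_one_gpu(["4g.20gb", "3g.20gb"]))            # True
print(fits_on_one_gpu(["3g.20gb", "3g.20gb", "1g.5gb"]))  # False
```

The last request fails on memory slices rather than compute slices, which illustrates why a mix of instance sizes has to be planned per GPU.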

HBM2

With 40 gigabytes (GB) of high-bandwidth memory (HBM2), A100 delivers improved raw bandwidth of 1.6 TB/sec, as well as higher dynamic random-access memory (DRAM) utilization efficiency of 95 percent. A100 delivers 1.7X higher memory bandwidth than the previous generation.
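
As a rough check on these figures, the sketch below multiplies the raw bandwidth by the utilization efficiency and divides it by the stated generational gain. Treating "effective bandwidth" as raw bandwidth times utilization is an assumption made here for illustration, not a definition taken from NVIDIA.

```python
RAW_BANDWIDTH_TBPS = 1.6   # raw HBM2 bandwidth quoted above
DRAM_UTILIZATION = 0.95    # utilization efficiency quoted above
GENERATION_SPEEDUP = 1.7   # bandwidth gain over the previous generation

# Assumed interpretation: usable bandwidth = raw bandwidth x utilization.
effective = RAW_BANDWIDTH_TBPS * DRAM_UTILIZATION

# Previous-generation bandwidth implied by the 1.7X claim.
implied_previous = RAW_BANDWIDTH_TBPS / GENERATION_SPEEDUP

print(f"{effective:.2f} TB/sec effective")                 # 1.52 TB/sec effective
print(f"{implied_previous:.2f} TB/sec implied previous")   # 0.94 TB/sec implied previous
```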

STRUCTURAL SPARSITY

AI networks are big, having millions to billions of parameters. Not all of these parameters are needed for accurate predictions, and some can be converted to zeros to make the models “sparse” without compromising accuracy. Tensor Cores in A100 can provide up to 2X higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
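
To make "converted to zeros" concrete, here is a minimal sketch of one common structured-sparsity pattern: in every group of four weights, keep the two with the largest magnitude and zero the other two. The 2-of-4 pattern and the function name `prune_2_of_4` are assumptions for illustration; the text above does not specify the pattern, and production workflows normally fine-tune the model after pruning to recover accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


dense = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]
print(prune_2_of_4(dense))
# [0.0, -0.9, 0.5, 0.0, 0.0, 2.0, -3.0, 0.0]
```

Because exactly half of each group is zero, hardware can skip those multiplications in a predictable way, which is consistent with the "up to 2X" figure quoted above.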

IS YOUR DATA CENTER READY?

Scale up with DGX SuperPOD

Get your data center infrastructure ready for AI

The NVIDIA DGX SuperPOD™ with NVIDIA DGX™ A100 systems is the next-generation artificial intelligence (AI) supercomputing infrastructure, providing the computational power necessary to train today’s state-of-the-art deep learning (DL) models and to fuel innovation well into the future. The DGX SuperPOD delivers groundbreaking performance and is designed to solve the world’s most challenging computational problems.

NVIDIA DGX SuperPOD™ is a reference architecture that incorporates best practices for compute, networking, storage, power, cooling, and more, in an integrated AI infrastructure design built on NVIDIA DGX A100. Most data centers aren’t designed with the growth and unique demands of AI in mind, but AMAX can customize and deploy NVIDIA DGX POD for your data center, optimizing your infrastructure to meet the rising tide of AI-infused applications.

Features

Component | Technology | Description
Compute Nodes | NVIDIA DGX A100 System | 1,120 DGX A100 SXM4 GPUs; 45.6 TB of HBM2 memory; 366 PFLOPS via Tensor Cores; 140 TB System RAM; 2.2 PB local NVMe; 600 GBps NVLink bandwidth per GPU; 4.8 TBps total NVSwitch bandwidth per node
Compute Network | NVIDIA Mellanox Quantum QM8790 HDR InfiniBand Smart Switch | Full fat-tree network built with eight connections per DGX A100 system
Storage Network | NVIDIA Mellanox Quantum QM8790 HDR InfiniBand Smart Switch | Fat-tree network with two connections per DGX A100 system
In-band Management Network | NVIDIA Mellanox SN3700C switch | One connection per DGX A100 system
Out-of-band Management Network | NVIDIA Mellanox AS4610 switch | One connection per DGX A100 system
Management Software | DeepOps DGX POD Management Software | Software tools for deployment and management of SuperPOD nodes and resource management
Key System Software | NVIDIA Magnum IO™ Technology | Suite of library technologies that optimize GPU communication performance
Key System Software | NVIDIA CUDA-X™ technology | A collection of libraries, tools, and technologies that maximize application performance on NVIDIA GPUs
User Runtime Environment | NGC | Containerized DL and HPC applications, optimized for performance
User Runtime Environment | Slurm | Orchestration and scheduling of multi-GPU and multi-node jobs
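
Several of the figures in the feature list can be related to one another with simple arithmetic. The sketch below is illustrative only. It assumes that 1 TBps equals 1,000 GBps and that the per-node NVSwitch bandwidth is the sum of the per-GPU NVLink bandwidth, which lets the number of GPUs per system, the number of systems, and the network link counts be derived from the listed values.

```python
# Values taken from the feature list above.
TOTAL_GPUS = 1120
NVLINK_GBPS_PER_GPU = 600
NVSWITCH_GBPS_PER_NODE = 4800   # 4.8 TBps, assuming 1 TBps = 1,000 GBps
COMPUTE_LINKS_PER_SYSTEM = 8
STORAGE_LINKS_PER_SYSTEM = 2

# Assumption: node NVSwitch bandwidth = GPUs per node x NVLink bandwidth per GPU.
gpus_per_system = NVSWITCH_GBPS_PER_NODE // NVLINK_GBPS_PER_GPU
systems = TOTAL_GPUS // gpus_per_system

compute_links = systems * COMPUTE_LINKS_PER_SYSTEM
storage_links = systems * STORAGE_LINKS_PER_SYSTEM

print(gpus_per_system)  # 8
print(systems)          # 140
print(compute_links)    # 1120
print(storage_links)    # 280
```

Under these assumptions the compute network provides one connection per GPU, while the storage network is provisioned at a quarter of that.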

SuperPOD Architecture

The DGX A100 system supports Multi-Instance GPU (MIG) partitioning of each A100 GPU in the system. This feature can enhance the productivity of the DGX SuperPOD by:

  • Providing AI research teams the ability to efficiently run thousands of smaller experiments in isolation from each other.
  • Providing enhanced AI inference by supporting thousands of simultaneous inference processes.
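
The "thousands" in the points above follows from the GPU count and the MIG limit given earlier on this page. A minimal sketch, assuming every GPU in the SuperPOD is split into the maximum number of instances:

```python
TOTAL_GPUS = 1120            # GPUs in the DGX SuperPOD, from the feature list
MAX_INSTANCES_PER_GPU = 7    # MIG limit per A100 GPU

# Upper bound on isolated GPU instances across the whole SuperPOD.
max_instances = TOTAL_GPUS * MAX_INSTANCES_PER_GPU
print(max_instances)  # 7840
```

In practice the number would be lower whenever some GPUs are left whole or split into larger instances.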

Learn more about NVIDIA DGX systems & support.

Contact us

Build work-at-home environments that simplify access to powerful computing resources

Work remotely with GPU-accelerated computing

Build Flexibility for Every Industry

EDUCATION

Virtualized classrooms help students and professors communicate and collaborate through distance learning.

DATA SCIENCE

Data scientists can access and analyze massive datasets at work and continue the same analysis from home.

GOVERNMENT

Government employees can securely access resources using virtual machines.

FINANCIAL SERVICES

Traders and bankers can continue working from anywhere on virtual PCs.

HEALTHCARE

Medical professionals can quickly access patient data and scans over virtual workstations.

MANUFACTURING

Designers and engineers can access CAD software and datasets remotely to keep manufacturing schedules moving forward.
