AceleMax DGS-1216AS

Dual AMD EPYC Processors 16x NVIDIA A100 GPU Server

  • 16x NVIDIA A100 GPUs with 1,280 GB total GPU memory
  • 6x NVIDIA NVSwitches
  • 9.6 TB/s total aggregate bandwidth
  • 2nd Generation NVIDIA NVSwitch

Purpose-Built for the Convergence of Simulation, Data Analytics, and AI

Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnections. The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA Mellanox® InfiniBand® networking, and a fully optimized NVIDIA AI and HPC software stack from NGC™ to provide the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to advance scientific progress. With a new generation of A100 80GB GPUs, a single HGX A100 now has up to 1.3 terabytes (TB) of GPU memory and the world’s first 2 terabytes per second (TB/s) of memory bandwidth, delivering unprecedented acceleration for emerging workloads fueled by exploding model sizes and massive datasets.


Third-Generation NVIDIA NVLink Creates a Single Super GPU

Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in the NVIDIA A100 Tensor Core GPU doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10X higher than PCIe Gen4. Third-generation NVLink is available in four-GPU and eight-GPU HGX A100 servers from leading computer makers.
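The "almost 10X" comparison can be checked with quick arithmetic. The sketch below assumes a PCIe Gen4 x16 link running at 16 GT/s per lane with 128b/130b encoding, counted in both directions. These PCIe parameters are not stated on this page; they are assumptions based on the PCIe 4.0 specification, and the variable names are illustrative only.

```python
# Rough bandwidth comparison: third-generation NVLink vs. PCIe Gen4 x16.
# Assumptions (not from this page): 16 GT/s per lane, 128b/130b encoding,
# 16 lanes, and bandwidth counted in both directions.

NVLINK_GBPS = 600.0          # GPU-to-GPU bandwidth quoted above, in GB/s

PCIE_GT_PER_LANE = 16.0      # assumed PCIe Gen4 raw rate per lane, in GT/s
PCIE_ENCODING = 128 / 130    # assumed 128b/130b line-code efficiency
PCIE_LANES = 16              # x16 link


def pcie_gen4_x16_gbps() -> float:
    """Return the assumed bidirectional PCIe Gen4 x16 bandwidth in GB/s."""
    # Divide by 8 to convert gigabits to gigabytes.
    per_direction = PCIE_GT_PER_LANE * PCIE_ENCODING * PCIE_LANES / 8
    return 2 * per_direction


pcie = pcie_gen4_x16_gbps()
ratio = NVLINK_GBPS / pcie
aggregate_tbps = 16 * NVLINK_GBPS / 1000  # 16 GPUs in this system

print(f"PCIe Gen4 x16: {pcie:.1f} GB/s bidirectional")
print(f"NVLink / PCIe: {ratio:.1f}x")
print(f"16 GPUs x 600 GB/s = {aggregate_tbps:.1f} TB/s")
```

Under these assumptions the ratio comes out at roughly 9.5, and 16 GPUs at 600 GB/s each add up to the 9.6 TB/s aggregate figure listed in the feature summary.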


Multi-Instance GPU (MIG) Delivers Seven Accelerators in a Single GPU

Every AI and HPC application can benefit from acceleration, but not every application needs the performance of a full A100 Tensor Core GPU. With MIG, each A100 can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. This allows HGX A100 systems to offer up to 112 GPU instances, giving developers access to breakthrough speed for every application, big and small, with guaranteed quality of service.


With A100 80GB, seven MIGs can be configured with 10 GB each (double the size of A100 40GB MIGs), making it now possible to perform inference on batch-size constrained models like BERT-LARGE (a natural language processing model with superhuman understanding) at much higher batch sizes, delivering up to a 1.3X increase in throughput.
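The instance counts quoted for MIG follow directly from the per-GPU limits. As a minimal sketch, the arithmetic for a 16-GPU system looks like this; all inputs come from the text above, and the variable names are illustrative rather than part of any NVIDIA tool.

```python
# MIG partitioning arithmetic for a 16-GPU HGX A100 system.
# Inputs are the figures quoted in the text; names are illustrative.

GPUS = 16                 # A100 GPUs in the system
GPU_MEM_GB = 80           # memory per A100 80GB GPU
MIG_PER_GPU = 7           # maximum MIG instances per A100
MIG_MEM_GB = 10           # memory per MIG instance on A100 80GB

total_instances = GPUS * MIG_PER_GPU         # GPU instances across the system
mig_mem_per_gpu = MIG_PER_GPU * MIG_MEM_GB   # memory assigned to MIGs per GPU
total_gpu_mem_gb = GPUS * GPU_MEM_GB         # total GPU memory in the system

print(f"MIG instances across the system: {total_instances}")
print(f"Memory assigned to MIG instances per GPU: {mig_mem_per_gpu} GB")
print(f"Total GPU memory: {total_gpu_mem_gb} GB")
```

This reproduces the 112 GPU instances and the 1,280 GB of total GPU memory stated on this page.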

Design Versatility to Suit Any Workload

NVIDIA HGX™ A100 delivers a best-in-class server platform through GPU baseboards and a design guide that provides different configuration options. This allows unmatched versatility, enabling server manufacturers to build a range of CPU and GPU systems or cloud instances ideal for different workloads.


Third-Generation Tensor Cores Redefine the Future of AI and HPC

First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has brought AI training times down from weeks to hours and provided massive acceleration to inference operations. The third generation of Tensor Cores in the NVIDIA Ampere architecture builds upon these innovations by providing up to 20X more floating-point operations per second (FLOPS) for AI applications and up to 2.5X more FLOPS for FP64 HPC applications.


NVIDIA HGX A100 4-GPU delivers nearly 80 teraFLOPS of FP64 performance for the most demanding HPC workloads. NVIDIA HGX A100 8-GPU provides 5 petaFLOPS of FP16 deep learning compute. And the HGX A100 16-GPU configuration achieves a staggering 10 petaFLOPS, creating the world’s most powerful accelerated server platform for AI and HPC.
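These platform figures are consistent with simple per-GPU scaling. The sketch below assumes per-GPU peak rates of 19.5 teraFLOPS for FP64 Tensor Core and 624 teraFLOPS for FP16 Tensor Core with structured sparsity. Those two values are not given on this page; they are assumptions taken from NVIDIA's published A100 datasheet, and the 16-GPU figure is assumed to use the same FP16 metric as the 8-GPU figure.

```python
# Peak-throughput scaling across HGX A100 configurations.
# Per-GPU rates are assumptions from NVIDIA's A100 datasheet, not this page.

FP64_TC_TFLOPS_PER_GPU = 19.5    # assumed FP64 Tensor Core peak per GPU
FP16_TC_TFLOPS_PER_GPU = 624.0   # assumed FP16 Tensor Core peak, with sparsity


def platform_tflops(gpus: int, per_gpu_tflops: float) -> float:
    """Return the aggregate peak in teraFLOPS, assuming linear scaling."""
    return gpus * per_gpu_tflops


fp64_4gpu = platform_tflops(4, FP64_TC_TFLOPS_PER_GPU)
fp16_8gpu_pflops = platform_tflops(8, FP16_TC_TFLOPS_PER_GPU) / 1000
fp16_16gpu_pflops = platform_tflops(16, FP16_TC_TFLOPS_PER_GPU) / 1000

print(f"4-GPU FP64:  {fp64_4gpu:.0f} teraFLOPS")
print(f"8-GPU FP16:  {fp16_8gpu_pflops:.2f} petaFLOPS")
print(f"16-GPU FP16: {fp16_16gpu_pflops:.2f} petaFLOPS")
```

These are theoretical peaks that add up per-GPU maxima; sustained application throughput depends on the workload.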



AI, HPC, VDI, machine intelligence, deep learning, machine learning, artificial intelligence, Neural Network, advanced rendering and compute.

8U GPU Chassis (JBOG)

Graphics Processing Unit (GPU):

16x NVIDIA A100 GPUs

NVIDIA Baseboard

2x NVIDIA HGX A100 8-GPU Baseboard

Expansion Slots

16x FHHL PCIe Gen4 x16 slots




8x NVMe U.2 SSDs

Headnode Connection

8x zCD connector (each zCD connector provides x16 PCIe Gen4 lane connectivity)

Power Supply

4+4 redundant 80 Plus Titanium level power supplies (3,000W Max. @180V-264Vac, 1,500W Max. @100V-127Vac)

Chassis Dimension

352(H) x 447(W) x 948(D) mm

2U Headnode (Two Systems per 8U JBOG)


2x AMD EPYC™ 7002 or 7003 series processor, 7nm, Socket SP3, up to 64 cores, 128 threads, and 256MB L3 cache per processor, up to 240W TDP


32x DIMM Slots, DDR4 RDIMM (Support 2x 32GB NVDIMMs or 4x 16GB NVDIMMs, optional)


  • 18x 2.5” hot-swap NVMe U.2 SSD
  • 2x SATA/NVMe M.2 (2280/22110)
  • 2x 2.5” hot-swap SATA/NVMe U.2 SSD




1x TPM 2.0 Module

Rear Panel

  • 1x RJ45 for BMC dedicated management
  • 1x RJ45 Console port
  • 1x VGA
  • 1x UID LED
  • 2x GbE Ethernet RJ45
  • 2x USB 3.0

Front Panel

1x System Healthy LED (OFF/Amber)

JBOG Connection

4x zCD connector (each zCD connector provides x16 PCIe Gen4 lane connectivity)

Expansion Slot

1x OCP3.0 PCIe Gen4 x16 NIC slot

Power Supply

2x 1600W Redundant (Platinum level certified)

System Cooling

6x System Fan (60x56mm)

Chassis Dimension

17.5”(W) x 28.0”(D) x 3.4”(H) (446.6mm x 711.2mm x 87.0mm)

Optimized for Turnkey Solutions

Enable powerful design, training, and visualization with built-in software tools including TensorFlow, Caffe, Torch, Theano, BIDMach, cuDNN, NVIDIA CUDA Toolkit, and NVIDIA DIGITS.