GPU POD Solution

AI-ready supercomputing infrastructure solution for all workloads at scale

This scalable AI infrastructure incorporates best-of-breed compute, networking, storage, power, and cooling to deliver the fastest application performance and meet the demands of evolving AI workloads.

Providing the computational power to train deep learning models

The AMAX GPU POD with NVIDIA A100 GPUs is an artificial intelligence (AI) supercomputing infrastructure, providing the computational power necessary to train today’s state-of-the-art deep learning (DL) models and to fuel innovation well into the future. The AMAX GPU POD delivers groundbreaking performance and is designed to solve the world’s most challenging computational problems.

This GPU POD reference architecture is the result of co-design between data scientists, application performance engineers, and system architects to build a system capable of supporting the widest range of deep learning workloads.


Powered by

NVIDIA A100 TENSOR CORE GPU

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes.
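
As a quick illustration of how MIG partitioning surfaces to software, the sketch below enumerates the GPUs in a node and reports whether MIG is enabled on each. It assumes the nvidia-ml-py (pynvml) bindings, which are not part of the GPU POD stack described here, just a convenient way to query NVML:

```python
# Minimal sketch: enumerate GPUs and report MIG status via NVML.
# Assumes the nvidia-ml-py package (pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)       # str (bytes on older pynvml)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError:
            mig = "unsupported"                       # e.g. pre-Ampere GPUs
        print(f"GPU {i}: {name}, {mem.total / 2**30:.0f} GiB, MIG {mig}")
finally:
    pynvml.nvmlShutdown()
```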

GPU POD Building Blocks

AMAX AceleMax™ DGS-428A

Each AceleMax DGS-428A system supports flexible configurations of up to eight NVIDIA A100 Tensor Core GPUs, powered by dual-socket AMD EPYC™ 7002 series processors in a 4U form factor. It delivers up to 2X the performance and 4X the floating-point capability of the previous-generation EPYC 7001 series.

The AceleMax DGS-428A features up to 11 PCIe 4.0 slots and up to 160 PCIe lanes for compute, graphics, storage, and networking expansion. PCIe 4.0 provides transfer speeds of up to 16 GT/s per lane – double the bandwidth of PCIe 3.0 – while delivering lower power consumption, better lane scalability, and backward compatibility.
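
For a sense of what those lane speeds mean in practice, here is a back-of-the-envelope calculation of per-direction PCIe bandwidth; the encoding overhead is the standard 128b/130b line coding used by both PCIe 3.0 and 4.0:

```python
# Back-of-the-envelope PCIe bandwidth per direction.
GT_PER_S = 16            # PCIe 4.0 transfer rate per lane (PCIe 3.0: 8 GT/s)
ENCODING = 128 / 130     # 128b/130b line coding, used by PCIe 3.0 and 4.0
LANES = 16               # a typical GPU or NIC slot width

gb_per_s = GT_PER_S * ENCODING * LANES / 8   # bits -> bytes
print(f"x{LANES} PCIe 4.0 ≈ {gb_per_s:.1f} GB/s per direction")  # ≈ 31.5 GB/s
```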


Mellanox InfiniBand Network

 

Mellanox provides the world’s smartest switches, enabling in-network computing through Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology. The QM8700 series offers the highest fabric performance available on the market, with up to 16Tb/s of non-blocking bandwidth and sub-130ns port-to-port latency.

For this reference architecture, the StorMax® storage connects to the AceleMax DGS-428A systems through two Mellanox HDR InfiniBand networks (for high availability), providing the most efficient scaling of GPU workloads and datasets. Built with Mellanox’s Quantum InfiniBand switch device, the QM8700 series provides forty ports of 200Gb/s full bidirectional bandwidth.
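
To make the fabric’s role concrete, below is a minimal sketch of an NCCL all-reduce running over InfiniBand from PyTorch. The environment variables are standard NCCL settings, and NCCL_COLLNET_ENABLE opts into SHARP in-network aggregation where the fabric and NCCL build support it; treat the HCA name and launch details as assumptions, not part of this reference architecture:

```python
# Launch with e.g.: torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

# NCCL reads these at communicator creation; both are standard NCCL knobs.
os.environ.setdefault("NCCL_IB_HCA", "mlx5")        # steer NCCL onto the ConnectX-6 HCAs
os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")   # request SHARP offload when available

dist.init_process_group(backend="nccl")             # rank/world size supplied by torchrun
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Each rank contributes one tensor; the reduction can run in the switch via SHARP.
t = torch.full((1024,), float(rank), device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)
if rank == 0:
    print(t[0].item())   # sum 0 + 1 + ... + (world_size - 1)
dist.destroy_process_group()
```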

AMAX StorMax® Storage Systems

 

AMAX, together with Excelero, delivers StorMax® all-flash and hybrid-flash storage solutions, featuring 200Gb/s NVMe over Fabrics on InfiniBand with NVIDIA® Mellanox® ConnectX-6 adapters. StorMax® platforms are among the highest-performance, most secure, and most flexible architectures on the market, with unmatched price-performance that accelerates AI computing, database, big data analytics, cloud, Web 2.0, and video processing workloads.

StorMax A-1110NV (1U) and StorMax A-2440 (2U) offer two ports of 200Gb/s InfiniBand and Ethernet connectivity, sub-600-nanosecond latency, and 215 million messages per second. Both systems deliver low-latency distributed block storage for web-scale applications, enabling shared NVMe across any network and supporting any local or distributed file system. These StorMax® solutions feature an intelligent management layer that abstracts the underlying hardware with CPU offload, creates logical volumes with redundancy, and provides centralized, intelligent management and monitoring.

All applications benefit from the ultra-low latency, extremely high throughput, and high IOPS of a local NVMe device, with the convenience of centralized storage, while avoiding proprietary hardware lock-in and reducing overall TCO.
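
As a rough way to sanity-check that throughput from an application’s point of view, the sketch below times a sequential read from a file on the storage mount. The path is a placeholder, and a real evaluation would use a dedicated tool such as fio with O_DIRECT and parallel jobs, since a plain buffered read can end up measuring the page cache on repeat runs:

```python
# Rough sequential-read throughput check against the storage mount.
import time

PATH = "/mnt/stormax/dataset.bin"   # hypothetical mount point and file
CHUNK = 8 * 1024 * 1024             # 8 MiB reads

total = 0
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.perf_counter() - start
print(f"read {total / 1e9:.1f} GB at {total / elapsed / 1e9:.2f} GB/s")
```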

Contact us to learn more or to request a quote.

GPU POD Reference Architecture

Designed for any dataset size, the GPU POD enables vastly improved training performance across three deployment options.

SMALL REFERENCE ARCHITECTURE: 61.44 TB Raw


GPU Server:

  • 1x AceleMax DGS-428A
  • 4x NVIDIA A100 Tensor Core GPUs
  • 5x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 1x Mellanox QM8700 Switch

Performance   Reads     Writes
Bandwidth     20 GB/s   7.5 GB/s
IOPS          5M        340K
Latency       95 µs     21 µs

High-Performance Storage:

  • 1x StorMax® A-1110NV, with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 4x Kioxia CM6-R 15.36TB NVMe SSDs

MEDIUM REFERENCE ARCHITECTURE: 245.76 TB Raw


GPU Server:

  • 2x AceleMax DGS-428A, each with:
  • 4x NVIDIA A100 Tensor Core GPUs
  • 5x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 2x Mellanox QM8700 Switches

Performance   Reads     Writes
Bandwidth     40 GB/s   15 GB/s
IOPS          10M       680K
Latency       95 µs     21 µs

High-Performance Storage:

  • 2x StorMax® A-1110NV, each with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 4x Kioxia CM6-R 15.36TB NVMe SSDs

LARGE REFERENCE ARCHITECTURE: 368.64 TB Raw

Performance   Reads      Writes
Bandwidth     160 GB/s   46 GB/s
IOPS          30M        2M
Latency       95 µs      21 µs

GPU Server:

  • 4x AceleMax DGS-428A, each with:
  • 4x NVIDIA A100 Tensor Core GPUs
  • 6x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 2x Mellanox QM8700 Switches

 

High-Performance Storage:

  • 1x StorMax® A-2440 (2U4N), each node with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 24x Kioxia CM6-R 15.36TB NVMe SSDs across the chassis

Contact us to learn more or to request a quote.
