GPU POD Solution

AI-ready supercomputing infrastructure solution for all workloads at scale

This scalable AI infrastructure incorporates best-of-breed compute, networking, storage, power, and cooling to deliver the fastest application performance and meet the demands of evolving AI workloads.

Providing the computational power to train deep learning models

The AMAX GPU POD with NVIDIA A100 GPUs is an artificial intelligence (AI) supercomputing infrastructure, providing the computational power necessary to train today’s state-of-the-art deep learning (DL) models and to fuel innovation well into the future. The AMAX GPU POD delivers groundbreaking performance and is designed to solve the world’s most challenging computational problems.

This GPU POD reference architecture is the result of co-design between data scientists, application performance engineers, and system architects to build a system capable of supporting the widest range of deep learning workloads.


Powered by

NVIDIA A100 TENSOR CORE GPU

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes.
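
As a quick illustration of how MIG partitioning surfaces to software, the sketch below enumerates the GPUs in a node and reports whether MIG is enabled on each. It assumes the nvidia-ml-py (pynvml) bindings, which are not part of the GPU POD stack described here, just a convenient way to query NVML:

```python
# Minimal sketch: enumerate GPUs and report MIG status via NVML.
# Assumes the nvidia-ml-py package (pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)       # str (bytes on older pynvml)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError:
            mig = "unsupported"                       # e.g. pre-Ampere GPUs
        print(f"GPU {i}: {name}, {mem.total / 2**30:.0f} GiB, MIG {mig}")
finally:
    pynvml.nvmlShutdown()
```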

GPU POD Building Blocks

AMAX AceleMax™ DGS-428A

Each AceleMax DGS-428A system supports flexible configurations of up to eight NVIDIA A100 Tensor Core GPUs, powered by dual-socket AMD EPYC™ 7002 series processors in a 4U form factor. It delivers up to 2X the performance and 4X the floating-point capability of the previous-generation EPYC 7001 series.

The AceleMax DGS-428A features up to 11 PCIe 4.0 slots and up to 160 PCIe lanes for compute, graphics, storage, and networking expansion. PCIe 4.0 provides transfer speeds of up to 16 GT/s per lane – double the bandwidth of PCIe 3.0 – while delivering lower power consumption, better lane scalability, and backward compatibility.
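
For a sense of what those lane speeds mean in practice, here is a back-of-the-envelope calculation of per-direction PCIe bandwidth; the encoding overhead is the standard 128b/130b line coding used by both PCIe 3.0 and 4.0:

```python
# Back-of-the-envelope PCIe bandwidth per direction.
GT_PER_S = 16            # PCIe 4.0 transfer rate per lane (PCIe 3.0: 8 GT/s)
ENCODING = 128 / 130     # 128b/130b line coding, used by PCIe 3.0 and 4.0
LANES = 16               # a typical GPU or NIC slot width

gb_per_s = GT_PER_S * ENCODING * LANES / 8   # bits -> bytes
print(f"x{LANES} PCIe 4.0 ≈ {gb_per_s:.1f} GB/s per direction")  # ≈ 31.5 GB/s
```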


Mellanox InfiniBand Network

 

Mellanox provides the world’s smartest switches, enabling in-network computing through Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology. The QM8700 series offers the highest fabric performance available on the market, with up to 16Tb/s of non-blocking bandwidth and sub-130ns port-to-port latency.

For this reference architecture, the StorMax® storage connects to the AceleMax DGS-428A systems through two Mellanox HDR InfiniBand networks (for high availability), providing the most efficient scaling of GPU workloads and datasets. Built with Mellanox’s Quantum InfiniBand switch device, the QM8700 series provides forty ports of 200Gb/s full bidirectional bandwidth.
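
To make the fabric’s role concrete, below is a minimal sketch of an NCCL all-reduce running over InfiniBand from PyTorch. The environment variables are standard NCCL settings, and NCCL_COLLNET_ENABLE opts into SHARP in-network aggregation where the fabric and NCCL build support it; treat the HCA name and launch details as assumptions, not part of this reference architecture:

```python
# Launch with e.g.: torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

# NCCL reads these at communicator creation; both are standard NCCL knobs.
os.environ.setdefault("NCCL_IB_HCA", "mlx5")        # steer NCCL onto the ConnectX-6 HCAs
os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")   # request SHARP offload when available

dist.init_process_group(backend="nccl")             # rank/world size supplied by torchrun
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Each rank contributes one tensor; the reduction can run in the switch via SHARP.
t = torch.full((1024,), float(rank), device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)
if rank == 0:
    print(t[0].item())   # sum 0 + 1 + ... + (world_size - 1)
dist.destroy_process_group()
```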

AMAX StorMax® Storage Systems

 

AMAX, together with Excelero, delivers StorMax® all-flash and hybrid-flash storage solutions, featuring 200Gb/s NVMe over Fabrics on InfiniBand with NVIDIA® Mellanox® ConnectX-6 adapters. StorMax® platforms are among the highest-performance, most secure, and most flexible architectures on the market, with unmatched price-performance that accelerates AI computing, database, big data analytics, cloud, Web 2.0, and video processing workloads.

StorMax A-1110NV (1U) and StorMax A-2440 (2U) offer two ports of 200Gb/s InfiniBand and Ethernet connectivity, sub-600-nanosecond latency, and 215 million messages per second. Both systems deliver low-latency distributed block storage for web-scale applications, enabling shared NVMe across any network and supporting any local or distributed file system. These StorMax® solutions feature an intelligent management layer that abstracts the underlying hardware with CPU offload, creates logical volumes with redundancy, and provides centralized, intelligent management and monitoring.

All applications benefit from the ultra-low latency, extremely high throughput, and high IOPS of a local NVMe device, with the convenience of centralized storage, while avoiding proprietary hardware lock-in and reducing overall TCO.
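
As a rough way to sanity-check that throughput from an application’s point of view, the sketch below times a sequential read from a file on the storage mount. The path is a placeholder, and a real evaluation would use a dedicated tool such as fio with O_DIRECT and parallel jobs, since a plain buffered read can end up measuring the page cache on repeat runs:

```python
# Rough sequential-read throughput check against the storage mount.
import time

PATH = "/mnt/stormax/dataset.bin"   # hypothetical mount point and file
CHUNK = 8 * 1024 * 1024             # 8 MiB reads

total = 0
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.perf_counter() - start
print(f"read {total / 1e9:.1f} GB at {total / elapsed / 1e9:.2f} GB/s")
```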

Contact us to learn more or to request a quote.

GPU POD Reference Architecture

Designed for any dataset size, the GPU POD enables vastly improved training performance across three deployment options.

SMALL REFERENCE ARCHITECTURE: 61.44 TB Raw


GPU Server:

  • 1x AceleMax DGS-428A
  • 4x NVIDIA A100 Tensor Core GPUs
  • 5x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 1x Mellanox QM8700 Switch

Performance   Reads     Writes
Bandwidth     20 GB/s   7.5 GB/s
IOPS          5M        340K
Latency       95 µs     21 µs

High-Performance Storage:

  • 1x StorMax® A-1110NV, with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 4x Kioxia CM6-R 15.36TB NVMe SSDs

MEDIUM REFERENCE ARCHITECTURE: 245.76 TB Raw


GPU Server:

  • 2x AceleMax DGS-428A, each with:
  • 4x NVIDIA A100 Tensor Core GPUs
  • 5x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 2x Mellanox QM8700 Switches

Performance   Reads     Writes
Bandwidth     40 GB/s   15 GB/s
IOPS          10M       680K
Latency       95 µs     21 µs

High-Performance Storage:

  • 2x StorMax® A-1110NV, each with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 4x Kioxia CM6-R 15.36TB NVMe SSDs

LARGE REFERENCE ARCHITECTURE: 368.64 TB Raw

Performance   Reads      Writes
Bandwidth     160 GB/s   46 GB/s
IOPS          30M        2M
Latency       95 µs      21 µs

GPU Server:

  • 4x AceleMax DGS-428A, each with:
  • 4x NVIDIA A100 Tensor Core GPUs
  • 6x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters

 

Networking:

  • 2x Mellanox QM8700 Switches

 

High-Performance Storage:

  • 1x StorMax® A-2440 (2U4N), each node with:
  • 1x AMD EPYC 7542 32-core CPU
  • 128GB RAM (8x 16GB DDR4-3200 DIMMs)
  • 2x Mellanox ConnectX-6 VPI HDR/200GbE dual-port adapters
  • 24x Kioxia CM6-R 15.36TB NVMe SSDs across the chassis

Contact us to learn more or to request a quote.
