GPU Accelerated Supercomputing / AI / Deep Learning Cluster Solution
AMAX’s ClusterMax™ SuperG GPU computing clusters are powered by the NVIDIA® Tesla™ GPU computing platform, built around NVIDIA’s Tesla V100 GPU Computing Accelerator, the fastest, most advanced, and most efficient data center GPU ever built.
Our cluster solutions are designed to boost throughput and save money for HPC and hyperscale data centers. A single Tesla V100 GPU delivers the performance of up to 100 CPUs, enabling data scientists, researchers, and engineers to tackle challenges that were once thought impossible.
Key features of Tesla V100:
- Volta Architecture - By pairing CUDA Cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and Deep Learning.
- Tensor Core - Equipped with 640 Tensor Cores, Tesla V100 delivers 120 TeraFLOPS of deep learning performance. That’s 12X Tensor FLOPS for DL Training, and 6X Tensor FLOPS for DL Inference when compared to NVIDIA Pascal™ GPUs.
- Next Generation NVLink - NVIDIA NVLink in Tesla V100 delivers 2X higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single server.
- Maximum Efficiency Mode - The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80% of the performance at half the power consumption.
- HBM2 - With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95%, Tesla V100 delivers 1.5X higher memory bandwidth over Pascal GPUs as measured on STREAM.
- Programmability - Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.
- Delivers up to 92,160 Tensor cores, 737,280 CUDA Cores, 1,080+ Teraflops DP, 2,160+ Teraflops SP, and 17,280+ Teraflops Tensor performance per 42U cluster
- Up to 4,608GB dedicated HBM2 GPU memory
- Supports dual-socket 22-core Intel® Xeon® E5-2600 v4 series processors on host systems
- Supports FDR/EDR InfiniBand fabric & real time InfiniBand diagnostics
- Faster communication with InfiniBand using NVIDIA® GPUDirect™ RDMA technology
- Cluster management and GPU monitoring software, including GPU temperature monitoring, fan speed, and power, providing exclusive access to GPUs in a cluster
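The per-42U-rack totals quoted above follow from straightforward multiplication; a minimal sketch of the arithmetic, assuming NVIDIA's published per-GPU figures for Tesla V100 SXM2 (640 Tensor Cores, 5,120 CUDA cores, 7.5 DP / 15 SP / 120 Tensor TFLOPS, 32 GB HBM2 — these figures are assumptions drawn from NVIDIA's V100 specifications, not from this datasheet) and the 36-node, 4-GPU-per-node 42U configuration:

```python
# Per-rack totals for the 42U / 36-node configuration, derived from
# published per-GPU Tesla V100 SXM2 figures (assumed, not from this datasheet).
TENSOR_CORES_PER_GPU = 640
CUDA_CORES_PER_GPU = 5_120
DP_TFLOPS_PER_GPU = 7.5       # double precision
SP_TFLOPS_PER_GPU = 15.0      # single precision
TENSOR_TFLOPS_PER_GPU = 120.0 # deep learning (Tensor Core) performance
HBM2_GB_PER_GPU = 32

nodes, gpus_per_node = 36, 4
gpus = nodes * gpus_per_node          # 144 GPUs per 42U rack

print(gpus * TENSOR_CORES_PER_GPU)   # 92,160 Tensor Cores
print(gpus * CUDA_CORES_PER_GPU)     # 737,280 CUDA cores
print(gpus * DP_TFLOPS_PER_GPU)      # 1,080 TFLOPS double precision
print(gpus * SP_TFLOPS_PER_GPU)      # 2,160 TFLOPS single precision
print(gpus * TENSOR_TFLOPS_PER_GPU)  # 17,280 TFLOPS Tensor
print(gpus * HBM2_GB_PER_GPU)        # 4,608 GB HBM2
```

Each headline number is simply GPU count × per-GPU rating, which is why the totals scale linearly with node count.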
Complete Cluster Assembly and Set Up Services:
- Fully integrated and pre-packaged turnkey HPC solution, including HPC professional services and support, expert installation and setup of rack-optimized cluster nodes, cabling, rails, and other peripherals
- Configuration of cluster nodes and the network
- Installation of applications and client computers to offer a comprehensive solution for your IT needs
- Rapid deployment
- Server management options include Standards-based IPMI or AMAX remote server management
- Seamless standard and custom application integration and cluster installation
- Cluster management options include a choice of commercial and open source software solutions
- Supports a variety of UPS and PDU configurations and interconnect options, including InfiniBand (FDR, EDR), Fibre Channel, and Ethernet (Gigabit, 10GbE, 25GbE, 40GbE, 100GbE)
- Energy-efficient cluster cabinets and high-performance UPS and power distribution units, with expert installation and setup of rack-optimized nodes, cabling, rails, and other peripherals
Rack Level Verification
- Performance and Benchmark Testing (HPL)
- ATA rack level stress test
- Rack Level Serviceability
- Ease of Deployment Review
- MPI jobs over IB for HPC
- GPU stress test using CUDA
- Cluster management
Large Scale Rack Deployment Review
- Scalability Process
- Rack to Rack Connectivity
- Multi-Cluster Testing
- Software/Application Load
Optional Cluster System Software Installed:
- Microsoft Windows Server 2016
- Bright Computing Cluster Manager
- SuSE / Red Hat Enterprise Linux
- C-based software development tools, CUDA 9.x Toolkit and SDK, and various libraries for CPU/GPU clusters
- GPU software development tools
|Model #||ClusterMax™ SuperG-V100.14U-4||ClusterMax™ SuperG-V100.24U-8||ClusterMax™ SuperG-V100.42U-16||ClusterMax™ SuperG-V100.42U-36|
|Configurations||4x 1U GPU Compute Nodes||8x 1U GPU Compute Nodes||16x 1U GPU Compute Nodes||36x 1U GPU Compute Nodes|
|GPU Node CPU Support||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family|
|GPU Node Memory Support||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory|
|Included GPU / Node||4x Tesla V100, with 16 GPUs per cluster||4x Tesla V100, with 32 GPUs per cluster||4x Tesla V100, with 64 GPUs per cluster||4x Tesla V100, with 144 GPUs per cluster|
|Included GPU Memory (32GB per GPU)||512GB||1,024GB||2,048GB||4,608GB|
|# of Tensor Cores Included||10,240||20,480||40,960||92,160|
|# of GPU Cores Included||81,920||163,840||327,680||737,280|
|Double Precision Performance Included||112+ Teraflops||224+ Teraflops||448+ Teraflops||1,008+ Teraflops|
|Single Precision Performance Included||224+ Teraflops||448+ Teraflops||896+ Teraflops||2,016+ Teraflops|
|Tensor Performance Included||1,792+ Teraflops||3,584+ Teraflops||7,168+ Teraflops||16,128+ Teraflops|
|GPU Nodes Interconnectivity||InfiniBand||InfiniBand||InfiniBand||InfiniBand|
|GPU Node Storage||Up to 4x hot-swap 2.5" SATA/SSD||Up to 4x hot-swap 2.5" SATA/SSD||Up to 4x hot-swap 2.5" SATA/SSD||Up to 4x hot-swap 2.5" SATA/SSD|
|Master Node||1x 1U Master Node||1x 1U Master Node||1x 1U Master Node||1x 1U Master Node|
|Master Node CPU Support||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family||2x Intel® Xeon® Processor Scalable Family|
|Master Node Memory Support||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory||Up to 512GB DDR4 2666/2400/2133 MHz ECC reg memory|
|Master Node Storage||2x 2.5" Hot-swap & 2x 2.5" internal||2x 2.5" Hot-swap & 2x 2.5" internal||2x 2.5" Hot-swap & 2x 2.5" internal||2x 2.5" Hot-swap & 2x 2.5" internal|
|Master Node Interconnectivity||InfiniBand||InfiniBand||InfiniBand||InfiniBand|
|Network Switch||1x 24-port Gigabit Ethernet + 1x 18-port FDR/EDR InfiniBand||1x 24-port Gigabit Ethernet + 1x 18-port FDR/EDR InfiniBand||1x 24-port Gigabit Ethernet + 1x 18-port FDR/EDR InfiniBand||1x 48-port Layer 2 Gigabit Ethernet + 1x 36-port FDR/EDR (40 Gbps) InfiniBand|
|Reference #||Q706943 + Q706940 + Q706944||Q706945 + Q706940 + Q706946||Q706947 + Q706940 + Q706948|
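The included-performance rows in the table above scale linearly with GPU count; a quick sketch of that arithmetic, assuming the per-GPU figures the table implies (640 Tensor Cores, 5,120 CUDA cores, and 7 DP / 14 SP / 112 Tensor TFLOPS — consistent with Tesla V100 PCIe ratings; these per-GPU values are assumptions, not stated in the table):

```python
# Reproduce the table's "Included" rows from assumed per-GPU V100 figures.
PER_GPU = {
    "tensor_cores": 640,
    "cuda_cores": 5_120,
    "dp_tflops": 7,      # double precision
    "sp_tflops": 14,     # single precision
    "tensor_tflops": 112,
}

# Model suffix -> number of 1U compute nodes (each node holds 4x Tesla V100).
CONFIGS = {"14U-4": 4, "24U-8": 8, "42U-16": 16, "42U-36": 36}

for model, nodes in CONFIGS.items():
    gpus = nodes * 4
    totals = {name: rating * gpus for name, rating in PER_GPU.items()}
    print(f"{model}: {gpus} GPUs -> {totals}")
```

Running this reproduces each column: for example, the 42U-36 model yields 144 GPUs, 92,160 Tensor Cores, 737,280 CUDA cores, and 1,008 / 2,016 / 16,128 TFLOPS of DP / SP / Tensor performance.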