Jul 9, 2025 3 min read

AceleMax with NVIDIA HGX B300

Engineered for Next-Generation AI and HPC

The NVIDIA HGX B300 platform is built for the next wave of accelerated computing. Powered by Blackwell Ultra GPUs and high-speed NVLink interconnects, HGX B300 enables massive generative AI models, advanced data analytics, and high-performance simulations at scale. With a fully integrated software stack and support for up to 800 Gb/s networking, it delivers the infrastructure backbone needed to meet growing compute demands across enterprise and cloud environments.

💡
Ready to scale with HGX B300? Connect with AMAX to configure the right solution for your workload.

AMAX Platforms with HGX B300

The AceleMax® AXG-828U is AMAX’s 8U platform engineered to support the full capabilities of the NVIDIA HGX B300. Designed for demanding AI training, inference, and simulation workloads, it pairs the performance of 8x Blackwell Ultra GPUs with robust system infrastructure built around dual Intel® Xeon® 6 processors. This platform is optimized for high-throughput, large-memory applications in enterprise and industrial environments.

AceleMax® AXG-828U

  • Intel® Xeon® 6 Scalable processors
  • HGX B300 8-GPU with NVSwitch
  • Rear side 8 PCIe LP and 2 FHHL slots
  • Up to 12x 2.5” drive bays
  • On-board dual 1G Base-T
Request a Quote

Inference Performance for Large Language Models

Up to 11x Higher Inference Throughput on Llama 3.1 405B

The NVIDIA HGX B300 delivers significant improvements in real-time inference, reaching up to 11 times the throughput of HGX H100 systems for models such as Llama 3.1 405B. This gain is achieved using NVIDIA’s second-generation Transformer Engine with custom Blackwell Tensor Cores and optimizations from TensorRT-LLM.

Llama 3.1 405B Inference Throughput Chart

This performance is based on per-GPU comparisons in an 8-GPU HGX configuration, measured at a token-to-token latency of 20 ms and a first-token latency of 5 seconds. Inference was served through a disaggregated architecture with sequence lengths of 32,768 tokens (input) and 1,028 tokens (output).
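The latency figures above imply a per-user generation rate that is easy to work out by hand. The sketch below is illustrative arithmetic only (not an NVIDIA benchmark script), using exactly the numbers quoted: 5 s first-token latency, 20 ms token-to-token latency, and 1,028 output tokens.

```python
# Illustrative arithmetic from the benchmark configuration quoted above.
# Assumed figures (all from the text): 5 s first-token latency (FTL),
# 20 ms token-to-token latency (TTL), 1,028 output tokens per request.

FTL_S = 5.0           # first-token latency, seconds
TTL_S = 0.020         # token-to-token latency, seconds
OUTPUT_TOKENS = 1028  # output sequence length

def per_user_throughput(ftl: float, ttl: float, out_tokens: int) -> float:
    """Average generated tokens/sec for one request, including time to first token."""
    total_time = ftl + (out_tokens - 1) * ttl  # each later token arrives every TTL
    return out_tokens / total_time

steady_state = 1.0 / TTL_S  # decode-phase rate once generation is under way
print(f"steady-state decode rate: {steady_state:.0f} tokens/s per user")
print(f"average incl. first token: {per_user_throughput(FTL_S, TTL_S, OUTPUT_TOKENS):.1f} tokens/s")
```

At a 20 ms token-to-token budget, each user sees a steady 50 tokens/s during decode; the 11x throughput claim is about how many such concurrent streams the system sustains, not about a single stream running faster.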

Accelerated Training at Scale

4x Faster Training for Llama 3.1 405B

HGX B300 systems enable up to 4x faster training for large-scale models compared to the previous generation. This is powered by Blackwell’s second-generation Transformer Engine with FP8 precision, improved numerical formats, and high interconnect bandwidth.

Llama 3.1 405B Model Training Speedup Chart

With 1.8 TB/s of GPU-to-GPU bandwidth via fifth-generation NVLink, combined with InfiniBand networking and NVIDIA Magnum IO software, the HGX B300 platform supports efficient model scaling across GPU clusters for enterprise training workloads.
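The per-GPU and total NVLink figures are consistent with each other, as this small sanity check (illustrative only, using the numbers stated above) shows: 8 GPUs at 1.8 TB/s each yields the platform's 14.4 TB/s aggregate.

```python
# Sanity check of the NVLink figures quoted above (illustrative arithmetic only).
GPUS = 8                   # GPUs in an HGX B300 baseboard
PER_GPU_NVLINK_TBPS = 1.8  # fifth-generation NVLink GPU-to-GPU bandwidth, TB/s

total_tbps = GPUS * PER_GPU_NVLINK_TBPS
print(f"total NVLink bandwidth: {total_tbps:.1f} TB/s")
```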

NVIDIA HGX B300 vs. NVIDIA HGX B200

HGX B300 is powered by Blackwell Ultra GPUs, offering higher dense FP4 throughput and larger memory capacity compared to HGX B200. It also doubles the networking bandwidth (1.6 TB/s vs. 0.8 TB/s) and delivers 2x attention performance, making it better suited for large-scale inference and transformer-based models. HGX B200, based on standard Blackwell GPUs, retains higher INT8 and FP64 throughput, which may benefit mixed workloads that depend on those precisions.

Specification HGX B300 HGX B200
Form Factor 8x NVIDIA Blackwell Ultra SXM 8x NVIDIA Blackwell SXM
FP4 Tensor Core** 144 / 105 PFLOPS 144 / 72 PFLOPS
FP8/FP6 Tensor Core* 72 PFLOPS 72 PFLOPS
INT8 Tensor Core* 2 POPS 72 POPS
FP16/BF16 Tensor Core* 36 PFLOPS 36 PFLOPS
TF32 Tensor Core* 18 PFLOPS 18 PFLOPS
FP32 600 TFLOPS 600 TFLOPS
FP64/FP64 Tensor Core 10 TFLOPS 296 TFLOPS
Total Memory Up to 2.3 TB 1.4 TB
NVLink Fifth generation Fifth generation
NVIDIA NVSwitch™ NVLink 5 Switch NVLink 5 Switch
NVSwitch GPU-to-GPU Bandwidth 1.8 TB/s 1.8 TB/s
Total NVLink Bandwidth 14.4 TB/s 14.4 TB/s
Networking Bandwidth 1.6 TB/s 0.8 TB/s
Attention Performance 2X 1X

* With sparsity
** With sparsity / without sparsity

Scalable Deployment with AMAX

AMAX offers complete rack-scale deployment for NVIDIA HGX B300 platforms, starting with the AceleMax® AXG-828U. This 8U system is purpose-built for AI training and inference at scale, combining the power of 8x Blackwell Ultra GPUs with enterprise-grade CPU, memory, and storage configurations. Every system is integrated, validated, and optimized in-house by AMAX engineers to ensure high performance and efficiency from day one. Whether you're building infrastructure for generative AI or complex simulation workloads, AMAX delivers ready-to-deploy solutions backed by expert support.