Jul 9, 2025 3 min read

AceleMax with NVIDIA HGX B300

Engineered for Next-Generation AI and HPC

The NVIDIA HGX B300 platform is built for the next wave of accelerated computing. Powered by Blackwell Ultra GPUs and high-speed NVLink interconnects, HGX B300 enables massive generative AI models, advanced data analytics, and high-performance simulations at scale. With a fully integrated software stack and support for up to 800 Gb/s networking, it delivers the infrastructure backbone needed to meet growing compute demands across enterprise and cloud environments.

💡
Ready to scale with HGX B300? Connect with AMAX to configure the right solution for your workload.

AMAX Platforms with HGX B300

The AceleMax® AXG-828U is AMAX’s 8U platform engineered to support the full capabilities of the NVIDIA HGX B300. Designed for demanding AI training, inference, and simulation workloads, it pairs the performance of 8x Blackwell Ultra GPUs with robust system infrastructure built around dual Intel® Xeon® 6 processors. This platform is optimized for high-throughput, large-memory applications in enterprise and industrial environments.

AceleMax® AXG-828U

  • Intel® Xeon® 6 Scalable processors
  • HGX B300 8-GPU with NVSwitch
  • Rear side 8 PCIe LP and 2 FHHL slots
  • Up to 12x 2.5” drive bays
  • On-board dual 1G Base-T
Request a Quote

Inference Performance for Large Language Models

Up to 11x Higher Inference Throughput on Llama 3.1 405B

The NVIDIA HGX B300 delivers significant improvements in real-time inference, reaching up to 11 times the throughput of HGX H100 systems for models such as Llama 3.1 405B. This gain is achieved using NVIDIA’s second-generation Transformer Engine with custom Blackwell Tensor Cores and optimizations from TensorRT-LLM.

Llama 3.1 405B Inference Throughput Chart

This performance is based on per-GPU comparisons in an 8-GPU HGX configuration, measured at a token-to-token latency of 20 ms and a first-token latency of 5 seconds. Inference was served through a disaggregated architecture with sequence lengths of 32,768 tokens (input) and 1,028 tokens (output).
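The latency figures above imply a per-user generation rate that is easy to work out by hand. The sketch below is illustrative arithmetic only (not an NVIDIA benchmark script), using exactly the numbers quoted: 5 s first-token latency, 20 ms token-to-token latency, and 1,028 output tokens.

```python
# Illustrative arithmetic from the benchmark configuration quoted above.
# Assumed figures (all from the text): 5 s first-token latency (FTL),
# 20 ms token-to-token latency (TTL), 1,028 output tokens per request.

FTL_S = 5.0           # first-token latency, seconds
TTL_S = 0.020         # token-to-token latency, seconds
OUTPUT_TOKENS = 1028  # output sequence length

def per_user_throughput(ftl: float, ttl: float, out_tokens: int) -> float:
    """Average generated tokens/sec for one request, including time to first token."""
    total_time = ftl + (out_tokens - 1) * ttl  # each later token arrives every TTL
    return out_tokens / total_time

steady_state = 1.0 / TTL_S  # decode-phase rate once generation is under way
print(f"steady-state decode rate: {steady_state:.0f} tokens/s per user")
print(f"average incl. first token: {per_user_throughput(FTL_S, TTL_S, OUTPUT_TOKENS):.1f} tokens/s")
```

At a 20 ms token-to-token budget, each user sees a steady 50 tokens/s during decode; the 11x throughput claim is about how many such concurrent streams the system sustains, not about a single stream running faster.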

Accelerated Training at Scale

4x Faster Training for Llama 3.1 405B

HGX B300 systems enable up to 4x faster training for large-scale models compared to the previous generation. This is powered by Blackwell’s second-generation Transformer Engine with FP8 precision, improved numerical formats, and high interconnect bandwidth.

Llama 3.1 405B Model Training Speedup Chart

With 1.8 TB/s of GPU-to-GPU bandwidth via fifth-generation NVLink, combined with InfiniBand networking and NVIDIA Magnum IO software, the HGX B300 platform supports efficient model scaling across GPU clusters for enterprise training workloads.
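The per-GPU and total NVLink figures are consistent with each other, as this small sanity check (illustrative only, using the numbers stated above) shows: 8 GPUs at 1.8 TB/s each yields the platform's 14.4 TB/s aggregate.

```python
# Sanity check of the NVLink figures quoted above (illustrative arithmetic only).
GPUS = 8                   # GPUs in an HGX B300 baseboard
PER_GPU_NVLINK_TBPS = 1.8  # fifth-generation NVLink GPU-to-GPU bandwidth, TB/s

total_tbps = GPUS * PER_GPU_NVLINK_TBPS
print(f"total NVLink bandwidth: {total_tbps:.1f} TB/s")
```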

NVIDIA HGX B300 vs. NVIDIA HGX B200

HGX B300 is powered by Blackwell Ultra GPUs, offering higher dense FP4 throughput and larger memory capacity compared to HGX B200. It also doubles the networking bandwidth (1.6 TB/s vs. 0.8 TB/s) and delivers 2x attention performance, making it better suited for large-scale inference and transformer-based models. HGX B200, based on standard Blackwell GPUs, retains higher INT8 and FP64 throughput, which may benefit mixed workloads that depend on those precisions.

Specification HGX B300 HGX B200
Form Factor 8x NVIDIA Blackwell Ultra SXM 8x NVIDIA Blackwell SXM
FP4 Tensor Core** 144 / 105 PFLOPS 144 / 72 PFLOPS
FP8/FP6 Tensor Core* 72 PFLOPS 72 PFLOPS
INT8 Tensor Core* 2 POPS 72 POPS
FP16/BF16 Tensor Core* 36 PFLOPS 36 PFLOPS
TF32 Tensor Core* 18 PFLOPS 18 PFLOPS
FP32 600 TFLOPS 600 TFLOPS
FP64/FP64 Tensor Core 10 TFLOPS 296 TFLOPS
Total Memory Up to 2.3 TB 1.4 TB
NVLink Fifth generation Fifth generation
NVIDIA NVSwitch™ NVLink 5 Switch NVLink 5 Switch
NVSwitch GPU-to-GPU Bandwidth 1.8 TB/s 1.8 TB/s
Total NVLink Bandwidth 14.4 TB/s 14.4 TB/s
Networking Bandwidth 1.6 TB/s 0.8 TB/s
Attention Performance 2X 1X

* With sparsity
** With sparsity / without sparsity

Scalable Deployment with AMAX

AMAX offers complete rack-scale deployment for NVIDIA HGX B300 platforms, starting with the AceleMax® AXG-828U. This 8U system is purpose-built for AI training and inference at scale, combining the power of 8x Blackwell Ultra GPUs with enterprise-grade CPU, memory, and storage configurations. Every system is integrated, validated, and optimized in-house by AMAX engineers to ensure high performance and efficiency from day one. Whether you're building infrastructure for generative AI or complex simulation workloads, AMAX delivers ready-to-deploy solutions backed by expert support.