Deep Learning Inference Platforms

LEARN MORE OR REQUEST A QUOTE

Fueling the Next Wave of AI-Powered Services

AI services are constantly challenged to keep up with exploding volumes of data while still delivering fast responses. Meet these challenges head-on with AMAX's deep learning inference server solutions, built on NVIDIA® Tesla® GPUs and the NVIDIA TensorRT™ platform, delivering the fastest, most efficient data center inference platforms on the market.

The Most Advanced AI Inference Platform

Powered by NVIDIA Turing Tensor Cores, the NVIDIA T4 delivers breakthrough performance for deep learning inference in FP32, FP16, INT8, and INT4 precisions. With 130 TeraOPS (TOPS) of INT8 and 260 TOPS of INT4, the T4 offers the world's highest inference efficiency, up to 40X the performance of CPUs at just 60 percent of the power consumption. Drawing just 70 watts, it's the ideal solution for scale-out servers at the edge.

BrainMax™ DL-E410T

Excellent for low-latency video and image inference; supports up to ten Tesla T4 GPUs in a 4U chassis with dual Intel Xeon Scalable processors.

BrainMax™ DL-E28T

Supports up to eight Tesla T4 GPUs in a 2U chassis, with two Intel Xeon Scalable processors. Excellent for low-latency video and image inference.

BrainMax™ AG-28T

Supports up to eight Tesla T4 GPUs in a 2U chassis, with one AMD EPYC processor. Ideal for low-latency video and image inference.

BrainMax™ DL-E24T

Supports up to four Tesla T4 GPUs in a 2U chassis, with dual Intel Xeon Scalable processors. Built for scale-out deployment in minimal space; ideal for low-latency speech, language, and image inference.

BrainMax™ DL-E14T

Ultra-compact server, excellent for appliances with space restrictions. Supports up to four Tesla T4 GPUs in a 1U chassis, with two Intel Xeon Scalable processors.

NVIDIA TensorRT

NVIDIA TensorRT is a high-performance neural-network inference platform that can speed up applications such as recommender systems, speech recognition, and machine translation by up to 40X compared to CPU-only architectures. TensorRT optimizes trained neural network models, calibrates them for lower precision with minimal loss of accuracy, and deploys them to production environments in enterprise and hyperscale data centers. Its optimizer and runtime engines deliver high throughput at low latency. With TensorRT, models trained in 32-bit or 16-bit precision can be optimized for INT8 operations on Tesla T4 and P4, or FP16 on Tesla V100. The NVIDIA DeepStream SDK taps into the power of Tesla GPUs to simultaneously decode and analyze video streams.
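
As an illustration, the snippet below is a minimal sketch of that optimization flow, assuming the TensorRT 8.x Python API; the file names "model.onnx" and "model.plan" are placeholders. It parses a trained ONNX model and enables FP16 kernels where the GPU supports them (INT8 would additionally require a calibrator or per-tensor dynamic ranges).

```python
# Minimal sketch: build an optimized TensorRT engine from an ONNX model.
# Assumes TensorRT 8.x; "model.onnx" / "model.plan" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the trained model into a TensorRT network definition.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Enable reduced precision where the hardware supports it
# (FP16 on Tesla V100/T4; INT8 on Tesla T4/P4 needs calibration).
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)

# Build and serialize the optimized engine for deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```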

Maximize GPU Utilization

The NVIDIA TensorRT inference server delivers high-throughput data center inference and helps you get the most from your GPUs. Delivered as a ready-to-run container, it is a microservice that lets you perform inference via an API for any combination of models from Caffe2, NVIDIA TensorRT, TensorFlow, and any framework that supports the ONNX standard, on one or more GPUs.
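
For illustration, here is a minimal client sketch against that API, assuming the server's default HTTP endpoint on port 8000 and the tritonclient Python package (the TensorRT inference server has since been renamed Triton Inference Server); the model name "resnet50", tensor names "input" and "output", and the input shape are placeholders for whatever model you have deployed.

```python
# Minimal sketch: send an inference request to the server over HTTP.
# Assumes the tritonclient package; model/tensor names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy batch matching the placeholder model's input shape.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input", batch.shape, "FP32")]
inputs[0].set_data_from_numpy(batch)

# The server routes the request to the named model, on any framework
# backend it was deployed with, and returns the output tensors.
result = client.infer(model_name="resnet50", inputs=inputs)
print(result.as_numpy("output"))
```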

LEARN MORE OR REQUEST A QUOTE