Fueling the Next Wave of AI-Powered Services
AI is constantly challenged to keep up with exploding volumes of data while still delivering fast responses. Meet these challenges head-on with AMAX's deep learning inference server solutions. Based on NVIDIA® Tesla® GPUs and the NVIDIA TensorRT™ platform, they deliver the fastest, most efficient data center inference platforms on the market.
Ultra-compact server, excellent for appliances with space restrictions. Supports up to four Tesla T4 GPUs in a 1U chassis, along with two Intel Xeon Scalable processors.
NVIDIA TensorRT is a high-performance neural-network inference platform that can speed up applications such as recommenders, speech recognition, and machine translation by up to 40X compared to CPU-only architectures. TensorRT optimizes neural network models, calibrates them for lower precision with high accuracy, and deploys them to production environments in enterprise and hyperscale data centers. Its optimizer and runtime engines deliver high throughput at low latency: models trained at 32-bit or 16-bit precision can be optimized for INT8 operations on Tesla T4 and P4, or FP16 on Tesla V100. The NVIDIA DeepStream SDK taps into the power of Tesla GPUs to simultaneously decode and analyze video streams.
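As a rough illustration of this workflow, the sketch below uses the TensorRT Python API to build an optimized engine from a trained ONNX model with reduced-precision kernels enabled. The file names and the calibrator class are placeholders, and the exact API surface varies across TensorRT versions, so treat this as a sketch rather than a drop-in recipe.

```python
# Minimal sketch: build a TensorRT engine from a trained FP32 ONNX model,
# letting TensorRT select FP16 (and optionally INT8) kernels.
# "model.onnx" and "model.plan" are placeholder file names.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # model trained at FP32
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels (e.g. Tesla V100)

# INT8 (e.g. Tesla T4/P4) additionally requires calibration with
# representative input data; MyCalibrator is a hypothetical class
# implementing trt.IInt8EntropyCalibrator2.
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = MyCalibrator()

# Serialize the optimized engine for deployment.
engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine)
```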
Maximize GPU Utilization
The NVIDIA TensorRT inference server delivers high-throughput data center inference and helps you get the most from your GPUs. Delivered as a ready-to-run container, it is a microservice that lets you perform inference via an API for any combination of models from Caffe2, NVIDIA TensorRT, TensorFlow, and any framework that supports the ONNX standard, on one or more GPUs.
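For a sense of how clients talk to this microservice, here is a minimal sketch that sends an inference request over HTTP with Python's requests library. The URL path follows the v2 inference protocol adopted by later releases of the server (rebranded Triton), and the model name "resnet50" and tensor names are placeholders for whatever your model repository defines; both are assumptions relative to the release described here.

```python
# Minimal sketch: HTTP inference request to a running inference server.
# Assumes the server listens on localhost:8000 and serves a model named
# "resnet50" with one FP32 input tensor of shape [1, 3, 224, 224].
import requests

payload = {
    "inputs": [{
        "name": "input",                  # placeholder input tensor name
        "shape": [1, 3, 224, 224],
        "datatype": "FP32",
        "data": [0.0] * (3 * 224 * 224),  # dummy image data
    }]
}

resp = requests.post(
    "http://localhost:8000/v2/models/resnet50/infer", json=payload)
resp.raise_for_status()
print(resp.json()["outputs"][0]["data"][:5])  # first few output values
```

Because the server exposes a plain network API, the same request works no matter which framework produced the model, which is what lets one container front Caffe2, TensorRT, TensorFlow, and ONNX models side by side across the available GPUs.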