The MAU Accelerator is a low-latency inference accelerator for data center machine learning workloads. It delivers deterministic low tail latency and high throughput without trading one off against the other. This allows higher quality models to be deployed, improving services and customer experiences, while significantly reducing infrastructure costs and energy consumption.
Low-latency inference acceleration for real-time, memory-bound workloads including:
The MAU Accelerator runs on data center servers equipped with accelerator cards from Intel and Xilinx. These cards are available today, both in the cloud and for on-premise data centers, enabling rapid deployment at scale. Neural network models built in popular ONNX-compatible frameworks such as TensorFlow, PyTorch, or MXNet can be deployed on the MAU Accelerator through its ONNX Runtime support.