To keep pace with the rapid growth in demand for AI, technologies must scale efficiently, satisfying strict latency and performance requirements while keeping total cost of ownership and total power consumption low.
Inefficient solutions are already creating challenges today.
We optimize ML inference for efficient hyperscale deployment using our patented MAU Accelerator™ technologies and proven design techniques such as:
• Heterogeneous compute employing algorithm, hardware & software co-design
• Quantization to suit the targeted hardware platform
• Exploitation of sparsity in the model
Combined, these can:
• Reduce latency by more than 20x
• Reduce the number of compute and memory operations by up to 95%
• Reduce memory storage and bandwidth requirements by more than 10x
• Reduce memory access energy consumption by more than 100x
while having little to no impact on the accuracy of the final model.
These techniques and their benefits are described in this white paper.
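As a rough illustration of how sparsity and quantization combine, the following Python sketch prunes a small random weight matrix to about 95% zeros, quantizes the surviving weights to 8-bit integers, and runs a matrix-vector product that touches only the stored nonzeros. The matrix size, the magnitude-based pruning rule, the symmetric per-tensor int8 scheme and all function names are assumptions chosen for illustration. They are not taken from the MAU Accelerator implementation, and the printed ratios describe this toy example only.

```python
import random

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights (at least `sparsity` of them)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale

def to_sparse_rows(q):
    """Store each row as (column, value) pairs for its nonzero entries only."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in q]

def sparse_matvec(sparse_rows, scale, x):
    """Matrix-vector product over stored nonzeros; also counts multiply-accumulates."""
    macs = 0
    out = []
    for row in sparse_rows:
        acc = 0.0
        for j, v in row:
            acc += v * x[j]
            macs += 1
        out.append(acc * scale)  # rescale the integer result back to float
    return out, macs

random.seed(0)
rows, cols = 64, 64  # toy layer size (assumption)
dense = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
x = [random.gauss(0, 1) for _ in range(cols)]

pruned = prune_by_magnitude(dense, sparsity=0.95)
q, scale = quantize_int8(pruned)
sparse = to_sparse_rows(q)
y, macs = sparse_matvec(sparse, scale, x)

dense_macs = rows * cols
nnz = sum(len(row) for row in sparse)
dense_bytes = rows * cols * 4  # 32-bit floats
sparse_bytes = nnz * 2         # assumed: 1-byte value + 1-byte column index each

print(f"multiply-accumulates: {macs} of {dense_macs}")
print(f"storage: {sparse_bytes} bytes vs {dense_bytes} bytes dense float32")
```

The sketch ignores row-pointer overhead and says nothing about accuracy on a real model; in practice, pruning and quantization are usually applied during or after training with the accuracy checked on held-out data.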
The MAU Accelerator™ can accelerate RNNs and other DNNs with sparse layers, simultaneously achieving maximum throughput and ultra-low latency for hyperscale inference in data center applications. This allows higher-quality models to be deployed, providing better services and customer experiences, while delivering significant savings in infrastructure costs and energy consumption.
Key benefits:
• Deterministic low tail latency
• Improved latency-bounded throughput
• Reduced infrastructure costs
• Use of higher-quality models under a given latency bound
• Reduced energy consumption
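To make "tail latency" and "latency-bounded throughput" concrete, the sketch below takes latency measurements at several batch sizes and picks the configuration with the highest throughput whose 99th-percentile latency stays within a given bound. The measurements, the 10 ms bound and the function names are hypothetical and serve only to show the calculation.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_bounded_throughput(runs, bound_ms, pct=99):
    """Return (batch_size, items_per_second) for the fastest run within the bound.

    `runs` maps batch size to a list of per-batch latencies in milliseconds.
    Returns None if no batch size meets the latency bound.
    """
    best = None
    for batch, latencies in runs.items():
        if percentile(latencies, pct) > bound_ms:
            continue  # tail latency violates the bound
        mean_ms = sum(latencies) / len(latencies)
        throughput = batch * 1000.0 / mean_ms
        if best is None or throughput > best[1]:
            best = (batch, throughput)
    return best

# Hypothetical measurements (ms per batch) for three batch sizes.
runs = {
    1: [2.0, 2.1, 2.0, 2.2],
    8: [5.0, 5.2, 5.1, 9.5],
    32: [14.0, 15.0, 14.5, 30.0],
}

best = latency_bounded_throughput(runs, bound_ms=10.0)
print(best)
```

Larger batches raise raw throughput but also raise tail latency, so the bound decides which batch size is usable; lowering tail latency at a given batch size is what lifts latency-bounded throughput.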
Typical applications:
• Natural language processing
• Time series analysis
• Payment & trading fraud detection
The MAU Accelerator runs on data center servers equipped with accelerator cards from Intel, Xilinx and BittWare/Molex. These accelerator cards are available today, both in the cloud and for on-premises data centers, facilitating rapid implementation at scale. Neural network models created using popular ONNX-supported frameworks such as TensorFlow, PyTorch or MXNet can easily be deployed on the MAU Accelerator.
The MAU Accelerator can be used to deliver high-fidelity speech synthesis at very high throughput, running WaveNet on a BittWare 520NX Accelerator Card.
The MAU Accelerator can be used to achieve very high throughput at ultra-low latency, running speech transcription on an Intel PAC D5005, AMD Alveo U250 or Achronix VectorPath Accelerator Card.
The MAU Accelerator can be used to significantly reduce the server infrastructure required for an NLP workload when run on an Intel PAC D5005 or AMD Alveo U250 Accelerator Card.
For more information on NLP, please contact us at firstname.lastname@example.org