Efficient Hyperscale Inference

To meet the surging demand for AI, inference technologies must scale efficiently, satisfying strict latency and performance requirements while keeping total cost of ownership and total power consumption low.

Inefficient solutions are creating challenges today:

  • Development teams are shrinking their models to meet strict latency and performance requirements. This reduces accuracy, and therefore the quality of the service, which can directly impact revenue.
  • Inefficiencies leave the available hardware in a business’s infrastructure underutilized, forcing significant over-provisioning and increasing cost.

Achieving Efficient Inference

We optimize ML inference for efficient hyperscale deployment using our patented MAU Accelerator™ technologies and proven design techniques such as:

• Heterogeneous compute employing algorithm, hardware & software co-design
• Quantization to suit the targeted hardware platform
• Exploitation of sparsity in the model

Combined, these can:

• Reduce latency by more than 20x
• Reduce the number of compute and memory operations by up to 95%
• Reduce memory storage and bandwidth requirements by more than 10x
• Reduce memory access energy consumption by more than 100x

while having little to no impact on the accuracy of the final model.

These techniques, and the compelling benefits they deliver, are described in this white paper.
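As a minimal illustration of two of these techniques (not our production flow; the toy model, layer sizes and 90% pruning level are assumptions for illustration only), the PyTorch snippet below applies magnitude pruning and post-training dynamic quantization to a small recurrent model:

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyRecurrentModel(nn.Module):          # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=80, hidden_size=512, num_layers=2)
        self.fc = nn.Linear(512, 29)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out)

model = TinyRecurrentModel().eval()

# Exploit sparsity: zero out the 90% smallest-magnitude weights of the output
# layer (unstructured magnitude pruning, for illustration only; recurrent
# weights can be pruned the same way).
prune.l1_unstructured(model.fc, name="weight", amount=0.9)
prune.remove(model.fc, "weight")              # bake the zeros into the tensor

# Quantize: convert LSTM and Linear weights to int8 to suit the target hardware.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)

# Run a dummy utterance (100 frames of 80-dim features, batch of 1).
x = torch.randn(100, 1, 80)
with torch.no_grad():
    print(quantized(x).shape)                 # torch.Size([100, 1, 29])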

MAU Accelerator

The MAU Accelerator™ can accelerate RNNs and other DNNs with sparse layers, simultaneously achieving maximum throughput and ultra-low latency for hyperscale inference in data center applications. This enables higher quality models to be deployed, providing better services and customer experiences, while significant savings can be made in infrastructure costs and energy consumption.
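To see why sparse layers cut operation counts so sharply, consider a weight matrix in which, say, 95% of the entries are zero: stored in a compressed format, only the nonzero weights are kept and multiplied. The short NumPy/SciPy sketch below is illustrative only and is not the accelerator's internal representation:

import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.standard_normal((1024, 1024))
dense[rng.random((1024, 1024)) < 0.95] = 0.0  # zero out ~95% of the weights

csr = sparse.csr_matrix(dense)                # compressed sparse row storage
x = rng.standard_normal(1024)

print(f"stored weights: {csr.nnz} of {dense.size} "
      f"({100 * csr.nnz / dense.size:.1f}%)")
y = csr @ x                                   # multiply-accumulates only the nonzeros
assert np.allclose(y, dense @ x)              # same result as the dense product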

Key Benefits

Deterministic low tail latency
Improved latency-bounded throughput
Reduced infrastructure costs
Enables use of higher quality models under a given latency bound
Reduced energy consumption
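The first two benefits can be quantified with a simple measurement loop: record per-request latencies, report the 99th-percentile (tail) latency, and count the throughput delivered while a latency bound is met. The sketch below is illustrative only; the 5 ms bound and the placeholder run_inference call are assumptions, not a Myrtle benchmark:

import time
import numpy as np

def run_inference(request):
    # Placeholder for a call into the deployed model (hypothetical).
    time.sleep(0.002)                         # pretend each request takes ~2 ms

LATENCY_BOUND_S = 0.005                       # example 5 ms service-level bound
latencies = []
start = time.perf_counter()
for request in range(1000):                   # 1000 synthetic requests, sent serially
    t0 = time.perf_counter()
    run_inference(request)
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

p99 = np.percentile(latencies, 99)            # tail latency
within_bound = sum(l <= LATENCY_BOUND_S for l in latencies)
print(f"p99 latency: {p99 * 1e3:.2f} ms")
print(f"latency-bounded throughput: {within_bound / elapsed:.0f} req/s "
      f"(requests completed within the {LATENCY_BOUND_S * 1e3:.0f} ms bound)")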

Applications

Speech transcription
Natural language processing
Speech synthesis
Time series analysis
Payment & trading fraud detection
Recommendation systems

Rapid & Easy Deployment

The MAU Accelerator runs on data center servers enhanced by accelerator cards from Intel, Xilinx and BittWare/Molex. These accelerator cards are available today, both in the cloud and for on-premise data centers, facilitating rapid implementation at scale. Neural network models created using popular ONNX-supported frameworks such as TensorFlow, PyTorch or MXNet can easily be deployed on the MAU Accelerator.
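As an illustration of the deployment path (the stand-in model and tensor shapes below are placeholders, not a Myrtle model), a trained PyTorch network can be exported to the ONNX interchange format in a few lines; the exported file is then the starting point for deployment:

import torch
import torch.nn as nn

# A stand-in model; any trained network built from ONNX-exportable ops works the same way.
model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 29)).eval()
dummy_input = torch.randn(1, 80)              # example input used to trace the graph

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                             # file passed on to the deployment tooling
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"},   # allow a variable batch size
                  "logits": {0: "batch"}},
)
print("exported model.onnx")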

Application Example 1: Speech Synthesis

The MAU Accelerator can be used to deliver high fidelity speech synthesis at very high throughput, running WaveNet on a BittWare 520NX Accelerator Card.

  • Best in class vocoder model for near-human-quality speech synthesis
  • Low, deterministic tail latency
  • 16x throughput advantage over a GPU solution
  • Significant CapEx and energy savings

For more information on speech synthesis, please see our blog, our demo video and our White Paper.
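For readers unfamiliar with WaveNet, its core building block is the dilated causal convolution: a 1-D convolution that looks only at past samples and skips samples at a growing stride, so a short stack of layers covers a long audio history. The PyTorch sketch below is purely illustrative; the channel counts and dilation schedule are assumptions, not the deployed model:

import torch
import torch.nn as nn

class DilatedCausalConv1d(nn.Module):
    # 1-D convolution that only sees past samples (causal) and skips samples
    # at a fixed stride (dilation), giving a large receptive field cheaply.
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad so the output stays causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                         # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))   # pad only on the left (the past)
        return self.conv(x)

# Stack layers with exponentially growing dilations: ten layers of kernel size 2
# see roughly 2**10 past samples.
stack = nn.Sequential(*[DilatedCausalConv1d(64, kernel_size=2, dilation=2 ** i)
                        for i in range(10)])
audio_features = torch.randn(1, 64, 16000)        # one second of features at 16 kHz (toy)
print(stack(audio_features).shape)                # torch.Size([1, 64, 16000])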

Application Example 2: Automatic Speech Recognition

The MAU Accelerator can be used to achieve very high throughput at ultra-low latency, running speech transcription on an Intel PAC D5005, AMD Alveo U250 or Achronix VectorPath Accelerator Card.

  • 3x higher performance than a GPU-only solution
  • 2.1x higher performance per watt than a GPU solution
  • 29x lower latency than a GPU solution

For more information on speech transcription, please see our Achronix White Paper, Intel White Paper and Intel Solution Brief.

Application Example 3: Natural Language Processing

The MAU Accelerator can be used to significantly reduce the server infrastructure required for an NLP workload when run on an Intel PAC D5005 or AMD Alveo U250 Accelerator Card.

  • 2.2x lower cost than a CPU-only solution
  • 7.7x smaller carbon footprint than a CPU-only solution

For more information on NLP, please contact us at hello@myrtle.ai.
