Recurrent neural networks form a significant proportion of data center machine learning inference. This includes workloads like machine translation, speech synthesis, speech transcription and time series analysis. The common feature of these models is unstructured sparsity.
We create specialised compute architectures that run on data center FPGA hardware and exploit sparsity for extreme performance. FPGA hardware is the best solution for data center acceleration: it outperforms the GPU when handling unstructured sparsity, it does so with much lower latency, and unlike an ASIC it is not fixed to today’s solutions.
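To illustrate why unstructured sparsity reduces work, here is a minimal sketch of a sparse matrix-vector product in pure Python. It is not our accelerator's implementation; the compressed sparse row (CSR) layout and the function names `to_csr` and `csr_matvec` are assumptions chosen for illustration. The point is only that multiply-accumulates are performed for non-zero weights alone, wherever they fall in the matrix.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays.

    Returns (values, col_idx, row_ptr): the non-zero weights, their column
    indices, and the offset in `values` at which each row starts.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, touching only the stored non-zero weights."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x4 weight matrix with 9 of 12 entries pruned to zero.
W = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0, -1.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_ptr = to_csr(W)
y = csr_matvec(values, col_idx, row_ptr, x)
print(y)  # 3 multiply-accumulates instead of 12
```

Because the non-zeros follow no regular pattern, each one needs an index lookup into `x`. This irregular access is the kind of workload where custom data paths can help.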
Our inference accelerator scales across a range of FPGA platforms, providing optimal end-to-end acceleration for multicore CPU plus FPGA data center platforms. We can process speech at 1000x real time for extreme performance, or scale back for cost-sensitive and power-sensitive applications. Our recurrent neural network accelerator supports long-chain LSTMs and the non-linearities required by today’s machine learning models. As research into these networks moves forward, we add new features based on cutting-edge work from our in-house ML team and from external research.
Speech is the future of interaction with the computers that now enhance our everyday lives: on mobile devices, in vehicles and in our homes. Recurrent neural network processing is core to that interaction, and supporting ever more advanced machine learning models for real-time interaction poses a significant computing challenge.
Our expertise in compressing recurrent neural networks allows us to do more at the edge today. We are global MLPerf benchmark owners for edge. Our compressed models are smaller and train faster. Our ability to compress networks without loss of performance allows us to target real-time applications at the edge.
Where CPU performance alone is not enough, our RNN acceleration solutions can be licensed for ASIC implementation for the ultimate in cost-efficient, power-sensitive acceleration. Our accelerator provides efficient, low-latency processing for single-batch applications.
Our scalable accelerator architecture allows us to tailor solutions to a given application, trading performance for cost and power. We have a strong track record in delivering FPGA-based accelerators for automotive applications using low to mid-range FPGA devices, and our ASIC-ready design can be licensed for high-volume, low-power implementations or for integration into larger systems.
We provide pre-configured examples of benchmark models running on our accelerator for evaluation. We work directly with our partners to support the mapping of proprietary machine learning models to the accelerator, allowing our partners to achieve the highest possible compute performance using their own IP.
Our ML team is expert in training RNNs to very high levels of sparsity (>90%) with no loss of accuracy. We describe how to achieve this in detail in this white paper.
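For readers unfamiliar with what training to a sparsity level means in practice, the following is a minimal sketch of gradual magnitude pruning, a widely used generic technique. It is not necessarily the method described in the white paper. The function names `magnitude_prune` and `sparsity_schedule`, the cubic ramp, and the example weights are all assumptions made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `weights` is a flat list of floats; `sparsity` is the target fraction of
    zeros, in [0, 1).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_schedule(step, total_steps, final_sparsity):
    """Cubic ramp from 0 to `final_sparsity` over `total_steps`.

    Sparsity rises quickly at first, then more slowly, so the network has
    training steps in which to recover after each round of pruning.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


# Hypothetical weights; in real training the pruning step would be
# interleaved with gradient updates on the surviving weights.
weights = [0.05, -1.2, 0.3, -0.01, 0.7, 0.02, -0.4, 0.09, 0.15, -0.06]
target = sparsity_schedule(step=100, total_steps=100, final_sparsity=0.9)
pruned = magnitude_prune(weights, target)
print(target, pruned)
```

In this sketch only the largest-magnitude weight survives at 90% sparsity. Whether accuracy is preserved depends on the retraining between pruning steps, which the sketch leaves out.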