In this white paper we survey a wide variety of model compression techniques that are amenable to deployment on a range of hardware platforms. In particular, we compare different model sparsity methods and levels, and seven widely used precisions as targets for quantization.
When we were looking for a great application to run on the Intel Stratix 10 NX FPGA, we turned our attention to WaveNet, a neural network model that we know to be extremely difficult to implement on existing compute platforms, see our previous blog post https://myrtle.ai/learn/wavenet/. Two years on and armed with new
In the final post of the series we come full circle, speeding up our single-GPU training implementation to take on a field of multi-GPU competitors. We roll-out a bag of standard and not-so-standard tricks to reduce training time to 34s, or 26s with test-time augmentation.