Learn
We care about understanding the machine learning models we produce and making them as small and accurate as possible. We like to give that understanding back to the community by explaining the techniques we've developed, demonstrated on tiny datasets.
The open-source notebooks that form part of this series held the #1 spots on two Stanford machine learning leaderboards for over six months, until April 2019. All models that now rank above those notebooks are derived from this original work.


Deep Learning Model Compression Techniques on the WaveNet Vocoder
In this white paper we survey a wide variety of model compression techniques that are amenable to deployment on a range of hardware platforms. In particular, we compare different model sparsity methods and levels, and seven widely used precisions as targets for quantization.
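The paper itself carries the detail; as a minimal sketch of the two levers it compares, the snippet below applies magnitude pruning at a chosen sparsity level and symmetric fake quantisation at a chosen bit width. The function names and specific settings are ours for illustration, not taken from the paper.

```python
import torch

# Illustrative helpers (our names and choices, not the paper's).

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

def fake_quantize(weight: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantisation to `bits` bits, returned as dequantised floats."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / qmax + 1e-12  # eps avoids div-by-zero on all-zero weights
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

w = torch.randn(512, 512)
w_compressed = fake_quantize(magnitude_prune(w, sparsity=0.9), bits=8)
```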

WaveNet Revisited: Excels on New AI-Optimised FPGAs
When we were looking for a great application to run on the Intel Stratix 10 NX FPGA, we turned our attention to WaveNet, a neural network model that we know to be extremely difficult to implement on existing compute platforms (see our previous blog post: https://myrtle.ai/learn/wavenet/). Two years on, and armed with a new AI-optimised FPGA…

Implementing WaveNet using Intel Stratix 10 NX FPGA for Real-time Speech Synthesis
We jointly published this white paper with Intel, describing the way in which a WaveNet vocoder model can be compressed to optimize the use of the AI Tensor blocks and HBM memory on the Intel® Stratix® 10 NX FPGA.

Exploiting Unstructured Sparsity on Next-Generation Datacenter Hardware
This white paper explains how we exploited the sparsity inherent in typical RNNs and used quantisation to compress an ASR model by as much as 95% with minimal loss of accuracy.
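The speed-up from unstructured sparsity comes from skipping zeros at inference time. As a toy illustration (not the kernel from the paper), a pruned weight matrix can be stored in CSR form so that a matrix-vector product reads only the surviving weights:

```python
import numpy as np

def dense_to_csr(w):
    """Store only the non-zeros: values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = W @ x, reading only the stored non-zeros of W."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y
```

At 95% sparsity each row touches roughly one weight in twenty, which is where the memory and bandwidth savings come from.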

How to Train Your ResNet 8: Bag of Tricks
In the final post of the series we come full circle, speeding up our single-GPU training implementation to take on a field of multi-GPU competitors. We roll out a bag of standard and not-so-standard tricks to reduce training time to 34s, or 26s with test-time augmentation.
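Test-time augmentation here means averaging predictions over transformed copies of each input; a minimal version using horizontal flips (our example, not the exact code from the post) looks like this:

```python
import torch

def predict_tta(model, images):
    """Average logits over the original and horizontally flipped inputs (NCHW)."""
    with torch.no_grad():
        logits = model(images) + model(torch.flip(images, dims=[3]))  # flip the width axis
    return (logits / 2).argmax(dim=1)
```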

How to Train Your ResNet 7: Batch Norm
We investigate how batch normalisation helps optimisation (spoiler: it involves internal covariate shift…). Along the way we meet some bad initialisations, degenerate networks and spiky Hessians.
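As a reference for the discussion, here is a sketch of the training-time batch norm forward pass: each channel is normalised over the batch and spatial dimensions, then rescaled by learned parameters.

```python
import torch

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for an NCHW batch: normalise each channel
    over the batch and spatial dimensions, then apply a learned affine."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```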

How to Train Your ResNet 6: Weight Decay
We learn more about the influence of weight decay on training and uncover an unexpected relation to LARS.
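LARS (layer-wise adaptive rate scaling) sets a per-layer step size from the ratio of weight norm to gradient norm; a simplified sketch of one update, with momentum omitted, makes that scaling explicit:

```python
import torch

@torch.no_grad()
def lars_step(param, lr, weight_decay, trust_coef=0.001):
    """One simplified LARS update: the step for each layer is scaled
    by the ratio of its weight norm to its (decayed) gradient norm."""
    grad = param.grad + weight_decay * param
    trust_ratio = trust_coef * param.norm() / (grad.norm() + 1e-12)
    param -= lr * trust_ratio * grad
```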

How to Train Your ResNet 5: Hyperparameters
We develop some heuristics for hyperparameter tuning.

How to Train Your ResNet 4: Architecture
In which we try out some different networks and discover that we've been working too hard. So far, we've been training a fixed network architecture, taken from the fastest single-GPU DAWNBench entry on CIFAR10. With some simple changes, we've reduced the time taken to reach 94% test accuracy from 341s to 154s. Today we're going…
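For context on what is being varied, a minimal residual block of the general kind under discussion looks like the sketch below; this is our paraphrase, not the post's final architecture.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU twice on the branch, added back to the identity path."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.branch(x))
```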

How to Train Your ResNet 3: Regularisation
We identify a performance bottleneck and add regularisation to reduce the training time further to 154s.
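The post identifies which regulariser pays off; one standard choice in fast CIFAR10 pipelines of this kind is cutout, which blanks a random square of each training image. A minimal sketch with illustrative parameters (the post's own choice may differ):

```python
import torch

def cutout(image, size=8):
    """Zero out a random size x size square of a CHW image (illustrative parameters)."""
    _, h, w = image.shape
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    image = image.clone()
    image[:, y:y + size, x:x + size] = 0
    return image
```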

How to Train Your ResNet 2: Mini-batches
We investigate the effects of mini-batch size on training and use larger batches to reduce training time to 256s.
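A common reference point when growing the batch, used here purely as illustration rather than a quote from the post, is the linear learning-rate scaling heuristic:

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling heuristic: keep lr / batch_size constant as the batch grows."""
    return base_lr * batch / base_batch

# e.g. a schedule tuned at batch size 128 with lr 0.1, moved to batch size 512:
print(scaled_lr(0.1, 128, 512))  # 0.4
```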

How to Train Your ResNet 1: Baseline
We establish a baseline for training a Residual network to 94% test accuracy on CIFAR10, which takes 297s on a single V100 GPU.

How to Train Your ResNet
The introduction to a series of posts investigating how to train Residual networks efficiently on the CIFAR10 image classification dataset. By the fourth post, we can train to the 94% accuracy threshold of the DAWNBench competition in 79 seconds on a single V100 GPU.

WaveNet
Are GPUs a good target for speech synthesis? Is Baidu's GPU implementation of WaveNet the best you can do on a GPU? We run some tests, discuss latency, and find out.