Myrtle AI


We care about understanding the machine learning models we produce and making them as small and accurate as possible. We like to give our understanding back to the community by explaining techniques we’ve developed on tiny datasets.

The open source notebooks that form part of this series held the #1 spots in two Stanford machine learning league tables for over six months, until April 2019. All models that currently rank above those notebooks are derivations of this original work.

WaveNet Revisited: Excels on New AI-Optimized FPGAs

When we were looking for a great application to run on the Intel Stratix 10 NX FPGA, we turned our attention to WaveNet, a neural network model that we know to be extremely difficult to implement on existing compute platforms (see our previous blog post). Two years on, and armed with new AI-optimised FPGA…

How to Train Your ResNet 8: Bag of Tricks

In the final post of the series, we come full circle, speeding up our single-GPU training implementation to take on a field of multi-GPU competitors. We roll out a bag of standard and not-so-standard tricks to reduce training time to 34s, or 26s with test-time augmentation.

How to Train Your ResNet 7: Batch Norm

We investigate how batch normalisation helps optimisation (spoiler: it involves internal covariate shift…). Along the way we meet some bad initialisations, degenerate networks and spiky Hessians.

How to Train Your ResNet 4: Architecture

In which we try out some different networks and discover that we’ve been working too hard. So far, we’ve been training a fixed network architecture, taken from the fastest single-GPU DAWNBench entry on CIFAR10. With some simple changes, we’ve reduced the time taken to reach 94% test accuracy from 341s to 154s. Today we’re going…

How to Train Your ResNet

The introduction to a series of posts investigating how to train Residual networks efficiently on the CIFAR10 image classification dataset. By the fourth post, we can train to the 94% accuracy threshold of the DAWNBench competition in 79 seconds on a single V100 GPU.


Are GPUs a good target for speech synthesis? Is Baidu’s GPU implementation of WaveNet the best you can do on a GPU? We run some tests, discuss latency and find out…


One of ten global MLPerf benchmark owners
