Myrtle AI

How to Train Your ResNet

The introduction to a series of posts investigating how to train Residual networks efficiently on the CIFAR10 image classification dataset. By the fourth post, we can train to the 94% accuracy threshold of the DAWNBench competition in 79 seconds on a single V100 GPU.

In this series of posts, we investigate how to train Residual networks on the CIFAR10 image classification dataset and how to do so efficiently on a single GPU.

To track progress, we report the time taken to train a network from scratch to 94% test accuracy. This benchmark comes from the recent DAWNBench competition. At the end of the competition, the state of the art was 341s on a single GPU and 174s on eight GPUs. By the fourth post, we will be training in under 100s on a single GPU, comfortably beating the winning multi-GPU time, with plenty of room for improvement. Code to reproduce this result is available here.
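To make the metric concrete, here is a minimal sketch of how a time-to-accuracy measurement could be structured. It is not the code used in this series: the function name `time_to_accuracy` and the callbacks `train_one_epoch` and `evaluate` are hypothetical placeholders, and the choice to exclude evaluation time from the total is an assumption of this sketch rather than a statement of the DAWNBench rules.

```python
import time
from typing import Callable, Optional, Tuple


def time_to_accuracy(
    train_one_epoch: Callable[[], None],   # hypothetical: runs one pass over the training set
    evaluate: Callable[[], float],         # hypothetical: returns test accuracy in [0, 1]
    target: float = 0.94,
    max_epochs: int = 100,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int, float]:
    """Return (training seconds or None, epochs run, last accuracy).

    The first element is None if the target was not reached within max_epochs.
    """
    train_seconds = 0.0
    accuracy = 0.0
    for epoch in range(1, max_epochs + 1):
        start = clock()
        train_one_epoch()
        train_seconds += clock() - start

        # Assumption of this sketch: evaluation time is not counted.
        accuracy = evaluate()
        if accuracy >= target:
            return train_seconds, epoch, accuracy
    return None, max_epochs, accuracy


if __name__ == "__main__":
    # Toy stand-in for a real model: accuracy rises by one point per epoch.
    state = {"correct": 90}

    def fake_epoch() -> None:
        state["correct"] += 1

    def fake_eval() -> float:
        return state["correct"] / 100

    seconds, epochs, acc = time_to_accuracy(fake_epoch, fake_eval)
    print(f"reached {acc:.2%} after {epochs} epochs, {seconds:.6f}s of training")
```

When timing real GPU training, asynchronous kernel launches may require a synchronisation point before each clock read (for example `torch.cuda.synchronize()` in PyTorch), otherwise the measured time can understate the work done.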

Later in the series, we try to gain insight into the training dynamics and extract lessons for other settings.


  1. Baseline: We analyse a baseline and remove a bottleneck in the data loading. (training time: 297s)
  2. Mini-batches: We increase the size of mini-batches. Things go faster and don’t break. We investigate how this can be. (training time: 256s)
  3. Regularisation: We remove a speed bump in the code and add some regularisation. Our single GPU is faster than an eight GPU competition winner. (training time: 154s)
  4. Architecture: We search for more efficient network architectures and find a 9 layer network that trains well. (training time: 79s)
  5. Hyperparameters: We develop some heuristics to aid with hyperparameter tuning.
  6. Weight decay: We investigate how weight decay controls the learning rate dynamics.
  7. Batch norm: We learn that batch normalisation protects against covariate shift after all.
  8. Bag of tricks: We uncover many ways to speed things up further when we find ourselves displaced from the top of the leaderboard. (final training time: 26s)