[Collection] Large-Scale Model Training Related Papers & Repos.
Last Updated: 2020-03-05
Papers
- Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes (https://arxiv.org/pdf/1904.00962.pdf)
- Scaling SGD Batch Size to 32K for ImageNet Training (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-156.pdf)
- One weird trick for parallelizing convolutional neural networks (https://arxiv.org/pdf/1404.5997.pdf): General principles for parallelized (or distributed) training.
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (https://arxiv.org/pdf/1706.02677.pdf): Useful techniques for large-minibatch SGD, notably the linear learning-rate scaling rule and gradual warmup.
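
Two of the techniques from the last paper, the linear scaling rule and gradual warmup, can be sketched in a few lines. This is an illustrative sketch, not code from any of the papers; the function names and hyperparameter values (base LR 0.1 at batch 256) are assumptions chosen for the example:

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: when the minibatch grows by a factor k,
    multiply the learning rate by k as well."""
    return base_lr * batch / base_batch


def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp the learning rate linearly from near zero
    up to target_lr over the first warmup_steps updates, then hold."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


# Example: a base LR of 0.1 at batch size 256, scaled up for batch 8192.
target = scaled_lr(0.1, 256, 8192)  # 0.1 * 32 = 3.2
lr_at_start = warmup_lr(target, step=0, warmup_steps=500)    # small initial LR
lr_after_warmup = warmup_lr(target, step=500, warmup_steps=500)  # full target LR
```

The warmup phase matters because jumping straight to the scaled learning rate early in training, when gradients are large and noisy, tends to diverge; ramping up over a few epochs avoids this.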