[Collection] Large-Scale Model Training Related Papers & Repos.
Last Updated: 2020-03-05
Papers
- Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes (https://arxiv.org/pdf/1904.00962.pdf)
- Scaling SGD Batch Size to 32K for ImageNet Training (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-156.pdf)
- One weird trick for parallelizing convolutional neural networks (https://arxiv.org/pdf/1404.5997.pdf): General principles for parallelized (or distributed) training.
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (https://arxiv.org/pdf/1706.02677.pdf): Useful techniques for large-minibatch SGD, notably the linear learning-rate scaling rule and gradual warmup.
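
Two of the techniques from the last paper, the linear scaling rule and gradual warmup, can be sketched in a few lines. This is an illustrative sketch, not code from any of the papers; the function names and hyperparameter values (base LR 0.1 at batch 256) are assumptions chosen for the example:

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: when the minibatch grows by a factor k,
    multiply the learning rate by k as well."""
    return base_lr * batch / base_batch


def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp the learning rate linearly from near zero
    up to target_lr over the first warmup_steps updates, then hold."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


# Example: a base LR of 0.1 at batch size 256, scaled up for batch 8192.
target = scaled_lr(0.1, 256, 8192)  # 0.1 * 32 = 3.2
lr_at_start = warmup_lr(target, step=0, warmup_steps=500)    # small initial LR
lr_after_warmup = warmup_lr(target, step=500, warmup_steps=500)  # full target LR
```

The warmup phase matters because jumping straight to the scaled learning rate early in training, when gradients are large and noisy, tends to diverge; ramping up over a few epochs avoids this.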