Distributed Training and Inference

Here are my notes on training large ML models and running inference on them. I will be spending some time trying to understand these topics and will post a writeup on each one, mostly to force myself to think more deeply about the subject. These notes have not been proofread by anyone, so they may contain errors.