## Math

## Machine learning

- Sampling natural language models and temperature in GPTx
- Why Deep Learning Works Even Though It Shouldn’t
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)
- Deep double descent: more data or a bigger model can be detrimental
- Fast weights: integrating (not only) Hopfield networks into DNNs
- CS231n: Intuitive explanation of backprop (timestamps: 5:40, 28:27)