Backpropagation often feels like a rite of passage in machine learning, usually involving hours of manual calculus and a high probability of errors. But in modern frameworks like PyTorch or TensorFlow, we rarely touch the math ourselves.

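To see that magic in action before peeking underneath it, here is a tiny PyTorch snippet (purely illustrative, not part of nnlib): we define a scalar function of a tensor and let autograd compute the derivative for us.

```python
import torch

# Define a scalar function y(x) and let PyTorch's autograd do the calculus.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 4 * x   # y = x^3 + 4x
y.backward()         # reverse-mode autodiff computes dy/dx
print(x.grad)        # tensor(16.) because dy/dx = 3x^2 + 4 = 16 at x = 2
```
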
How does that magic actually work? I recently realized that a lecture I gave on this topic could be a helpful resource for anyone trying to peek under the hood, so I'm repurposing that lecture into this blog post.

When I was a teaching assistant for the Machine Learning course at IIT Gandhinagar, I noticed students often struggled with the gap between the theory of gradients and the actual implementation in code. To bridge that gap, I developed a lecture focused on Automatic Differentiation and built a toy neural network library, nnlib, to show how it’s done from scratch.

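To give a flavour of what "from scratch" means here, below is a minimal sketch of a scalar automatic-differentiation engine. It is an illustration of the general approach, not the actual nnlib code: each `Value` remembers the values it was computed from and a local backward rule, and `backward()` replays those rules in reverse topological order, which is exactly the chain rule.

```python
# Minimal sketch of a scalar autodiff engine (illustrative only, not nnlib's API).

class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # local rule set by the op that created this value

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # product rule: each input's gradient is scaled by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the computation graph, then apply the chain rule in reverse.
        order, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)

        self.grad = 1.0
        for v in reversed(order):
            v._backward()


# Usage: y = x*x + x, so dy/dx = 2x + 1 = 7 at x = 3.
x = Value(3.0)
y = x * x + x
y.backward()
print(x.grad)  # 7.0
```

The same pattern, extended from scalars to tensors and wrapped in layer and loss classes, is what a small neural network library needs to train models by backpropagation.
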
Here is the code for the toy neural network library:


Here is the video recording of my lecture: