Autodiff: Why Gradients Matter
Automatic differentiation (autodiff) applies the chain rule to compute how changing each weight changes the loss. The forward pass computes predictions and records the operations that produced them; the backward pass propagates ∂loss/∂node from the outputs back to the inputs, accumulating a gradient for every weight along the way.
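To make the forward/backward idea concrete, here is a minimal reverse-mode sketch in Python. It is not any particular framework's API; the Value class, its operator set, and the example expression are illustrative assumptions.

```python
class Value:
    """A scalar that records the operations producing it, so gradients
    can be propagated backward with the chain rule."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # forward value
        self.grad = 0.0                   # accumulated d(loss)/d(self)
        self._parents = parents           # nodes this value was computed from
        self._local_grads = local_grads   # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        """Propagate d(loss)/d(node) from this node to all ancestors."""
        # Topologically order the graph so each node is visited only after
        # everything that depends on it.
        order, visited = [], set()
        def visit(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)

        self.grad = 1.0  # d(loss)/d(loss) = 1
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad  # chain rule, accumulated

# Tiny example: loss = (w*x - y)**2, built from + and *.
w, x, y = Value(2.0), Value(3.0), Value(7.0)
err = w * x + y * (-1.0)   # w*x - y
loss = err * err
loss.backward()
print(loss.data, w.grad)   # loss = 1.0, dL/dw = 2*(w*x - y)*x = -6.0
```

Real frameworks do the same bookkeeping over tensors and a much larger operator set, but the mechanism (record the graph forward, accumulate local gradients backward) is identical.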
Without gradients, your model can’t learn. With them, an optimizer updates each weight:
w ← w − η · ∂L/∂w, where η is the learning rate.
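A minimal sketch of that update rule in plain Python, using a hypothetical one-weight model (prediction = w·x, squared-error loss) and an illustrative learning rate:

```python
# Hypothetical setup: fit w so that w*x ≈ y_true.
x, y_true = 3.0, 7.0
w = 2.0
eta = 0.05  # learning rate η (an illustrative choice)

for step in range(5):
    y_pred = w * x                      # forward pass
    loss = (y_pred - y_true) ** 2
    grad = 2 * (y_pred - y_true) * x    # ∂L/∂w via the chain rule
    w = w - eta * grad                  # w ← w − η · ∂L/∂w
    print(step, round(loss, 4), round(w, 4))
```

Each iteration the loss shrinks and w moves toward the value that makes w·x match y_true; in practice an autodiff engine supplies `grad` instead of the hand-derived expression above.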



