The Math You Really Need
Here are the mathematical terms at play:
- **Weights (W) and biases (b):** the parameters we learn.
- **Activation function φ:** adds non-linearity, e.g., ReLU(x) = max(0, x).
- **Loss:** a scalar measuring error, e.g., MSE for regression, cross-entropy for classification.
- **Gradient:** the vector of partial derivatives that tells us how to tweak the parameters to reduce the loss.
- **Gradient descent:** the update rule θ ← θ − η ∇_θ L, with learning rate η.
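
To tie these terms together, here is a minimal NumPy sketch (the toy dataset, model size, learning rate, and step count are illustrative assumptions, not from the text): a one-feature linear regression trained by deriving the MSE gradient by hand and applying the update rule above. The model is kept linear so the gradient fits in two lines; ReLU is defined for reference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y ≈ 3x + 1 plus a little noise (assumed for the demo).
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = 3.0 * X + 1.0 + 0.05 * rng.normal(size=(64, 1))

# Weights (W) and biases (b): the parameters we learn.
W = rng.normal(size=(1, 1))
b = np.zeros(1)

def relu(x):
    # Activation function φ: ReLU(x) = max(0, x). Shown for reference;
    # the model below stays linear so the gradient is easy to derive by hand.
    return np.maximum(0.0, x)

eta = 0.1  # learning rate η
for step in range(200):
    y_hat = X @ W + b                 # forward pass (linear model)
    loss = np.mean((y_hat - y) ** 2)  # loss: MSE, a single scalar

    # Gradient: ∂L/∂W and ∂L/∂b for MSE on a linear model, via the chain rule.
    grad_out = 2.0 * (y_hat - y) / len(X)
    grad_W = X.T @ grad_out
    grad_b = grad_out.sum(axis=0)

    # Gradient descent: θ ← θ − η ∇_θ L.
    W -= eta * grad_W
    b -= eta * grad_b

print(f"learned W={W.item():.2f}, b={b.item():.2f} (true values: 3, 1)")
```

Running it, the learned parameters land close to the true values (3, 1), showing that repeatedly stepping against the gradient is all it takes to fit this simple model.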


