Multiclass Heads & Cross-Entropy
For K classes, we compute a vector of logits z ∈ ℝ^K and apply softmax(z)_k = e^{z_k} / Σ_j e^{z_j} to obtain class probabilities. We then train with the cross-entropy loss:
L = − Σ_k y_k log(softmax(z)_k), where y is a one-hot label. Since y is one-hot, the sum reduces to −log(softmax(z)_c) for the true class c.
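The two formulas above can be sketched in NumPy as follows; subtracting max(z) before exponentiating is a standard numerical-stability trick that leaves the softmax output unchanged (the example logits and label here are illustrative, not from the text):

```python
import numpy as np

def softmax(z):
    # Shift by max(z) to avoid overflow in exp; softmax is shift-invariant.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(z, y):
    # y is a one-hot label vector; L = -sum_k y_k * log(softmax(z)_k)
    p = softmax(z)
    return -np.sum(y * np.log(p))

# Example with K = 3 classes; true class is 0.
z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])
loss = cross_entropy(z, y)
```

Because y is one-hot, the loss equals −log of the probability assigned to the true class, so a confident correct prediction drives the loss toward zero.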

