Multiclass Heads & Cross-Entropy
For K classes, we compute a vector of logits z ∈ ℝ^K, then apply softmax(z)_k = e^{z_k} / Σ_j e^{z_j}. We then use the cross-entropy loss

L = − Σ_k y_k log(softmax(z)_k)

where y is a one-hot label.
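The loss above can be sketched in plain Python; this is a minimal illustration, not a production implementation. In practice the log-softmax is computed by subtracting the max logit first (softmax is shift-invariant), which avoids overflow in the exponentials:

```python
import math

def cross_entropy(logits, onehot):
    # Shift by the max logit for numerical stability; softmax(z) == softmax(z - c).
    m = max(logits)
    # log Σ_j e^{z_j} computed via the log-sum-exp trick
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax(z)_k = z_k - log Σ_j e^{z_j};
    # the one-hot y selects the log-probability of the true class.
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, onehot))

loss = cross_entropy([2.0, 1.0, 0.1], [1.0, 0.0, 0.0])
```

Because y is one-hot, the sum collapses to −log(softmax(z)_c) for the true class c, so the loss is zero only when the model puts all its probability mass on the correct class.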