Positional Encoding (Because There’s No Recurrence)
Transformers have no recurrence (and no convolution), so they have no built-in notion of token order. To compensate, a positional encoding is injected into each token embedding.
The original paper uses fixed, non-learned sinusoidal encodings:
- Each dimension of the positional encoding is a sine or cosine with a different frequency, following a geometric progression of wavelengths: PE(pos, 2i) = sin(pos / 10000^(2i / d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model)).
- These encodings are added to the token embeddings at the bottoms of the encoder and decoder stacks (a sketch follows this list).
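A minimal NumPy sketch of these encodings, under assumed shapes (batch, seq_len, d_model) and illustrative variable names; the random `token_embeddings` stand in for real learned embeddings:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of fixed sinusoidal encodings.

    PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
    Assumes d_model is even.
    """
    positions = np.arange(max_len)[:, None]            # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)
    angles = positions * angle_rates                    # (max_len, d_model / 2)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Illustrative shapes only: a batch of token embeddings (batch, seq_len, d_model).
batch, seq_len, d_model = 2, 50, 512
token_embeddings = np.random.randn(batch, seq_len, d_model)

pe = sinusoidal_positional_encoding(seq_len, d_model)
x = token_embeddings + pe[None, :, :]   # broadcast the same encodings over the batch
print(x.shape)  # (2, 50, 512)
```

Because the encoding is a fixed function of position, there is nothing extra to store or train; the model only has to learn to make use of it.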
Why sinusoids?
- They let the model infer both absolute and relative positions: for any fixed offset k, PE(pos + k) is a linear function of PE(pos), so attending by relative position is easy to express.
- They can, in principle, generalize to longer sequences than seen in training, because the functions are defined for any position, not just those seen during training (the check below also evaluates a position well beyond typical training lengths).
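A quick numerical check of the relative-position point (the offset `k`, the model width, and the test positions are arbitrary choices for illustration): each sine/cosine pair of PE(pos) is rotated by an angle that depends only on the offset, so PE(pos + k) equals a fixed matrix M_k applied to PE(pos), for every pos.

```python
import numpy as np

d_model, k = 512, 7          # offset k chosen arbitrarily for the check
dims = np.arange(0, d_model, 2)
rates = 1.0 / np.power(10000.0, dims / d_model)

def pe(pos: int) -> np.ndarray:
    """Sinusoidal encoding of a single position as a d_model vector."""
    out = np.zeros(d_model)
    out[0::2] = np.sin(pos * rates)
    out[1::2] = np.cos(pos * rates)
    return out

# Block-diagonal linear map that depends only on the offset k, not on pos:
# each (sin, cos) pair is rotated by the angle k * w for its frequency w.
M = np.zeros((d_model, d_model))
for i, w in enumerate(rates):
    c, s = np.cos(k * w), np.sin(k * w)
    M[2*i:2*i+2, 2*i:2*i+2] = [[c, s], [-s, c]]

for pos in (3, 100, 5000):   # 5000 is far beyond a typical training length
    assert np.allclose(M @ pe(pos), pe(pos + k))
print("PE(pos + k) == M_k @ PE(pos) for every position tested")
```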
