Self-Attention: The Core Operation
Self-attention (also called intra-attention) relates every position in a sequence to every other position in that same sequence, computing a new representation for each position.
Intuition:
- For each word, ask: which other words in this sentence matter to me?
- Weight them.
- Blend their info into my updated representation.
This lets the model capture word-to-word relationships regardless of how far apart the words are in the sentence.
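
To make the intuition concrete, here is a minimal NumPy sketch of one common formulation, scaled dot-product self-attention. The function name and the projection matrices (`Wq`, `Wk`, `Wv`) are illustrative choices for this sketch, not part of any particular library.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal scaled dot-product self-attention over one sequence.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices (illustrative names)
    Returns a (seq_len, d_k) matrix of updated representations.
    """
    Q = X @ Wq  # what each position is looking for
    K = X @ Wk  # what each position offers to others
    V = X @ Wv  # the information that gets blended

    # Position-to-position relevance scores, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])

    # Softmax over each row: how much each position attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each position's new representation is a weighted blend of all values
    return weights @ V

# Toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Note that the score matrix compares every position with every other in a single matrix product, which is why distance in the sentence plays no role in whether two words can attend to each other.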

