
Self-Attention: The Core Operation

Self-attention (also called intra-attention) relates every position in a sequence to every other position in that same sequence in order to compute a new representation for each position.
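In practice this is usually computed as scaled dot-product attention: each position is projected into a query, a key, and a value, and the output is a weighted sum of the values. The section itself does not fix a notation, so the standard Transformer form is shown below, where Q, K, V are the query, key, and value matrices derived from the same input and d_k is the key dimension.

```latex
% Scaled dot-product self-attention (standard Transformer formulation).
% Q, K, V come from the same input sequence; d_k scales the dot products.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
```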

Intuition:

  • For each word, ask: which other words in this sentence matter for me?
  • Weight those words by how much they matter.
  • Blend their information into my updated representation.

This lets the model capture word-to-word relationships regardless of how far apart the words are in the sentence.
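To make the weighting-and-blending steps concrete, here is a minimal NumPy sketch of single-head self-attention. The projection matrices Wq, Wk, Wv and the toy dimensions are made up for illustration; a real model learns these projections during training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over one sequence.

    X:  (seq_len, d_model) input embeddings, one row per position.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (hypothetical, learned in practice).
    Returns (seq_len, d_k) updated representations.
    """
    Q = X @ Wq                       # what each position is "asking for"
    K = X @ Wk                       # what each position "offers"
    V = X @ Wv                       # the information to be blended
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # every position scored against every other
    weights = softmax(scores)        # "which other words matter for me?"
    return weights @ V               # blend their info into the new representation

# Toy usage: 4 positions, 8-dimensional embeddings and head size.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per position
```

Note that the attention weights for each position form a distribution over all positions in the sequence, so a word two tokens away and a word fifty tokens away are treated the same way by the mechanism itself.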