Transformer = Encoder + Decoder (Still)

The Transformer keeps the classic encoder-decoder structure common in translation:

  • The encoder reads the input sentence (e.g. English) and produces contextual vector representations.
  • The decoder generates the output sentence (e.g. German) one symbol at a time, using what it has produced so far plus the encoded input.

But both the encoder and the decoder are now built from repeated layers of multi-head self-attention followed by small position-wise feed-forward networks, instead of stacks of RNN cells.
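The structure above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the full architecture: it omits multi-head splitting, layer normalization, and positional encodings, and all sizes, weights, and function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # tiny embedding size, for illustration only

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention; a causal mask hides future positions."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if causal:
        keep = np.tril(np.ones_like(scores, dtype=bool))
        scores = np.where(keep, scores, -1e9)
    return softmax(scores) @ V

def proj():
    # Fresh random projection matrix (stand-in for learned weights).
    return rng.standard_normal((d_model, d_model)) * 0.1

def layer(X, context=None, causal=False):
    """One single-head Transformer layer: self-attention, optional
    cross-attention over the encoder output, then a feed-forward net.
    Residual connections are kept; layer norm is omitted for brevity."""
    X = X + attention(X @ proj(), X @ proj(), X @ proj(), causal=causal)
    if context is not None:
        # Decoder cross-attention: queries from X, keys/values from encoder.
        X = X + attention(X @ proj(), context @ proj(), context @ proj())
    return X + np.maximum(0, X @ proj()) @ proj()  # ReLU feed-forward

# Encoder: stacked layers over the source sentence (e.g. English).
src = rng.standard_normal((5, d_model))   # 5 source-token embeddings
memory = src
for _ in range(2):
    memory = layer(memory)

# Decoder: causal self-attention over what has been generated so far,
# plus cross-attention to the encoded input.
tgt = rng.standard_normal((3, d_model))   # 3 target tokens produced so far
out = tgt
for _ in range(2):
    out = layer(out, context=memory, causal=True)

print(out.shape)  # one contextual vector per target position
```

The causal mask is what lets the decoder generate one symbol at a time: position *i* can attend only to positions up to *i*, while cross-attention gives every decoder position full access to the encoded source.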