Transformer = Encoder + Decoder (Still)
The Transformer keeps the classic encoder-decoder structure common in translation:
- The **encoder** reads the input sentence (e.g. English) and produces contextual vector representations.
- The **decoder** generates the output sentence (e.g. German) one symbol at a time, using what it has produced so far plus the encoded input.
But both encoder and decoder are now built out of repeated layers of multi-head self-attention plus small feed-forward networks, instead of stacks of RNN cells.
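To make that concrete, here is a minimal numpy sketch of a single encoder block: self-attention followed by a small feed-forward network, each with a residual connection. This is an illustrative simplification, not the full architecture: multi-head splitting, layer normalization, and positional encodings are omitted, and all weight names (`Wq`, `Wk`, `Wv`, `W1`, `W2`) are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Self-attention: queries, keys, and values all come from the same input X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq, seq): how much each position attends to every other
    return softmax(scores) @ V        # each output position is a weighted mix of all positions

def encoder_layer(X, params):
    # One encoder block: self-attention, then a small position-wise feed-forward net,
    # each followed by a residual (skip) connection. Layer norm omitted for brevity.
    H = X + self_attention(X, params["Wq"], params["Wk"], params["Wv"])
    F = np.maximum(0, H @ params["W1"]) @ params["W2"]   # ReLU feed-forward
    return H + F

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 16, 5
params = {
    "Wq": rng.normal(size=(d_model, d_model)) * 0.1,
    "Wk": rng.normal(size=(d_model, d_model)) * 0.1,
    "Wv": rng.normal(size=(d_model, d_model)) * 0.1,
    "W1": rng.normal(size=(d_model, d_ff)) * 0.1,
    "W2": rng.normal(size=(d_ff, d_model)) * 0.1,
}
X = rng.normal(size=(seq_len, d_model))   # 5 token embeddings of width 8
out = encoder_layer(X, params)
print(out.shape)                          # (5, 8)
```

Because the output has the same shape as the input, these blocks stack: the Transformer's encoder is simply several such layers applied in sequence, and the decoder adds a second attention step over the encoder's output.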



