Results: Translation Quality and Cost
On English→German:
- Transformer (big) reached BLEU ≈ 28.4, beating the previous state of the art (including ensembles) by more than 2 BLEU.
- It became the new SOTA with a single model.
On English→French:
- Transformer (big) hit BLEU ≈ 41.8 (or ~41.0 depending on table presentation), also state of the art among single models.
- Training cost (in FLOPs and wall-clock time) was dramatically lower than for older systems like GNMT or convolutional seq2seq (ConvS2S).
Key story: higher translation quality, reached with drastically less training compute.
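To make the BLEU numbers above concrete, here is a minimal, illustrative sketch of how BLEU is computed: the geometric mean of clipped 1- to 4-gram precisions, scaled by a brevity penalty. This is NOT the evaluation script used in the paper (real evaluations use corpus-level tools such as sacreBLEU or multi-bleu.perl, which aggregate n-gram counts over the whole test set); it is only a toy sentence-level version to show what a score like "BLEU ≈ 28.4" measures.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU in the 0-100 range.
    Real corpus BLEU aggregates counts across all sentences;
    this sketch handles one hypothesis/reference pair."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        # clipped overlap: each hyp n-gram counts at most as often as in ref
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        # tiny smoothing so one zero precision doesn't zero the whole score
        precisions.append((overlap + 1e-9) / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty discourages overly short hypotheses
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * geo_mean
```

A perfect match scores ~100; shorter or less overlapping hypotheses score proportionally lower, which is why a 2+ BLEU gain over prior SOTA was a substantial jump.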

