
Results: Translation Quality and Cost

On English→German:

  • Transformer (big) reached BLEU ≈ 28.4 on the WMT 2014 benchmark, beating the previous state of the art (including ensembles) by more than 2 BLEU.
  • That made it the new state of the art with a single model, no ensembling required.

On English→French:

  • Transformer (big) reached BLEU ≈ 41.8 (reported as ~41.0 in earlier versions of the paper), also state of the art among single models.
  • Training cost, both in FLOPs and wall-clock time, was dramatically lower than prior systems such as GNMT or convolutional seq2seq (ConvS2S).

Key takeaway: higher translation quality, reached with drastically less training compute.
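Both results are quoted in BLEU, which scores a candidate translation by its n-gram overlap with a reference, discounted by a brevity penalty. As a rough illustration only (not the evaluation script used in the paper, which applies corpus-level scoring and standardized tokenization), here is a minimal single-sentence sketch with a simple smoothing choice of our own:

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy single-reference BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # "Modified" precision: each reference n-gram can be matched at most
        # as many times as it occurs in the reference (clipped counts).
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        # Ad-hoc smoothing so a missing higher-order match doesn't zero
        # the whole score (real toolkits offer several smoothing methods).
        precision = overlap / total if overlap > 0 else 1.0 / (2 * total)
        log_precisions.append(math.log(precision))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # perfect match -> 1.0
print(bleu("the cat sat", "the cat sat on the mat"))  # shorter candidate -> penalized
```

Reported scores like 28.4 correspond to this value scaled by 100 and computed over the whole test corpus, so a 2-point gain is a substantial jump.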