Inference: How It Generates Translations

At inference:

  • The decoder generates tokens one at a time, feeding each prediction back in as input for the next step.
  • They use beam search (beam size 4 for translation), which keeps several candidate sequences in parallel and returns the best-scoring one.
  • They apply a length penalty so the model doesn't unfairly prefer too-short outputs; the sketch after this list shows how candidates are rescored.
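
These settings match the ones reported in the original "Attention Is All You Need" paper, where the length penalty is the GNMT-style formula of Wu et al. (2016) with α = 0.6. Assuming that formula, here is a minimal sketch of the rescoring (the function names are illustrative, not from the paper):

    def length_penalty(length, alpha=0.6):
        # GNMT-style penalty: lp(Y) = ((5 + |Y|) / 6) ** alpha
        return ((5 + length) / 6) ** alpha

    def normalized_score(sum_log_prob, length, alpha=0.6):
        # Candidates are compared on total log-probability divided by the
        # penalty, so longer outputs aren't punished just for accumulating
        # more negative log-prob terms.
        return sum_log_prob / length_penalty(length, alpha)

The exponent interpolates between two extremes: α = 0 compares raw summed log-probabilities (which favors short outputs), while α = 1 is close to a per-token average.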

They also cap the maximum output length at input_length + 50, but decoding stops early once the model predicts an end-of-sentence token. The sketch below puts all of these pieces together.
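
What follows is a self-contained sketch of the decode loop under those assumptions: expand each candidate one token at a time, rescore with the length penalty, mark a candidate finished when it emits end-of-sentence, and never generate more than input_length + 50 tokens. model.decode_step, BOS, and EOS are hypothetical stand-ins for the real decoder interface, and an actual implementation would batch all beams through the decoder at once rather than looping.

    BOS, EOS = 1, 2  # illustrative special-token ids

    def length_penalty(length, alpha=0.6):
        return ((5 + length) / 6) ** alpha  # repeated so the sketch runs standalone

    def beam_search(model, src_tokens, beam_size=4, alpha=0.6):
        max_len = len(src_tokens) + 50     # hard cap: input_length + 50
        beams = [(0.0, [BOS], False)]      # (sum log-prob, tokens, finished?)
        for _ in range(max_len):
            if all(done for _, _, done in beams):
                break                      # every beam has emitted EOS: stop early
            candidates = []
            for logp, toks, done in beams:
                if done:
                    candidates.append((logp, toks, True))  # carry finished candidates
                    continue
                next_logps = model.decode_step(src_tokens, toks)  # vocab log-probs
                # only the top beam_size continuations of a beam can survive the cut
                top = sorted(enumerate(next_logps), key=lambda kv: -kv[1])[:beam_size]
                for tok, tok_logp in top:
                    candidates.append((logp + tok_logp, toks + [tok], tok == EOS))
            # keep the beam_size candidates with the best length-normalized score
            candidates.sort(key=lambda c: c[0] / length_penalty(len(c[1]), alpha),
                            reverse=True)
            beams = candidates[:beam_size]
        best = max(beams, key=lambda c: c[0] / length_penalty(len(c[1]), alpha))
        return best[1]

Keeping finished candidates in the pool matters: a short sequence that already ended in EOS can still beat longer, still-growing ones once the length penalty is applied.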