Beyond Translation: Parsing
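To check whether the Transformer generalizes beyond translation, they tested it on English constituency parsing: mapping a sentence to its full phrase-structure tree. The task is hard for sequence-to-sequence models because the output must obey strong structural constraints and is much longer than the input.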
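A sequence-to-sequence model can only emit a tree if the tree is first written out as a flat token sequence. This section doesn't say which linearization was used, so the sketch below assumes a simple bracketed format that keeps both POS tags and words. The nested-tuple tree format and the `linearize`/`delinearize` helpers are hypothetical and for illustration only:

```python
from typing import List, Tuple, Union

# Hypothetical tree format (an assumption for this sketch):
# a leaf is a word (str); an internal node is (label, [children]).
Tree = Union[str, Tuple[str, list]]


def linearize(tree: Tree) -> List[str]:
    """Depth-first walk emitting '(LABEL', the children, then ')'."""
    if isinstance(tree, str):
        return [tree]  # leaf: the word itself
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens


def delinearize(tokens: List[str]) -> Tree:
    """Rebuild a tree from tokens; raise ValueError if brackets don't balance."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if not tok.startswith("("):
            return tok  # a word
        label, children = tok[1:], []
        while pos < len(tokens) and tokens[pos] != ")":
            children.append(parse())
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("extra tokens after the root node")
    return tree


# "John has a dog ." with a POS-tag preterminal over each word.
words = ["John", "has", "a", "dog", "."]
tree = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [("VBZ", ["has"]), ("NP", [("DT", ["a"]), ("NN", ["dog"])])]),
    (".", ["."]),
])

tokens = linearize(tree)
print(" ".join(tokens))
print(len(words), "words ->", len(tokens), "output tokens")  # 5 words -> 23 output tokens
assert delinearize(tokens) == tree  # round trip recovers the tree
```

Even this five-word sentence becomes 23 output tokens, and the decoder has to keep every bracket balanced for the result to be a valid tree at all.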
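They trained a 4-layer Transformer (d_model = 1024) in two settings:
- WSJ only: the Wall Street Journal portion of the Penn Treebank (~40K training sentences).
- Semi-supervised: the WSJ data plus roughly 17M additional sentences from larger high-confidence and BerkeleyParser corpora.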
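Results (F1 on the WSJ test set):
- WSJ only: 91.3 F1. Despite the small training set and no task-specific tuning, the Transformer beat the BerkeleyParser and every other WSJ-only model in the comparison except the Recurrent Neural Network Grammar.
- Semi-supervised: 92.7 F1, ahead of the other semi-supervised parsers compared, which supports the claim that the architecture generalizes beyond translation.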
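Parsing scores like these are conventionally labeled-bracket F1: each constituent is a (label, start, end) span, and F1 is the harmonic mean of precision and recall over those spans. Below is a simplified sketch of that metric, assuming a hypothetical nested-tuple tree format. It is not the official EVALB scorer, which also applies rules (label equivalences, punctuation handling) that are omitted here:

```python
from collections import Counter
from typing import Tuple, Union

# Hypothetical tree format (an assumption for this sketch):
# a leaf is a word (str); an internal node is (label, [children]).
Tree = Union[str, Tuple[str, list]]


def brackets(tree: Tree) -> Counter:
    """Collect labeled spans (label, start, end) for phrase-level nodes."""
    spans: Counter = Counter()

    def walk(node: Tree, start: int) -> int:
        if isinstance(node, str):
            return start + 1  # each word occupies one position
        label, children = node
        end = start
        for child in children:
            end = walk(child, end)
        # Skip preterminals (a POS tag directly over a single word).
        if not (len(children) == 1 and isinstance(children[0], str)):
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f1(gold: Tree, pred: Tree) -> float:
    """Harmonic mean of labeled-bracket precision and recall."""
    g, p = brackets(gold), brackets(pred)
    matched = sum((g & p).values())  # multiset intersection of spans
    if matched == 0:
        return 0.0
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)


gold = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [("VBZ", ["has"]), ("NP", [("DT", ["a"]), ("NN", ["dog"])])]),
    (".", ["."]),
])
# The prediction attaches "a dog" to S instead of inside the VP.
pred = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [("VBZ", ["has"])]),
    ("NP", [("DT", ["a"]), ("NN", ["dog"])]),
    (".", ["."]),
])
print(bracket_f1(gold, pred))  # 3 of 4 brackets match on each side -> 0.75
```

A single misattached phrase costs a bracket on both sides here (the VP span is wrong), which is why differences of a few tenths of a point separate strong parsers.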

