Microsoft Research
Published September 7, 2016, 16:42
Bilingual word alignment, the task of finding word-to-word connections between a sentence and its translation, is an important part of knowledge acquisition for statistical machine translation. An Inversion Transduction Grammar, or ITG, provides an efficient algorithm for aligning a bilingual sentence pair by parsing the two sentences simultaneously. However, the simple bracketing grammar usually employed in ITG parsing has no linguistic content. We investigate two methods to inform the ITG parser with the phrase structure from a linguistically motivated dependency tree: a phrasal cohesion constraint on a simple ITG aligner, and a complete discriminative ITG parser that uses cohesion with the dependency tree as a soft constraint. This final system not only recovers links lost by the hard constraint, but also improves link recall beyond a completely unconstrained system.
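
To make the reordering power of a bracketing grammar concrete, here is a minimal sketch in Python. It is not the aligner described in the talk, which searches over scored alignments. It only checks whether a given one-to-one alignment could be produced by a binary bracketing ITG, that is, by repeatedly joining adjacent blocks in straight or inverted order. The function name `is_itg_reachable` and the permutation encoding (entry i is the target position linked to source word i) are assumptions made for illustration.

```python
def is_itg_reachable(perm):
    """Check whether a one-to-one alignment can be built by a binary
    bracketing ITG.

    `perm` is assumed to be a permutation of 0..n-1, where perm[i] is the
    target position aligned to source word i (a hypothetical encoding).
    """
    stack = []  # each entry is a (low, high) block of target positions
    for pos in perm:
        # Shift: the next source word starts as its own one-word block.
        stack.append((pos, pos))
        # Reduce: merge the top two blocks while their target ranges touch.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            straight = hi1 + 1 == lo2   # same order in both languages
            inverted = hi2 + 1 == lo1   # order swapped on the target side
            if straight or inverted:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # The alignment is reachable if everything collapsed into one block.
    return len(stack) == 1


if __name__ == "__main__":
    print(is_itg_reachable([1, 0, 3, 2]))  # two local swaps
    print(is_itg_reachable([1, 3, 0, 2]))  # the classic "inside-out" pattern
```

The first call succeeds because each swap is an inverted join of two neighbouring words. The second fails: no two neighbouring source words map to adjacent target positions, so no binary bracketing exists. Patterns like this one are the alignments that any ITG aligner, constrained or not, cannot recover.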