Generate-and-Test Models for Machine Translation

Published August 12, 2016, 0:54
I discuss translation as an optimization problem subject to three kinds of constraints: lexical constraints, configurational constraints, and constraints enforcing target-language wellformedness. Lexical constraints ensure that the lexical choices in the output preserve the meaning of the input; configurational constraints ensure that the relationships between source words and phrases (e.g., semantic roles and modifier-head relationships) are properly transformed in translation; and target-language wellformedness constraints ensure that the output is grammatical. This constraint-based framework suggests a generate-and-test (discriminative) model of translation, in which language and translation experts engineer features sensitive to input and output structures, and the feature weights are trained to maximize the conditional likelihood of a corpus of example translations. The specified features represent empirical hypotheses about which properties correlate (though not why they do) and thus encode domain-specific knowledge; the learned weights indicate the extent to which these hypotheses are confirmed or refuted. To verify the usefulness of the feature-based approach, I discuss the performance of two models. The first is a lexical translation model, evaluated by the word alignments it learns. Unlike previous unsupervised alignment models, the new model uses features that capture diverse lexical and alignment relationships, including morphological relatedness, orthographic similarity, and conventional co-occurrence statistics. Results on typologically diverse language pairs show that this generate-and-test model substantially outperforms state-of-the-art generative baselines. Second, I discuss the results of an end-to-end translation model in which lexical, configurational, and wellformedness constraints are modeled explicitly.
This model is substantially more compact than state-of-the-art translation models, yet it still performs significantly better than them on language pairs with substantial source-target word-order differences.
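The abstract does not spell out how the discriminative model is parameterized. One common way to realize "features engineered by experts, weights trained to maximize conditional likelihood" is a log-linear model, p(t | s) ∝ exp(w · φ(s, t)), normalized over a generated set of candidates. The sketch below assumes that form and is not the author's model: it is supervised on observed (source, gold) word pairs, whereas the alignment model described above is unsupervised. The feature functions, the toy candidate list, and the co-occurrence table are all hypothetical. `difflib.SequenceMatcher.ratio()` stands in for orthographic similarity, and morphological features are omitted.

```python
import math
from difflib import SequenceMatcher

# Hypothetical feature map phi(s, t); a real system would add many more
# (e.g. morphological relatedness). These two are illustrative only.
def features(src: str, tgt: str, cooc: dict) -> list[float]:
    return [
        SequenceMatcher(None, src, tgt).ratio(),  # orthographic similarity in [0, 1]
        cooc.get((src, tgt), 0.0),                # co-occurrence statistic, assumed precomputed
    ]

def conditional_probs(w, src, candidates, cooc):
    """Test step: p(t | s) = exp(w . phi(s, t)) / sum over t' of exp(w . phi(s, t'))."""
    scores = [sum(wk * fk for wk, fk in zip(w, features(src, t, cooc))) for t in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(data, candidates, cooc, lr=0.5, epochs=200):
    """Gradient ascent on the conditional log-likelihood of (source, gold) pairs.

    The gradient for weight k is the observed phi_k minus its expectation under the model.
    """
    n_feats = len(features(data[0][0], data[0][1], cooc))
    w = [0.0] * n_feats
    for _ in range(epochs):
        grad = [0.0] * n_feats
        for src, gold in data:
            probs = conditional_probs(w, src, candidates, cooc)
            observed = features(src, gold, cooc)
            feats = [features(src, t, cooc) for t in candidates]
            for k in range(n_feats):
                expected = sum(p * f[k] for p, f in zip(probs, feats))
                grad[k] += observed[k] - expected
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w

# Toy "generate" step: a fixed candidate list stands in for a real candidate generator.
candidates = ["nation", "house", "water"]
data = [("nación", "nation"), ("casa", "house"), ("agua", "water")]
cooc = {pair: 1.0 for pair in data}  # hypothetical co-occurrence scores

w = train(data, candidates, cooc)
for src, gold in data:
    probs = conditional_probs(w, src, candidates, cooc)
    print(src, "->", candidates[probs.index(max(probs))])
```

After training, each learned weight plays the role the abstract describes: a large positive weight suggests that the corresponding feature hypothesis is supported by the data, while a weight near zero suggests that it contributes little.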