Redesigning Neural Architectures for Sequence to Sequence Learning

Published March 26, 2019, 17:37
The encoder-decoder model with soft attention is now the de facto standard for sequence-to-sequence learning, having enjoyed early success in tasks like translation, error correction, and speech recognition. In this talk, I will present a critique of several aspects of this popular model, including its soft-attention mechanism, local loss function, and sequential decoding. I will present a new Posterior Attention Network, a more transparent joint attention model that yields easy gains on several translation and morphological-inflection tasks. Next, I will expose a little-known problem of miscalibration in state-of-the-art neural machine translation (NMT) systems. For structured outputs such as those in NMT, calibration is important not just for attaching reliable confidence to predictions, but also for the proper functioning of beam-search inference. I will discuss the causes of miscalibration and some fixes. Finally, I will summarize recent research efforts toward parallel decoding of long sequences.
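For context on the mechanism being critiqued, here is a minimal sketch of soft attention, assuming simple dot-product scoring (the talk does not specify a particular scoring variant; the function name and toy shapes below are illustrative):

```python
import numpy as np

def soft_attention(decoder_state, encoder_states):
    """Compute a context vector as a softmax-weighted sum of encoder states."""
    scores = encoder_states @ decoder_state            # alignment scores, shape (src_len,)
    scores -= scores.max()                             # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # soft attention distribution
    context = weights @ encoder_states                 # weighted sum, shape (hidden,)
    return context, weights

# Toy usage: 5 source positions, hidden size 4.
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 4))   # encoder hidden states
s = rng.standard_normal(4)        # current decoder state
context, weights = soft_attention(s, H)
```

Note that the weights here are a deterministic function of the decoder state, trained only through the local token loss; the posterior attention formulation discussed in the talk instead treats attention as a jointly inferred quantity.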

See more at microsoft.com/en-us/research/v...