Microsoft Research
Published June 21, 2016, 21:24
Humans commonly understand sequential events by giving importance to what they expect rather than exclusively to what they actually observe. The ability to fill in the blanks, useful in speech recognition to favor words that make sense in the current context, is particularly important in noisy conditions. In this talk, we present a probabilistic model of symbolic sequences based on a recurrent neural network that can serve as a powerful prior during information retrieval. We show that conditional distribution estimators can describe much more realistic output distributions, and we devise inference procedures to efficiently search for the most plausible annotations when the observations are partially destroyed or distorted. We demonstrate improvements in the state of the art in polyphonic music transcription, chord recognition, speech recognition and audio source separation.
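The core idea — combining noisy observation likelihoods with a learned sequential prior, then searching for the most plausible annotation — can be sketched in miniature. The snippet below is a hypothetical toy, not the talk's actual method: a first-order transition table stands in for the RNN prior, the observation likelihoods and all probability values are invented, and a small beam search plays the role of the inference procedure. The middle frame is "destroyed" (uniform likelihood), so the prior must fill in the blank.

```python
import math

# Toy vocabulary of three symbols (all numbers below are assumptions).
VOCAB = ["A", "B", "C"]

# PRIOR[prev][sym]: probability of the next symbol given the previous one.
# A first-order Markov table standing in for the RNN prior from the talk.
PRIOR = {
    None: {"A": 0.5, "B": 0.3, "C": 0.2},
    "A":  {"A": 0.1, "B": 0.7, "C": 0.2},
    "B":  {"A": 0.6, "B": 0.1, "C": 0.3},
    "C":  {"A": 0.3, "B": 0.3, "C": 0.4},
}

# OBS[t][sym]: likelihood of the observation at time t under each symbol.
# Frame 1 is corrupted (uniform), so it carries no information on its own.
OBS = [
    {"A": 0.8, "B": 0.1, "C": 0.1},
    {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3},  # destroyed frame
    {"A": 0.1, "B": 0.2, "C": 0.7},
]

def beam_search(obs, beam_width=2):
    """Find a plausible annotation by adding log observation likelihood
    and log prior at each step, keeping only the best partial sequences."""
    beams = [([], 0.0)]  # (sequence so far, cumulative log score)
    for frame in obs:
        candidates = []
        for seq, score in beams:
            prev = seq[-1] if seq else None
            for sym in VOCAB:
                s = score + math.log(frame[sym]) + math.log(PRIOR[prev][sym])
                candidates.append((seq + [sym], s))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

print(beam_search(OBS))  # → ['A', 'B', 'C']
```

Note how the corrupted middle frame is decoded as "B": the observation is uninformative there, so the prior term P(B | A) = 0.7 dominates, which is exactly the "filling in the blanks" behavior the abstract describes.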