Distant Speech Recognition: No Black Boxes Allowed

Microsoft Research

Published September 6, 2016, 17:42

A complete system for distant speech recognition (DSR) typically consists of several distinct components, among them:

o An array of microphones for far-field sound capture;
o An algorithm for tracking the positions of the active speaker or speakers;
o A beamforming algorithm for focusing on the desired speaker and suppressing noise, reverberation, and competing speech from other speakers;
o A recognition engine to extract the most likely hypothesis from the output of the beamformer;
o A speaker adaptation component for adapting to the characteristics of a given speaker as well as to channel effects;
o Postfiltering to further enhance the beamformed output.

Moreover, several of these components are themselves composed of one or more subcomponents. While it is tempting to isolate and optimize each component individually, experience has shown that such an approach cannot lead to optimal performance.

In this talk, we will discuss several examples of the interactions between the individual components of a DSR system. In addition, we will describe the synergies that become possible once each component is no longer treated as a "black box." To wit, instead of treating each component as having solely an input and an output, it is necessary to peel back the lid and look inside. Only then does it become apparent how the individual components of a DSR system can be viewed not as separate entities, but as the various organs of a complete body, and how optimal performance of such a system can be obtained.

Joint work with: Kenichi Kumatani, Barbara Rauch, Friedrich Faubel, Matthias Wölfel, and Dietrich Klakow
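The speaker tracker and the beamformer in the component list above show the kind of coupling the abstract describes: a beamformer can only focus on the desired speaker if it is steered toward the position the tracker reports. The Python sketch below illustrates that dependency with a plain time-domain delay-and-sum beamformer. Delay-and-sum is chosen here only for simplicity and is not necessarily the beamformer used in this work; the function names, the rounding to integer-sample delays, and the 343 m/s speed of sound are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed speed of sound in air at room temperature (m/s); illustrative only.
SPEED_OF_SOUND = 343.0

Point = Tuple[float, float, float]


def steering_delays(mic_positions: Sequence[Point], source: Point, fs: int) -> List[int]:
    """Hypothetical helper: integer sample delay of each mic relative to the
    mic closest to the source position (e.g. as reported by a tracker)."""
    dists = [math.dist(m, source) for m in mic_positions]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Hypothetical helper: a mic whose signal arrives d samples late is read
    d samples ahead so all channels line up, then the channels are averaged."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    return [
        sum(c[t + d] for c, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


if __name__ == "__main__":
    fs = 16000
    mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
    source = (2.0, 0.0, 0.0)  # position a tracker might report
    delays = steering_delays(mics, source, fs)
    # Simulate a noiseless 440 Hz tone arriving at each mic with its delay.
    tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(200)]
    channels = [[0.0] * k + tone for k in delays]
    out = delay_and_sum(channels, delays)
    print(delays, len(out))
```

If the tracker reports a wrong position, the computed delays no longer match the true arrival times and the channels add incoherently, which is one concrete way an error in one component propagates into the next.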