Environmental robustness for speech recognition

Published August 17, 2016, 22:33
The talk will present some of the algorithms developed as part of my graduate work at Carnegie Mellon. Speech is the natural medium of communication for humans, and in the last decade speech technologies such as automatic speech recognition (ASR) and voice response systems have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments introduce noise and reverberation that degrade system performance. The key focus of the talk will be on ASR robustness to reverberation. We first provide a new framework to adequately and efficiently represent the problem of reverberation in the speech spectral and cepstral feature domains, and then develop different dereverberation algorithms within the proposed framework. The algorithms reduce the uncertainty involved in dereverberation by exploiting speech knowledge in terms of cepstral auto-correlation, cepstral distribution, and the non-negativity and sparsity of spectral values. We demonstrate the success of our algorithms under clean-training as well as matched-training conditions. Apart from dereverberation, we also provide two approaches for noise robustness. The first combines audio and visual features, with new visual features derived from profile-view images of the speaker. The second applies a temporal-difference operation in the speech spectral domain, for which a theoretical analysis also predicts an expected improvement in the SNR threshold shift under white-noise conditions. Finally, we combine our individual dereverberation and noise-compensation approaches for a joint noise and reverberation compensation task. Website: ece.cmu.edu/~kshitizk
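
To make the spectral-domain view of reverberation concrete, here is a minimal, self-contained Python sketch of the kind of model the abstract alludes to: the reverberant power spectrogram is approximated, per frequency band, as a convolution across frames of the clean power spectrogram with the frame-energy profile of the room impulse response, and the clean spectrogram is then recovered by non-negative deconvolution with a sparsity penalty. This is a generic illustration, not the speaker's implementation; all names and values (n_bands, n_taps, h, lam, the update rule) are assumptions for the sake of a runnable example, and the actual algorithms in the talk may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_frames, n_taps = 64, 200, 6

# "Clean" power spectrogram: non-negative and sparse by construction
# (stand-in for real speech features in this sketch).
X = rng.gamma(shape=0.5, scale=1.0, size=(n_bands, n_frames))

# Frame-level energy profile of a decaying room impulse response (assumed known).
h = np.exp(-np.arange(n_taps) / 2.0)
h /= h.sum()

# Forward model: Y[f, t] ~= sum_k h[k] * X[f, t - k]  (per-band convolution across frames).
Y = np.zeros_like(X)
for k in range(n_taps):
    Y[:, k:] += h[k] * X[:, : n_frames - k]

# Dereverberation as non-negative deconvolution with an L1 (sparsity) penalty,
# solved with simple KL-style multiplicative updates that keep the estimate >= 0.
lam = 0.01                       # sparsity weight (illustrative value)
X_hat = np.full_like(Y, Y.mean())
for _ in range(200):
    # Reconstruct the reverberant spectrogram from the current estimate.
    Y_hat = np.zeros_like(Y)
    for k in range(n_taps):
        Y_hat[:, k:] += h[k] * X_hat[:, : n_frames - k]
    # Correlate the residual ratio back through the reverberation filter.
    ratio = Y / np.maximum(Y_hat, 1e-12)
    num = np.zeros_like(Y)
    den = np.zeros_like(Y)
    for k in range(n_taps):
        num[:, : n_frames - k] += h[k] * ratio[:, k:]
        den[:, : n_frames - k] += h[k]
    X_hat *= num / (den + lam)   # multiplicative update preserves non-negativity

print("mean squared reconstruction error:", float(np.mean((X - X_hat) ** 2)))
```

The non-negativity constraint and the sparsity penalty play the same role as the speech-knowledge priors mentioned in the abstract: they shrink the space of plausible clean spectrograms and so reduce the uncertainty of the deconvolution.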