Geometry-constrained Beamforming Network for end-to-end Farfield Sound Source Separation

Published December 2, 2020, 1:05
Environmental noise, reverberation, and interfering speakers negatively affect the quality of the speech signal and therefore degrade the performance of many speech communication systems, including automatic speech recognition systems, hearing assistive devices, and mobile devices. Many deep learning solutions are available to perform source separation and reduce background noise. However, when a physical interpretation of the signal is possible or multi-channel inputs are available, conventional acoustic signal processing, e.g., beamforming and direction-of-arrival (DOA) estimation, tends to be more interpretable and yields reasonably good solutions in many cases. This motivates integrating deep learning and conventional acoustic signal processing so that each can profit from the other, as has been proposed in several works. However, the integration is typically performed in a modular way where each component is optimized individually, which may lead to a non-optimal solution.

In this talk, we propose a DOA-driven beamforming network (DBnet) for end-to-end source separation, i.e., gradients are propagated end-to-end from the separated time-domain speech signals of the speakers back to the time-domain microphone signals. For the DBnet architecture, we consider either recurrent neural networks (RNNs) or a combination of convolutional and recurrent layers. We analyze the performance of DBnet under challenging noisy and reverberant conditions and benchmark it against state-of-the-art source separation methods.
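Since the abstract stays at a high level, the following is a minimal, hypothetical PyTorch sketch of how such a DOA-driven beamforming pipeline could be wired for end-to-end training: an RNN estimates per-speaker DOA posteriors from the multi-channel STFT, a second RNN predicts complex beamformer weights conditioned on those posteriors, and the separated signals are resynthesized in the time domain so that gradients flow back to the microphone signals. The module names, feature layout, and DOA grid are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an end-to-end DOA-driven beamforming pipeline
# (not the authors' code). Module names and dimensions are assumptions.
import torch
import torch.nn as nn


class DOAEstimator(nn.Module):
    """RNN mapping multi-channel STFT features to per-speaker DOA logits."""
    def __init__(self, n_mics, n_freq, n_speakers, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(2 * n_mics * n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_speakers * 181)  # assumed 1-degree azimuth grid

    def forward(self, stft):                            # stft: (B, M, F, T), complex
        B, M, F, T = stft.shape
        feats = torch.view_as_real(stft).permute(0, 3, 1, 2, 4).reshape(B, T, -1)
        h, _ = self.rnn(feats)
        return self.out(h).reshape(B, T, -1, 181)       # (B, T, speakers, 181)


class BeamformerNet(nn.Module):
    """RNN mapping DOA posteriors + mixture STFT to complex beamformer weights."""
    def __init__(self, n_mics, n_freq, n_speakers, hidden=256):
        super().__init__()
        in_dim = 2 * n_mics * n_freq + n_speakers * 181
        self.rnn = nn.GRU(in_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 2 * n_mics * n_freq * n_speakers)

    def forward(self, stft, doa):                       # doa: (B, T, S, 181)
        B, M, F, T = stft.shape
        S = doa.shape[2]
        x = torch.cat([torch.view_as_real(stft).permute(0, 3, 1, 2, 4).reshape(B, T, -1),
                       doa.reshape(B, T, -1)], dim=-1)
        h, _ = self.rnn(x)
        w = torch.view_as_complex(self.out(h).reshape(B, T, S, M, F, 2).contiguous())
        # Filter-and-sum beamforming per speaker: Y_s = sum_m w_{s,m}^* X_m
        return torch.einsum('btsmf,bmft->bsft', w.conj(), stft)


class DBnet(nn.Module):
    """End-to-end: time-domain mixture in, per-speaker time-domain estimates out."""
    def __init__(self, n_mics=4, n_fft=512, n_speakers=2):
        super().__init__()
        n_freq = n_fft // 2 + 1
        self.n_fft = n_fft
        self.doa = DOAEstimator(n_mics, n_freq, n_speakers)
        self.bf = BeamformerNet(n_mics, n_freq, n_speakers)

    def forward(self, mix):                             # mix: (B, M, samples)
        B, M, N = mix.shape
        win = torch.hann_window(self.n_fft, device=mix.device)
        stft = torch.stft(mix.reshape(B * M, N), self.n_fft, hop_length=self.n_fft // 2,
                          window=win, return_complex=True)
        F, T = stft.shape[-2:]
        stft = stft.reshape(B, M, F, T)
        doa = self.doa(stft).softmax(dim=-1)            # per-frame DOA posteriors
        sep = self.bf(stft, doa)                        # (B, S, F, T), complex
        S = sep.shape[1]
        est = torch.istft(sep.reshape(B * S, F, T), self.n_fft,
                          hop_length=self.n_fft // 2, window=win, length=N)
        return est.reshape(B, S, N)                     # gradients reach `mix` end-to-end
```

Training such a model end-to-end would typically use a time-domain separation loss (e.g., permutation-invariant SI-SDR) on the returned estimates, so that both the DOA estimator and the beamformer are optimized jointly rather than module by module.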

Learn more about this and other talks at Microsoft Research: microsoft.com/en-us/research/v...