Multi-rate neural networks for efficient acoustic modeling

Published June 13, 2016, 19:48
In sequence recognition, long-span dependencies in input sequences are typically handled with recurrent neural network architectures, and robustness to sequential distortions is achieved by training on data representative of a variety of these distortions. However, both of these solutions substantially increase training time, so low computational complexity during training is critical for acoustic modeling. This talk proposes the use of multi-rate neural network architectures to satisfy the design requirement of computational efficiency. In these architectures the network is partitioned into groups of units that operate at various sampling rates. Because the network evaluates certain groups only once every few time steps, the computational cost is reduced. This talk will focus on the multi-rate feed-forward convolutional architecture. It will present results on several large vocabulary continuous speech recognition (LVCSR) tasks, with training data ranging from 3 to 1800 hours, to show the effectiveness of this architecture in efficiently learning wider temporal dependencies in both small and large data scenarios. Further, it will discuss the use of this architecture for robust acoustic modeling in far-field environments. This model was shown to provide state-of-the-art results in the ASpIRE far-field recognition challenge. The talk will also discuss some preliminary results on multi-rate recurrent neural network based acoustic models.
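The cost argument for evaluating some groups of units only once every few time steps can be illustrated with a small counting sketch. This is not the architecture from the talk: the `UnitGroup` class, the group sizes, and the periods are hypothetical, chosen only to show how a lower sampling rate for part of the network reduces the number of unit evaluations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitGroup:
    """A hypothetical group of units evaluated once every `period` time steps."""
    size: int    # number of units in the group
    period: int  # 1 = every time step, 3 = every third time step, ...


def count_unit_evaluations(groups: List[UnitGroup], num_steps: int) -> int:
    """Count unit evaluations over `num_steps` time steps.

    A group is evaluated at step t only when t % period == 0; at all other
    steps its previous output is assumed to be reused at no cost.
    """
    total = 0
    for t in range(num_steps):
        for group in groups:
            if t % group.period == 0:
                total += group.size
    return total


# Illustrative sizes and periods only; none of these numbers come from the talk.
multi_rate = [UnitGroup(size=100, period=1), UnitGroup(size=100, period=3)]
full_rate = [UnitGroup(size=g.size, period=1) for g in multi_rate]

num_steps = 300
multi_cost = count_unit_evaluations(multi_rate, num_steps)
full_cost = count_unit_evaluations(full_rate, num_steps)
print(multi_cost, full_cost, multi_cost / full_cost)
```

In this toy setting, running half of the units at one third of the rate removes a third of the unit evaluations. The actual savings in a real acoustic model would depend on layer sizes and on how the groups are connected, which this sketch does not model.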