
Multi-DNN Acoustic Models for Speech Recognition



We propose a novel DNN-based acoustic modeling framework for speech recognition in which the posterior probabilities of HMM states are computed by multiple DNNs (mDNN) rather than by a single large DNN, so that training can be parallelized for faster turnaround [1]-[3]. In the proposed mDNN method, all tied HMM states are first grouped into several disjoint clusters using data-driven methods. Several hierarchically structured DNNs are then trained separately, in parallel, for these clusters on multiple computing units (e.g., GPUs).
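As a rough illustration of how the per-cluster outputs can be recombined into full state posteriors at decoding time, the sketch below assumes the two-level factorization p(s | x) = p(c(s) | x) * p(s | x, c(s)), where c(s) is the cluster containing tied state s. The cluster count, state count, feature dimension, and all function names are hypothetical placeholders (not taken from the papers), and random logits stand in for real DNN forward passes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: 6000 tied HMM states partitioned into 4 disjoint
    # clusters. In the mDNN papers the partition is data-driven; here it is
    # random purely for illustration.
    n_states, n_clusters = 6000, 4
    state_to_cluster = rng.integers(n_clusters, size=n_states)

    def softmax(z):
        # Numerically stable softmax over the last axis.
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cluster_dnn(x):
        # Stand-in for the top-level DNN that outputs p(cluster | x).
        return softmax(rng.standard_normal(n_clusters))

    def per_cluster_dnn(x, c):
        # Stand-in for the DNN of cluster c, trained only on frames aligned
        # to its own states; outputs p(state | x, cluster = c).
        n_c = int((state_to_cluster == c).sum())
        return softmax(rng.standard_normal(n_c))

    def state_posteriors(x):
        # Recombine the hierarchy: p(s | x) = p(c(s) | x) * p(s | x, c(s)).
        p_cluster = cluster_dnn(x)
        p_state = np.empty(n_states)
        for c in range(n_clusters):
            idx = np.flatnonzero(state_to_cluster == c)
            p_state[idx] = p_cluster[c] * per_cluster_dnn(x, c)
        return p_state

    x = rng.standard_normal(440)      # e.g. stacked acoustic feature frames
    p = state_posteriors(x)
    assert np.isclose(p.sum(), 1.0)   # a valid distribution over all tied states

Because each per-cluster network touches only its own subset of states, the networks can be trained independently on separate GPUs and need to agree only at this recombination step, which is what makes the parallel training cheap.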


References:

[1] P. Zhou, H. Jiang, L. Dai, Y. Hu, and Q. Liu, “A State-Clustering Based Multiple Deep Neural Networks Modelling Approach for Speech Recognition,” IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 23, no. 4, pp. 631-642, April 2015.

[2] P. Zhou, L. Dai, and H. Jiang, “Sequence Training of Multiple Deep Neural Networks for Better Performance and Faster Training Speed,” Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'14), Florence, Italy, May 2014.

[3] P. Zhou, C. Liu, Q. Liu, L. Dai, and H. Jiang, “A Cluster-Based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition,” Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'13), Vancouver, Canada, May 2013.