We propose a novel DNN-based acoustic modeling framework for speech
recognition in which the posterior probabilities of HMM states are
computed from multiple DNNs (mDNN) [1]-[3], rather than from a single
large DNN, so that training can be parallelized for a faster turnaround.
In the proposed mDNN method, all tied HMM states are first grouped
into several disjoint clusters using data-driven methods. Next,
several hierarchically structured DNNs are trained separately and in
parallel for these clusters on multiple computing units (e.g., GPUs).
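
For concreteness, in this scheme the posterior of a tied state s given an acoustic frame x factorizes as p(s | x) = p(c(s) | x) * p(s | x, c(s)), where c(s) is the cluster containing s: one small DNN classifies the frame into a cluster, and each per-cluster DNN distributes that probability over the states inside its cluster. The Python sketch below illustrates only this combination step; the feature dimension, cluster sizes, and single-layer "networks" with random weights are illustrative placeholders, not details taken from [1]-[3], which describe the actual architectures and training procedure.

# A minimal sketch of the mDNN posterior combination, assuming the
# factorization p(s | x) = p(c(s) | x) * p(s | x, c(s)). Random weight
# matrices stand in for trained networks; all sizes are hypothetical.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
feat_dim = 40                      # acoustic feature dimension (hypothetical)
cluster_sizes = [120, 150, 130]    # tied HMM states per cluster (hypothetical)
num_clusters = len(cluster_sizes)

# Stand-ins for the trained DNNs: one linear layer each.
W_cluster = rng.standard_normal((feat_dim, num_clusters))
W_states = [rng.standard_normal((feat_dim, n)) for n in cluster_sizes]

def state_posteriors(x):
    """Combine the cluster DNN and the per-cluster DNNs into a posterior
    distribution over all tied HMM states for one feature vector x."""
    p_cluster = softmax(x @ W_cluster)                # p(c | x)
    p_within = [softmax(x @ W) for W in W_states]     # p(s | x, c)
    # Scale each cluster's within-cluster posteriors by the cluster
    # posterior and concatenate; the result sums to 1 over all states.
    return np.concatenate([p_cluster[c] * p_within[c]
                           for c in range(num_clusters)])

x = rng.standard_normal(feat_dim)  # one acoustic frame
p = state_posteriors(x)
print(p.shape, p.sum())            # (400,) 1.0

Because each per-cluster network has a small output layer and, during training, only needs the frames aligned to states in its own cluster, the per-cluster models can be trained independently, one per computing unit, which is where the faster turnaround comes from.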
References:
[1] P. Zhou, H. Jiang, L. Dai, Y. Hu, and Q. Liu, "A State-Clustering based Multiple Deep Neural Networks Modelling Approach for Speech Recognition," IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 23, no. 4, pp. 631-642, April 2015.
[2] P. Zhou, L. Dai, and H. Jiang, "Sequence Training of Multiple Deep Neural Networks for Better Performance and Faster Training Speed," Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'13), Florence, Italy, May 2014.
[3] P. Zhou, C. Liu, Q. Liu, L. Dai, and H. Jiang, "A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition," Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'13), Vancouver, Canada, 2013.