This is an old revision of the document!

CNNs for Speech

CNNs for Speech Processing

We propose to use convolutional neural networks (CNNs) for speech recognition, where convolution is applied in the frequency domain to normalize speech variations. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations.