Annealed Gradient Descent (AGD)
Here, we propose a novel annealed gradient descent (AGD) method for
deep learning. AGD optimizes a sequence of gradually improved, smoother mosaic
functions that approximate the original non-convex objective function according
to an annealing schedule during the optimization process. We present a theoretical
analysis of its convergence properties and learning speed. The proposed AGD
algorithm is applied to learning deep neural networks (DNNs) for image recognition
on MNIST and speech recognition on Switchboard.
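The core idea above (descend on a sequence of smoothed surrogates that sharpen toward the true non-convex objective) can be sketched on a toy 1-D problem. This is an illustrative sketch only, not the paper's mosaic-function construction: it uses Gaussian smoothing with a Monte Carlo score-function gradient, and the objective, schedule, and step size below are made-up examples.

```python
import numpy as np

def smoothed_grad(f, x, sigma, rng, n_samples=400):
    """Monte Carlo gradient of the Gaussian-smoothed surrogate
    f_sigma(x) = E_z[f(x + sigma * z)], z ~ N(0, 1), using the
    score-function identity: grad f_sigma(x) = E_z[f(x + sigma*z) * z] / sigma."""
    z = rng.standard_normal(n_samples)
    return float(np.mean(f(x + sigma * z) * z) / sigma)

def annealed_gd(f, x0, sigmas=(2.0, 1.0, 0.5, 0.1), lr=0.05, steps=300, seed=0):
    """Gradient descent on progressively sharper smoothed surrogates,
    warm-starting each annealing stage from the previous stage's solution."""
    rng = np.random.default_rng(seed)
    x = x0
    for sigma in sigmas:          # annealing schedule: smoothing shrinks over stages
        for _ in range(steps):
            x -= lr * smoothed_grad(f, x, sigma, rng)
    return x

# Toy non-convex objective with several local minima (hypothetical example).
f = lambda x: x**2 + np.cos(5.0 * x)
x_star = annealed_gd(f, x0=3.0)
print(x_star, f(x_star))
```

With a large initial sigma the cosine ripples are smoothed away and the iterate follows the quadratic trend toward the origin; as sigma shrinks, the surrogate recovers the local structure and the iterate settles into a low-lying well rather than a shallow one far from the start.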
projects/asgd/start.1423674803.txt.gz · Last modified: 2015/02/11 17:13 by hj