Here, we propose a novel annealed gradient descent (AGD) method for
deep learning. AGD optimizes a sequence of gradually improved, smoother mosaic
functions that approximate the original non-convex objective function, following
an annealing schedule during the optimization process. We present a theoretical
analysis of its convergence properties and learning speed. The proposed AGD
algorithm is applied to learning deep neural networks (DNNs) for image recognition
on MNIST and speech recognition on Switchboard.
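To illustrate the general idea of optimizing a sequence of progressively sharper approximations under an annealing schedule, here is a minimal sketch. It does not implement the paper's mosaic-function construction; instead it uses Gaussian smoothing of a toy non-convex objective as an assumed stand-in for the smoothed surrogates, and the function names, learning rate, and schedule are illustrative choices.

```python
import numpy as np

def annealed_gradient_descent(grad_smoothed, w0, sigmas, steps_per_stage=200, lr=0.01):
    """Generic annealed gradient descent sketch.

    grad_smoothed(w, sigma) returns the gradient of a smoothed surrogate of
    the objective; larger sigma means a smoother (easier) approximation.
    `sigmas` is the annealing schedule, ordered from heavily smoothed down
    to nearly the original objective.
    """
    w = np.asarray(w0, dtype=float)
    for sigma in sigmas:                    # anneal: progressively sharper surrogates
        for _ in range(steps_per_stage):    # plain gradient descent on this surrogate
            w = w - lr * grad_smoothed(w, sigma)
    return w

# Toy non-convex objective f(w) = w^4 - 3 w^2 + 0.5 sin(8 w).
def f(w):
    return w**4 - 3 * w**2 + 0.5 * np.sin(8 * w)

def grad_smoothed(w, sigma, n_samples=256):
    # Gaussian-smoothing gradient estimator: grad E[f(w + eps)] = E[eps * f(w + eps)] / sigma^2,
    # used here only as a stand-in for the paper's mosaic approximation.
    eps = np.random.randn(n_samples, *np.shape(w)) * sigma
    return (eps * f(w + eps)).mean(axis=0) / sigma**2

w_star = annealed_gradient_descent(grad_smoothed, w0=2.0, sigmas=[1.0, 0.5, 0.2, 0.05])
print(w_star)
```

The early, heavily smoothed stages wash out shallow local minima (such as those created by the sin term), while the later stages refine the solution on a surrogate close to the original objective.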
Reference:
[1] Hengyue Pan, Hui Jiang, "Annealed Gradient Descent for Deep Learning", Proc. of the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), July 2015.