Abstract
Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but most existing studies consider only fully connected feedforward neural networks. These studies do not investigate some more commonly used networks such as Convolutional Neural Networks (CNN). One reason is that Newton methods for CNN involve complicated operations, and so far no works have conducted a thorough investigation. In this work, we give details of all building blocks, including the evaluation of function, gradient, Jacobian, and Gauss-Newton matrix-vector products. These basic components are very important not only for practical implementation but also for developing variants of Newton methods for CNN. We show that an efficient MATLAB implementation can be done in just several hundred lines of code. Preliminary experiments indicate that Newton methods are less sensitive to parameters than the stochastic gradient approach.
Details
Published: Jan 25, 2020
Vol/Issue: 11(2)
Pages: 1-30
Funding: MOST of Taiwan, Award 105-2218-E-002-033
Cite This Article
Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin (2020). Newton Methods for Convolutional Neural Networks. ACM Transactions on Intelligent Systems and Technology, 11(2), 1-30. https://doi.org/10.1145/3368271