Newton Methods for Convolutional Neural Networks

Chien-Chih Wang; Kent Loong Tan; Chih-Jen Lin

doi:10.1145/3368271

Journal article, Jan 25, 2020

ACM Transactions on Intelligent Systems and Technology Vol. 11 No. 2 pp. 1-30 · Association for Computing Machinery (ACM)

Abstract

Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but most existing studies consider only fully connected feedforward neural networks. These studies do not investigate some more commonly used networks such as Convolutional Neural Networks (CNN). One reason is that Newton methods for CNN involve complicated operations, and so far no works have conducted a thorough investigation. In this work, we give details of all building blocks, including the evaluation of function, gradient, Jacobian, and Gauss-Newton matrix-vector products. These basic components are very important not only for practical implementation but also for developing variants of Newton methods for CNN. We show that an efficient MATLAB implementation can be done in just several hundred lines of code. Preliminary experiments indicate that Newton methods are less sensitive to parameters than the stochastic gradient approach.
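To give a rough sense of what a Gauss-Newton matrix-vector product looks like, the following is a minimal pure-Python sketch, not the authors' MATLAB code. It assumes a generic form (J^T B J + reg * I) v, where J is a Jacobian of the model outputs with respect to the parameters, B is the Hessian of the loss with respect to the outputs, and reg is a hypothetical regularization weight. The helper names, the tiny matrices, and the regularization term are all illustrative assumptions; the abstract does not state the paper's exact formulation.

```python
def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without building the transpose."""
    n_rows, n_cols = len(A), len(A[0])
    return [sum(A[i][j] * y[i] for i in range(n_rows)) for j in range(n_cols)]


def gauss_newton_vec(J, B, v, reg=0.0):
    """Return (J^T B J + reg * I) v using only matrix-vector products.

    The Gauss-Newton matrix itself is never formed (assumed generic form).
    """
    Jv = matvec(J, v)          # directional derivative of the outputs
    BJv = matvec(B, Jv)        # apply the loss Hessian w.r.t. the outputs
    JtBJv = rmatvec(J, BJv)    # map back to parameter space
    return [g + reg * vi for g, vi in zip(JtBJv, v)]


# Illustrative values only: 2 outputs, 3 parameters.
J = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
B = [[2.0, 0.0],
     [0.0, 2.0]]   # Hessian of a squared loss ||z - y||^2 w.r.t. z
v = [1.0, 0.0, 1.0]

Gv = gauss_newton_vec(J, B, v, reg=0.5)
print(Gv)  # [2.5, 2.0, 2.5]
```

Working with products of this kind rather than the explicit matrix is what keeps the memory cost proportional to the number of parameters rather than its square.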

Metrics

Citations: 4
References: 32

Details

Published: Jan 25, 2020
Vol/Issue: 11(2)
Pages: 1-30

Authors

Chien-Chih Wang

Rakuten Institute of Technology, Tokyo, Japan

Kent Loong Tan

National Taiwan University, Taipei City, Taiwan

Chih-Jen Lin

National Taiwan University, Taipei City, Taiwan

Funding

MOST of Taiwan via Award: 105-2218-E-002-033

Cite This Article

Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin (2020). Newton Methods for Convolutional Neural Networks. ACM Transactions on Intelligent Systems and Technology, 11(2), 1-30. https://doi.org/10.1145/3368271
