journal article Dec 04, 2012

Tuning Parameter Selection in High Dimensional Penalized Likelihood

DOI: 10.1111/rssb.12001
Abstract
Determining how to select the tuning parameter appropriately is essential in penalized likelihood methods for high dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in the generalized information criterion. We find that this model complexity penalty should diverge at the rate of some power of log(p) depending on the tail probability behaviour of the response variables. This reveals that using the Akaike information criterion or Bayes information criterion to select the tuning parameter may not be adequate for consistently identifying the true model. On the basis of our theoretical study, we propose a uniform choice of the model complexity penalty and show that the approach proposed consistently identifies the true model among candidate models with asymptotic probability 1. We justify the performance of the procedure proposed by numerical simulations and a gene expression data analysis.
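The selection rule described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: a Gaussian linear model with unit dispersion (so the deviance is the residual sum of squares), a lasso path fitted by plain coordinate descent, a criterion of the form (deviance + a_n × model size) / n, and a diverging penalty a_n = log(log n) · log(p) used as an example of a rate that grows with log(p). The function names `lasso_path` and `gic` are hypothetical helpers. AIC (a_n = 2) and BIC (a_n = log n) are included for comparison.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_path(X, y, lambdas, n_sweeps=200, tol=1e-7):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Fits one coefficient vector per lambda, warm-starting from the
    previous solution. Hypothetical helper, not the paper's solver.
    """
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = y.copy()  # residual for beta = 0
    path = []
    for lam in lambdas:
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                rho = X[:, j] @ resid / n + col_sq[j] * old
                new = soft_threshold(rho, lam) / col_sq[j]
                if new != old:
                    resid -= X[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        path.append(beta.copy())
    return path


def gic(X, y, beta, a_n):
    """(deviance + a_n * model size) / n, with RSS as the deviance (assumed)."""
    n = X.shape[0]
    rss = np.sum((y - X @ beta) ** 2)
    return (rss + a_n * np.count_nonzero(beta)) / n


# Simulated sparse linear model with p > n (illustrative data only).
rng = np.random.default_rng(0)
n, p = 100, 150
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)

# Decreasing grid of tuning parameters, starting where all coefficients are 0.
lam_max = np.max(np.abs(X.T @ y)) / n
lambdas = lam_max * np.geomspace(1.0, 0.05, 25)
path = lasso_path(X, y, lambdas)

# Model complexity penalties: AIC, BIC, and an assumed log(p)-rate choice.
penalties = {
    "AIC": 2.0,
    "BIC": np.log(n),
    "GIC": np.log(np.log(n)) * np.log(p),
}

selected = {}
for name, a_n in penalties.items():
    scores = [gic(X, y, b, a_n) for b in path]
    best = int(np.argmin(scores))
    selected[name] = (lambdas[best], np.flatnonzero(path[best]))
    print(name, "lambda =", round(float(lambdas[best]), 4),
          "model size =", len(selected[name][1]))
```

Because all three criteria share the same deviance term and differ only in a_n, a larger penalty can never select a larger model from the same path. The sketch makes that ordering visible, but it does not reproduce the paper's theory or simulations.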

Metrics
Citations: 238
References: 34
Details
Published: Dec 04, 2012
Vol/Issue: 75(3)
Pages: 531-552
Funding
University of Southern California
National University of Singapore
National Science Foundation ‘Career’ Award: DMS-1150318
Risk Management Institute
Cite This Article
Yingying Fan, Cheng Yong Tang (2012). Tuning Parameter Selection in High Dimensional Penalized Likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(3), 531-552. https://doi.org/10.1111/rssb.12001