TY - JOUR
T1 - A perturbation method for inference on regularized regression estimates
AU - Minnier, Jessica
AU - Tian, Lu
AU - Cai, Tianxi
N1 - Funding Information:
Jessica Minnier is Ph.D. Candidate, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 (E-mail: jminnier@hsph.harvard.edu). Lu Tian is Assistant Professor, Department of Health Research & Policy, Stanford University School of Medicine, Palo Alto, CA 94304 (E-mail: lutian@stanford.edu). Tianxi Cai is Associate Professor, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 (E-mail: tcai@hsph.harvard.edu). This research was supported by National Institutes of Health grants T32 AI007358, R01 GM079330, R01 HL089778, and DMS 0854970. The authors thank the editor, the associate editor, and two referees for their insightful and constructive comments that greatly improved the article.
PY - 2011/12
Y1 - 2011/12
AB - Analysis of high-dimensional data often seeks to identify a subset of important features and to assess the effects of these features on outcomes. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this article, we propose perturbation resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite-sample simulations, we verify the ability of this method to give accurate inference and compare it with other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a dataset used to study the association of HIV drug resistance and a large number of genetic mutations.
KW - High-dimensional regression
KW - Interval estimation
KW - Oracle property
KW - Regularized estimation
KW - Resampling methods
UR - http://www.scopus.com/inward/record.url?scp=84855997958&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84855997958&partnerID=8YFLogxK
U2 - 10.1198/jasa.2011.tm10382
DO - 10.1198/jasa.2011.tm10382
M3 - Article
AN - SCOPUS:84855997958
SN - 0162-1459
VL - 106
SP - 1371
EP - 1382
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 496
ER -