A perturbation method for inference on regularized regression estimates

Jessica Minnier, Lu Tian, Tianxi Cai

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

Analysis of high-dimensional data often seeks to identify a subset of important features and to assess the effects of these features on outcomes. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this article, we propose perturbation resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite-sample simulations, we verify the ability of this method to give accurate inference and compare it with other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a dataset used to study the association of HIV drug resistance and a large number of genetic mutations.
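The perturbation idea described above can be illustrated with a minimal sketch. The estimator is refit many times, each time with the per-observation loss terms multiplied by i.i.d. positive random weights with mean 1 and variance 1. The spread of the refitted estimates is then used to estimate the covariance matrix and confidence intervals. Several choices here are illustrative assumptions and are not taken from the article: a squared-error loss, an adaptive LASSO fit by coordinate descent with a weighted least-squares initial estimate (so n > p is assumed), Exp(1) weights, a fixed tuning parameter `lam` across replicates, and simple percentile intervals. The function names `weighted_adaptive_lasso` and `perturbation_inference` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_adaptive_lasso(X, y, g, lam, n_iter=200, tol=1e-8):
    """Hypothetical adaptive LASSO with observation weights g (coordinate descent).

    Minimizes (1/2n) * sum_i g_i (y_i - x_i'b)^2 + lam * sum_j |b_j| / |b_init_j|,
    where b_init is the g-weighted least-squares estimate (assumes n > p).
    """
    n, p = X.shape
    sw = np.sqrt(g)
    b_init = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    pen = lam / np.maximum(np.abs(b_init), 1e-12)   # adaptive penalty weights
    a = (g[:, None] * X**2).sum(axis=0) / n          # per-coordinate curvature
    b = np.zeros(p)
    r = y.astype(float).copy()                       # current residual y - X b
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            r += X[:, j] * b[j]                      # partial residual excluding j
            z = np.dot(g * X[:, j], r) / n
            b[j] = soft_threshold(z, pen[j]) / a[j]
            r -= X[:, j] * b[j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b


def perturbation_inference(X, y, lam, B=500, alpha=0.05, seed=0):
    """Hypothetical perturbation resampling: refit with i.i.d. Exp(1) weights B times.

    Returns the point estimate, the covariance of the perturbed estimates,
    and percentile-based (1 - alpha) interval endpoints for each coefficient.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b_hat = weighted_adaptive_lasso(X, y, np.ones(n), lam)
    draws = np.empty((B, p))
    for k in range(B):
        g = rng.exponential(1.0, size=n)             # weights with mean 1, variance 1
        draws[k] = weighted_adaptive_lasso(X, y, g, lam)
    cov = np.cov(draws, rowvar=False)
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return b_hat, cov, lo, hi
```

For example, with simulated data `X = rng.standard_normal((200, 5))` and `y = X @ [1.0, -0.5, 0, 0, 0] + noise`, calling `perturbation_inference(X, y, lam=0.05, B=200)` returns a 5 × 5 covariance estimate and interval endpoints for each coefficient. Percentile intervals are used only for simplicity. Other constructions based on the perturbed distribution, such as centering on the point estimate, fit the same resampling loop.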

Original language: English (US)
Pages (from-to): 1371-1382
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 106
Issue number: 496
State: Published - Dec 2011
Externally published: Yes

Keywords

  • High-dimensional regression
  • Interval estimation
  • Oracle property
  • Regularized estimation
  • Resampling methods

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
