Resampling-based information criteria for best-subset regression

Philip T. Reiss, Lei Huang, Joseph E. Cavanaugh, Amy Krain Roy

Research output: Contribution to journal › Article › peer-review

Abstract

When a linear model is chosen by searching for the best subset among a set of candidate predictors, a fixed penalty such as that imposed by the Akaike information criterion may penalize model complexity inadequately, leading to biased model selection. We study resampling-based information criteria that aim to overcome this problem through improved estimation of the effective model dimension. The first proposed approach builds on previous work on bootstrap-based model selection. We then propose a second, novel approach based on cross-validation. Simulations and analyses of a functional neuroimaging data set illustrate the strong performance of our resampling-based methods, which are implemented in a new R package.
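The abstract describes the two proposals only at a high level, and this page does not name the accompanying R package. As a rough illustration of the underlying idea, the sketch below estimates the effective dimension of best-subset selection with an Efron-style bootstrap covariance penalty and plugs it into an AIC-type criterion. This is not the authors' implementation: the leaps package is assumed for the exhaustive subset search, and best_subset_fit, effective_df, and corrected_ic are hypothetical names introduced here.

    # Minimal sketch (not the authors' package): bootstrap estimate of the
    # effective dimension of best-subset selection, used in an AIC-type criterion.
    library(leaps)

    # Fitted values from the best subset of exactly k predictors (exhaustive search).
    best_subset_fit <- function(X, y, k) {
      ss  <- summary(regsubsets(X, y, nvmax = k, method = "exhaustive"))
      sel <- ss$which[k, -1]                  # drop the intercept column
      fitted(lm(y ~ X[, sel, drop = FALSE]))
    }

    # Efron-style covariance penalty: df ~ (1/sigma^2) * sum_i cov(yhat_i, y_i),
    # estimated by a parametric bootstrap around a pilot full-model fit.
    effective_df <- function(X, y, k, B = 200) {
      n     <- length(y)
      pilot <- lm(y ~ X)
      mu    <- fitted(pilot)
      sigma <- summary(pilot)$sigma
      Ystar <- matrix(mu, n, B) + matrix(rnorm(n * B, sd = sigma), n, B)
      Yhat  <- apply(Ystar, 2, function(yb) best_subset_fit(X, yb, k))
      sum(sapply(seq_len(n), function(i) cov(Yhat[i, ], Ystar[i, ]))) / sigma^2
    }

    # AIC-type criterion with the resampling-based penalty in place of the
    # fixed penalty 2k, which ignores the cost of the subset search itself.
    corrected_ic <- function(X, y, k, B = 200) {
      rss <- sum((y - best_subset_fit(X, y, k))^2)
      length(y) * log(rss / length(y)) + 2 * effective_df(X, y, k, B)
    }

    # Example: select the subset size that minimizes the corrected criterion.
    set.seed(1)
    n <- 100; p <- 8
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)
    which.min(sapply(1:p, function(k) corrected_ic(X, y, k)))

Because the subset search is itself a data-driven step, the bootstrap penalty typically exceeds the nominal count of k fitted parameters; that gap is precisely the inadequacy of fixed-penalty criteria that the abstract describes.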

Original language: English
Pages (from-to): 1161-1186
Number of pages: 26
Journal: Annals of the Institute of Statistical Mathematics
Volume: 64
Issue number: 6
State: Published - Dec 2012
Externally published: Yes

Bibliographical note

Funding Information:
The first author’s research is supported in part by National Science Foundation grant DMS-0907017. The authors thank Mike Milham, Eva Petkova, Thad Tarpey, Lee Dicker, and Tao Zhang for illuminating discussions; Zarrar Shehzad for assistance with the functional connectivity data; and the Associate Editor and referee, whose incisive comments led to major improvements in the paper.

Keywords

  • Adaptive model selection
  • Covariance inflation criterion
  • Cross-validation
  • Extended information criterion
  • Functional connectivity
  • Overoptimism

ASJC Scopus subject areas

  • Statistics and Probability
