Improved estimators for semi-supervised high-dimensional regression model

Ilan Livne, Yair Goldberg, David Azriel

Research output: Contribution to journal › Article › peer-review

Abstract

We study a high-dimensional linear regression model in a semi-supervised setting, where for many observations only the vector of covariates X is given with no responses Y. We do not make any sparsity assumptions on the vector of coefficients, nor do we assume normality of the covariates. We aim at estimating the signal level, i.e., the amount of variation in the response that can be explained by the set of covariates. We propose an estimator, which is unbiased, consistent, and asymptotically normal. This estimator can be improved by adding zero-estimators arising from the unlabeled data. Adding zero-estimators does not affect the bias and potentially can reduce the variance. We further present an algorithm based on our approach that improves any given signal level estimator. Our theoretical results are demonstrated in a simulation study.
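The idea of an unbiased signal-level estimator adjusted by a zero-estimator can be illustrated with a minimal sketch. It is not taken from the paper. It assumes, beyond what the abstract states, that the covariates are standardized so that E[X] = 0 and Cov(X) = I (as if their distribution were known from the unlabeled data), in which case the signal level reduces to the squared norm of the coefficient vector. The function names `naive_signal_estimate` and `adjust_with_zero_estimator`, the simulated data, and the particular zero-estimator are all hypothetical choices made for illustration.

```python
import numpy as np


def naive_signal_estimate(X, y):
    """U-statistic estimate of ||beta||^2.

    Assumption (not from the abstract): E[X] = 0 and Cov(X) = I,
    so that W_i = X_i * Y_i has mean beta.
    """
    W = X * y[:, None]               # row i is W_i = X_i * Y_i
    n = W.shape[0]
    s = W.sum(axis=0)
    # sum over i != j of W_i . W_j = ||sum_i W_i||^2 - sum_i ||W_i||^2
    cross = s @ s - np.sum(W * W)
    return cross / (n * (n - 1))


def adjust_with_zero_estimator(estimate, zero_estimate, coef):
    """Subtract coef * zero_estimate from the estimate.

    Because the zero-estimator has mean zero, the expectation of the
    result equals that of `estimate` for any fixed coef.
    """
    return estimate - coef * zero_estimate


# Simulated data (illustrative only): standard normal covariates.
rng = np.random.default_rng(0)
n, p = 200, 20
beta = np.full(p, 0.3)
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

tau2_hat = naive_signal_estimate(X, y)

# A hypothetical zero-estimator: the sample mean of X_1^2 - 1, whose
# expectation is zero when the first covariate has unit variance.
z = np.mean(X[:, 0] ** 2 - 1.0)

# coef is a placeholder value here; it is not an optimized coefficient.
tau2_adj = adjust_with_zero_estimator(tau2_hat, z, coef=0.1)
print(tau2_hat, tau2_adj, beta @ beta)
```

By the standard control-variate argument, the variance of the adjusted estimator is minimized at the coefficient Cov(T, Z) / Var(Z), where T is the initial estimator and Z the zero-estimator. In practice that coefficient is unknown and would itself have to be estimated; the fixed value above is only a placeholder.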

Original language: English
Pages (from-to): 5437-5487
Number of pages: 51
Journal: Electronic Journal of Statistics
Volume: 16
Issue number: 2
State: Published - 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022, Institute of Mathematical Statistics. All rights reserved.

Keywords

  • Linear regression
  • U-statistics
  • semi-supervised learning
  • variance estimation
  • zero-estimators

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
