Abstract
We study a high-dimensional linear regression model in a semisupervised setting, where for many observations only the vector of covariates X is given with no responses Y. We do not make any sparsity assumptions on the vector of coefficients, nor do we assume normality of the covariates. We aim at estimating the signal level, i.e., the amount of variation in the response that can be explained by the set of covariates. We propose an estimator, which is unbiased, consistent, and asymptotically normal. This estimator can be improved by adding zero-estimators arising from the unlabeled data. Adding zero-estimators does not affect the bias and potentially can reduce the variance. We further present an algorithm based on our approach that improves any given signal level estimator. Our theoretical results are demonstrated in a simulation study.
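The zero-estimator idea in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it rests on assumptions made purely for illustration: the covariates have known mean zero and identity covariance (as if learned from abundant unlabeled data), the signal level is then the squared norm of the coefficient vector, and the function names `naive_signal_estimate`, `zero_estimator`, and `simulate` are hypothetical. The coefficient `c` is estimated across simulated replications, which is only possible in a simulation.

```python
import numpy as np

def naive_signal_estimate(X, y):
    """U-statistic estimate of ||beta||^2, assuming E[X] = 0 and Cov(X) = I."""
    n = X.shape[0]
    W = X * y[:, None]  # W_i = X_i * Y_i, so E[W_i] = beta under the assumption
    s = W.sum(axis=0)
    # sum over i != j of W_i . W_j  =  ||sum_i W_i||^2 - sum_i ||W_i||^2
    return (s @ s - np.einsum("ij,ij->", W, W)) / (n * (n - 1))

def zero_estimator(X):
    """A statistic of the covariates alone whose mean is known to be zero
    (uses the assumed E[X_ij^2] = 1)."""
    return np.mean(X**2 - 1.0)

def simulate(n=100, p=20, reps=500, seed=0):
    """Return naive and zero-estimator-adjusted estimates over `reps` datasets."""
    rng = np.random.default_rng(seed)
    beta = np.full(p, 1.0 / np.sqrt(p))  # true signal level ||beta||^2 = 1
    T = np.empty(reps)
    Z = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.standard_normal(n)
        T[r] = naive_signal_estimate(X, y)
        Z[r] = zero_estimator(X)
    # Illustrative choice: c = Cov(T, Z) / Var(Z), estimated across replications.
    c = np.cov(T, Z)[0, 1] / np.var(Z, ddof=1)
    return T, T - c * Z

naive, improved = simulate()
print("mean:", naive.mean(), improved.mean())
print("variance:", naive.var(ddof=1), improved.var(ddof=1))
```

Because the zero-estimator has mean zero, subtracting a multiple of it leaves the mean of the estimate essentially unchanged, while the variance can only shrink when `c` is chosen this way. With real data `c` would itself have to be estimated, which this sketch does not address.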
Original language | English |
---|---|
Pages (from-to) | 5437-5487 |
Number of pages | 51 |
Journal | Electronic Journal of Statistics |
Volume | 16 |
Issue number | 2 |
State | Published - 2022 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2022, Institute of Mathematical Statistics. All rights reserved.
Keywords
- Linear regression
- U-statistics
- semi-supervised learning
- variance estimation
- zero-estimators
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty