Abstract
Missing responses are a common feature of data in which the outcomes of interest are not always observed. In this paper, we develop two new kernel machines to handle such data, which can be used for both regression and classification. The first proposed kernel machine uses only the complete cases, in which both the response and the covariates are observed. It is, however, valid only under restrictive assumptions. Our second proposed kernel machine is doubly robust and overcomes these limitations when either the missing mechanism or the conditional distribution of the response is misspecified. Theoretical properties, including oracle inequalities for the excess risk, universal consistency, and learning rates, are established. We demonstrate the superiority of the proposed methods over some existing methods by simulation and illustrate their application to a real data set from a survey of homeless people.
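To make the complete-case idea concrete, the following is a minimal sketch of an inverse-probability-weighted kernel ridge regression fitted on complete cases only. It is an illustration under stated assumptions, not the estimator defined in the paper: it assumes squared loss, a Gaussian kernel, scalar covariates, and known observation probabilities. All names (`gaussian_kernel`, `solve`, `fit_ipw_kernel_ridge`, `pis`, `n_total`) and the toy data are hypothetical.

```python
import math


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two scalar covariates."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ipw_kernel_ridge(xs, ys, pis, n_total, lam, gamma):
    """Fit on complete cases, weighting case i by 1 / pis[i].

    Minimizes (1/n_total) * sum_i w_i * (y_i - f(x_i))^2 + lam * ||f||^2
    over f = sum_j alpha_j * k(x_j, .), which leads to the linear system
    (W K + n_total * lam * I) alpha = W y.
    Assumption: pis[i] is the known probability that case i is observed.
    """
    if any(p <= 0.0 or p > 1.0 for p in pis):
        raise ValueError("observation probabilities must lie in (0, 1]")
    m = len(xs)
    weights = [1.0 / p for p in pis]
    A = [
        [
            weights[i] * gaussian_kernel(xs[i], xs[j], gamma)
            + (n_total * lam if i == j else 0.0)
            for j in range(m)
        ]
        for i in range(m)
    ]
    rhs = [weights[i] * ys[i] for i in range(m)]
    alpha = solve(A, rhs)

    def predict(x):
        return sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alpha, xs))

    return predict


# Toy data (hypothetical): 5 complete cases out of 8 sampled units.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]
pis = [0.9, 0.8, 0.7, 0.6, 0.5]
predict = fit_ipw_kernel_ridge(xs, ys, pis, n_total=8, lam=1e-6, gamma=4.0)
```

In this sketch, cases that were less likely to be observed receive larger weights, so that the complete cases stand in for the full sample. In practice the observation probabilities would have to be estimated, and a weighted fit of this kind depends on that model being correct, which is the type of limitation the doubly-robust construction is meant to address.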
Original language | English |
---|---|
Pages (from-to) | 3766-3820 |
Number of pages | 55 |
Journal | Electronic Journal of Statistics |
Volume | 14 |
Issue number | 2 |
State | Published - 2020 |
Externally published | Yes |
Bibliographical note
Funding Information: The authors thank the editor, the associate editor and the two referees for carefully reading our manuscript and for their helpful comments and constructive suggestions to improve the paper. Yair Goldberg was partially supported by the Israeli Science Foundation (grant No. 849/17). Tiantian Liu was partially supported by the China Scholarship Council.
Publisher Copyright:
© 2020, Institute of Mathematical Statistics. All rights reserved.
Keywords
- Consistency
- Doubly-robust estimator
- Inverse probability weighted estimator
- Kernel machines
- Learning rate
- Missing responses
- Oracle inequality
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty