Structural zeros in high-dimensional data with applications to microbiome studies

Abhishek Kaul, Ori Davidov, Shyamal D. Peddada

Research output: Contribution to journal › Article › peer-review

Abstract

This paper is motivated by the recent interest in the analysis of high-dimensional microbiome data. A key feature of these data is the presence of "structural zeros", which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are unable to model these structural zeros. We define a general framework which allows for structural zeros in the model and propose methods of estimating sparse high-dimensional covariance and precision matrices under this setup. We establish error bounds in the spectral and Frobenius norms for the proposed estimators and empirically verify them with a simulation study. The proposed methodology is illustrated by applying it to the global gut microbiome data of Yatsunenko and others (2012. Human gut microbiome viewed across age and geography. Nature 486, 222-227). Using our methodology, we classify subjects according to geographical location on the basis of their gut microbiome.

Original language: English
Pages (from-to): 422-433
Number of pages: 12
Journal: Biostatistics
Volume: 18
Issue number: 3
State: Published - 1 Jul 2017

Bibliographical note

Funding Information:
Intramural Research Program of the NIH, NIEHS (Z01 ES101744-04) to S.D.P. and A.K.; Israeli Science Foundation (1256/13) to O.D.

Keywords

  • Classification
  • High dimension
  • Microbiome data
  • Missing data
  • Sparsity

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
