Analysis of microbiome data in the presence of excess zeros

Abhishek Kaul, Siddhartha Mandal, Ori Davidov, Shyamal D. Peddada

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called a pseudo-count (e.g., 1). Other strategies use various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, it is not ideal, as demonstrated in this paper. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this assumption is not always appropriate, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure group). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data.

Results: Using extensive simulation studies, we demonstrate that the proposed methodology controls the false discovery rate (FDR) at a desired level of significance while competing well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test, which ignores the underlying simplex structure in the data, has an inflated FDR.

Original language: English
Article number: 2114
Journal: Frontiers in Microbiology
Volume: 8
Issue number: NOV
State: Published - 7 Nov 2017

Bibliographical note

Publisher Copyright:
© 2017 Kaul, Mandal, Davidov and Peddada.

Keywords

  • Aitchison's log-ratio
  • Bootstrap
  • Covariates
  • Cross-sectional data
  • False discovery rate (FDR)
  • Microbiome data

ASJC Scopus subject areas

  • Microbiology (medical)
  • Microbiology
