Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
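The four components named in the abstract can be illustrated with a minimal sketch. This is not the PopCluster implementation: the function names (`snp_effect`, `effects_differ`, `parse`, `find_subgroups`), the Ward linkage, the Wald-type z-test for comparing effects between sibling clusters, the `min_size` cutoff and the simulated data are all assumptions made for illustration. The sketch projects genotypes onto principal components, clusters individuals hierarchically, and then walks the tree from the bottom up, merging two sibling clusters whenever their per-cluster logistic regression effects are not significantly different.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def snp_effect(g, y, n_iter=25, ridge=1e-4):
    """Logistic regression of y on one SNP via Newton steps.

    Returns the SNP coefficient and its approximate standard error.
    The small ridge term (an assumption) keeps the solve stable.
    """
    X = np.column_stack([np.ones(len(g)), g])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(2)
        beta = beta + np.linalg.solve(hess, X.T @ (y - p) - ridge * beta)
    return beta[1], np.sqrt(np.linalg.inv(hess)[1, 1])


def effects_differ(g, y, idx_a, idx_b, z_crit=1.96):
    """Assumed heterogeneity test: z-test on the difference of two effects."""
    beta_a, se_a = snp_effect(g[idx_a], y[idx_a])
    beta_b, se_b = snp_effect(g[idx_b], y[idx_b])
    return abs(beta_a - beta_b) / np.hypot(se_a, se_b) > z_crit


def parse(node, g, y, min_size):
    """Post-order (bottom-up) pass over the cluster tree."""
    ids = np.array(node.pre_order())
    if node.is_leaf() or len(ids) < 2 * min_size:
        return [ids]  # too small to split further
    left = parse(node.get_left(), g, y, min_size)
    right = parse(node.get_right(), g, y, min_size)
    if len(left) == 1 and len(right) == 1:
        a, b = left[0], right[0]
        too_small = min(len(a), len(b)) < min_size
        if too_small or not effects_differ(g, y, a, b):
            return [ids]  # homogeneous effect: merge the siblings
    return left + right  # keep the subgroups found below this node


def find_subgroups(genotypes, snp, y, n_pcs=2, min_size=50):
    """PCA on genome-wide genotypes, Ward clustering, then tree parsing."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    root = to_tree(linkage(pcs, method="ward"))
    return parse(root, snp, y, min_size)


# Simulated data (illustrative only): two populations with different
# allele frequencies and opposite effects of the tested SNP.
rng = np.random.default_rng(0)
n = 300
freq_a = rng.uniform(0.1, 0.9, 200)
freq_b = rng.uniform(0.1, 0.9, 200)
geno = np.vstack([rng.binomial(2, freq_a, (n, 200)),
                  rng.binomial(2, freq_b, (n, 200))]).astype(float)
snp = rng.binomial(2, 0.4, 2 * n).astype(float)
beta = np.repeat([1.5, -1.5], n)
y = rng.binomial(1, 1 / (1 + np.exp(-beta * (snp - 0.8)))).astype(float)

groups = find_subgroups(geno, snp, y)
print([len(ids) for ids in groups])
```

Each returned array holds the row indices of one subgroup. A full analysis would also need covariate adjustment and multiple-testing control, which this sketch omits.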
Bibliographical note
Funding Information:
This work was supported by the National Institute on Aging [U01AG023755, U19AG023122, R21AG056630]; the William M. Wood Foundation; the Paulette and Marty Samowitz Family Foundation; the Longevity Genes Project [R01AG618381, R01AG042188, R01AG046949, P01AG021654]; the Einstein Nathan Shock Center grant [P30AG038072]; and the Einstein Glenn Center for the Biology of Human Aging. The Health and Retirement Study genetic data are sponsored by the National Institute on Aging [U01AG009740, RC2AG036495, RC4AG039029]; the study was conducted by the University of Michigan.
© 2019 The Author(s).
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics