Epigenetic pacemaker: Closed form algebraic solutions

Research output: Contribution to journal › Article › peer-review

Abstract

Background: DNA methylation is widely used as a biomarker in crucial medical applications and for highly accurate human age prediction. This biomarker is based on the methylation status of several hundred CpG sites. In a recent line of publications we have adapted a versatile concept from evolutionary biology - the Universal Pacemaker (UPM) - to the setting of epigenetic aging and denoted it the Epigenetic PaceMaker (EPM). The EPM, as opposed to other epigenetic clocks, is not confined to a specific pattern of aging, and the epigenetic age of each individual is inferred independently of other individuals. This allows explicit modeling of aging trends, in particular a non-linear relationship between chronological and epigenetic age. In one of these recent works, we presented an algorithmic improvement based on a two-step conditional expectation maximization (CEM) algorithm to arrive at a critical point on the likelihood surface. The algorithm alternates between a time step and a site step while advancing on the likelihood surface.
Results: Here we introduce non-trivial improvements to these steps that are essential for analyzing data sets of realistic magnitude in manageable time and space. These structural improvements are based on insights from linear algebra and symbolic algebra tools, which give us a greater understanding of the degeneracy of the complex problem space. This understanding, in turn, leads to the complete elimination of the bottleneck of cumbersome matrix multiplication and inversion, yielding a fast closed form solution in both steps of the CEM. In the experimental results part, we compare the CEM algorithm over several data sets and demonstrate the speedup obtained by the closed form solutions. Our results support the theoretical analysis of this improvement.
Conclusions: These improvements enable us to increase substantially the scale of inputs analyzed by the method, allowing us to apply the new approach to data sets that could not be analyzed before.
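
The alternation between a site step and a time step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear model in which the methylation level of site i in individual j is r[i] * t[j] + m0[i] plus Gaussian noise of equal variance, so that maximizing the likelihood amounts to minimizing the residual sum of squares. The names site_step, time_step, fit_epm, M, r, m0 and t are hypothetical and chosen for illustration only.

```python
import numpy as np

def site_step(M, t):
    """With epigenetic ages t fixed, fit a rate r[i] and an intercept m0[i]
    per site by simple least squares. No matrix inversion is needed.
    Assumed model (not taken from the paper): M[i, j] ~ r[i] * t[j] + m0[i]."""
    t_c = t - t.mean()
    r = (M - M.mean(axis=1, keepdims=True)) @ t_c / np.dot(t_c, t_c)
    m0 = M.mean(axis=1) - r * t.mean()
    return r, m0

def time_step(M, r, m0):
    """With site parameters fixed, update each individual's age t[j]
    independently of the other individuals (least squares over sites)."""
    return r @ (M - m0[:, None]) / np.dot(r, r)

def fit_epm(M, t_init, n_iter=50, tol=1e-10):
    """Alternate the two steps until the residual sum of squares stops improving."""
    t = np.asarray(t_init, dtype=float).copy()
    prev = np.inf
    rss = np.inf
    for _ in range(n_iter):
        r, m0 = site_step(M, t)
        t = time_step(M, r, m0)
        rss = np.sum((M - (np.outer(r, t) + m0[:, None])) ** 2)
        if prev - rss < tol:
            break
        prev = rss
    return r, m0, t, rss
```

Here M is a sites-by-individuals matrix of methylation levels and t_init would typically be the chronological ages. Each step is the exact minimizer over its own block of parameters, so the residual sum of squares cannot increase from one iteration to the next. The cost per iteration is proportional to the number of matrix entries.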

Original language: English
Article number: 257
Journal: BMC Genomics
Volume: 21
Issue number: Suppl 2
State: Published - 16 Apr 2020

Bibliographical note

Funding Information:
We would like to thank the three reviewers of this manuscript for their detailed and supportive comments, which helped us substantially improve the paper. The authors also acknowledge the Computational Genomics Summer Institute, funded by NIH Grant GM112625, which fostered international collaboration among the groups involved in this project. We also wish to thank Matteo Pellegrini and Colin Farrell for helpful discussions about the manuscript.

Funding Information:
The publication cost of this article was funded by the VW Foundation, project VWZN3157. Part of this research was done while SS attended the Computational Genomics Summer Institute (CGSI) at UCLA.

Publisher Copyright:
© 2020 The Author(s).

Keywords

  • Conditional Expectation Maximization
  • Epigenetics
  • Matrix Multiplication
  • Symbolic Algebra
  • Universal PaceMaker
  • Aging/genetics
  • Epigenomics
  • Epigenesis, Genetic
  • Humans
  • Genomics/methods
  • DNA Methylation
  • Likelihood Functions
  • Algorithms
  • Biological Clocks/genetics
  • CpG Islands
  • Models, Genetic

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
