A statistical model for early estimation of the prevalence and severity of an epidemic or pandemic from simple tests for infection confirmation

Yuval Shahar, Osnat Mokryn

Research output: Contribution to journal › Article › peer-review


Epidemics and pandemics require an early estimate of the cumulative infection prevalence, sometimes referred to as the infection "Iceberg," whose tip is the set of known cases. Accurate early estimates support better disease monitoring, more accurate estimation of the infection fatality rate, and an assessment of the risks from asymptomatic individuals. We identify the Pivot group, the population sub-group with the highest probability of being detected and confirmed as positively infected. We differentiate infection susceptibility, assumed to be almost uniform across all population sub-groups at this early stage, from the probability of being confirmed positive. The latter is often related to the likelihood of developing symptoms and complications, which differs between sub-groups (e.g., by age, in the case of the COVID-19 pandemic). A key assumption in our method is the almost-random sub-group infection assumption: the risk of initial infection is either almost uniform across all population sub-groups or not higher in the Pivot sub-group. We then present an algorithm that, using the lift value of the Pivot sub-group, finds a lower bound for the cumulative infection prevalence in the population, that is, a lower bound on the size of the entire infection "Iceberg." We demonstrate our method by applying it to the case of the COVID-19 pandemic. We use UK and Spain serological surveys of COVID-19 in its first year to demonstrate that the data are consistent with our key assumption, at least for the chosen Pivot sub-group. Overall, we applied our methods to nine countries or large regions whose data, mainly from the early phase of the COVID-19 pandemic, were available: Spain, the UK at two different time points, New York State, New York City, Italy, Norway, Sweden, Belgium, and Israel. We established an estimate of the lower bound of the cumulative infection prevalence for each of them.
We have also computed the corresponding upper bounds on the infection fatality rates in each country or region. Using our methodology, we have demonstrated that estimating a lower bound for an epidemic's infection prevalence at its early phase is feasible and that the assumptions underlying that estimate are valid. Our methodology is especially helpful for gaining an initial assessment of the prevalence scale when serological data are not yet available, and more so for pandemics with asymptomatic transmission, as is the case with COVID-19.
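The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that the lift of a sub-group is its share of confirmed cases divided by its share of the population, that the Pivot sub-group can be approximated by the sub-group with the highest lift, and that infection risk is uniform across sub-groups. Under those assumptions, the confirmed-case rate in the Pivot sub-group is a lower bound on the infection rate everywhere, so the total number of infections is at least the lift times the total number of confirmed cases. All names (SubGroup, lift, prevalence_lower_bound, ifr_upper_bound) and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubGroup:
    name: str
    population: int  # number of people in the sub-group
    confirmed: int   # confirmed positive cases in the sub-group


def lift(group: SubGroup, groups: List[SubGroup]) -> float:
    """Share of confirmed cases divided by share of population (assumed definition)."""
    total_population = sum(g.population for g in groups)
    total_confirmed = sum(g.confirmed for g in groups)
    share_of_cases = group.confirmed / total_confirmed
    share_of_population = group.population / total_population
    return share_of_cases / share_of_population


def prevalence_lower_bound(groups: List[SubGroup]) -> Tuple[SubGroup, float]:
    """Return the assumed pivot sub-group and a lower bound on total infections.

    Assumption: infection risk is uniform across sub-groups, so the pivot's
    confirmed-case rate bounds the infection rate of the whole population.
    Algebraically this equals lift(pivot) * total confirmed cases.
    """
    pivot = max(groups, key=lambda g: lift(g, groups))
    total_population = sum(g.population for g in groups)
    bound = pivot.confirmed * total_population / pivot.population
    return pivot, bound


def ifr_upper_bound(deaths: int, infections_lower_bound: float) -> float:
    """A lower bound on infections yields an upper bound on the fatality rate."""
    return deaths / infections_lower_bound


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the arithmetic.
    groups = [
        SubGroup("0-39", 600_000, 300),
        SubGroup("40-69", 300_000, 450),
        SubGroup("70+", 100_000, 250),
    ]
    pivot, bound = prevalence_lower_bound(groups)
    print(pivot.name, lift(pivot, groups), bound)
    print(ifr_upper_bound(50, bound))
```

Computing the bound directly from the pivot's counts, rather than multiplying the lift by the total confirmed cases, gives the same quantity while avoiding one extra floating-point division. Selecting the pivot by maximum lift is a simplification; the abstract defines the Pivot group by detection probability, which may require additional domain knowledge to identify.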

Original language: English
Article number: e0280874
Journal: PLoS ONE
Issue number: 1
State: Published - Jan 2023

Bibliographical note

Publisher Copyright:
© 2023 Shahar, Mokryn. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


  • Humans
  • COVID-19/diagnosis
  • Pandemics
  • Prevalence
  • Models, Statistical
  • New York City

ASJC Scopus subject areas

  • General

