Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contributes to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes.
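The abstract separates variants that create new upstream start codons from variants that disrupt the stop codon of an existing uORF, and singles out created uORFs that overlap the coding sequence. As a rough illustration of how such variants could be sorted, the Python sketch below works on a toy transcript string (5'UTR followed by CDS), considers only single-nucleotide variants and ATG start codons, ignores Kozak context and non-AUG starts, and uses hypothetical function names and category labels (`uORF`, `oORF`, `inframe_extension`). It is an assumption-laden sketch, not the authors' annotation pipeline.

```python
"""Toy classifier for uORF-perturbing single-nucleotide variants (illustrative sketch only)."""
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_snv(tx: str, pos: int, ref: str, alt: str) -> str:
    """Return tx with the base at 0-based `pos` replaced by `alt`, after checking `ref`."""
    if tx[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {tx[pos]}")
    return tx[:pos] + alt + tx[pos + 1:]


def first_stop(seq: str, start: int) -> Optional[int]:
    """0-based index of the first in-frame stop codon at or after `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def classify_orf(tx: str, start: int, cds_start: int) -> str:
    """Classify the ORF opened by the ATG at `start` relative to the CDS at `cds_start`."""
    stop = first_stop(tx, start)
    if stop is not None and stop + 3 <= cds_start:
        return "uORF"  # terminates within the 5'UTR
    if (cds_start - start) % 3 == 0:
        return "inframe_extension"  # reads into the CDS in frame (N-terminal extension)
    return "oORF"  # out-of-frame ORF overlapping the CDS start


def uaug_created(tx: str, pos: int, ref: str, alt: str, cds_start: int) -> Optional[str]:
    """Class of the ORF opened by a new upstream ATG created by the SNV, else None."""
    alt_tx = apply_snv(tx, pos, ref, alt)
    for s in range(max(0, pos - 2), pos + 1):
        if s + 3 > cds_start:
            continue  # only start codons lying entirely within the 5'UTR
        if alt_tx[s:s + 3] == "ATG" and tx[s:s + 3] != "ATG":
            return classify_orf(alt_tx, s, cds_start)
    return None


def ustop_removed(tx: str, pos: int, ref: str, alt: str, cds_start: int) -> Optional[str]:
    """If the SNV abolishes the stop codon of an existing uORF, classify the ORF it now opens."""
    alt_tx = apply_snv(tx, pos, ref, alt)
    for s in range(0, cds_start - 2):
        if tx[s:s + 3] != "ATG":
            continue
        stop = first_stop(tx, s)
        if stop is None or stop + 3 > cds_start:
            continue  # not a uORF that terminates in the 5'UTR
        if stop <= pos < stop + 3 and alt_tx[stop:stop + 3] not in STOP_CODONS:
            return classify_orf(alt_tx, s, cds_start)
    return None


if __name__ == "__main__":
    cds = "ATGGCTTAA"  # toy CDS: start codon, one sense codon, stop codon
    tx = "GGCTTGCCCTAAGCC" + cds  # toy 5'UTR of length 15 followed by the CDS
    print(uaug_created(tx, 3, "T", "A", cds_start=15))  # uORF
```

Only the first in-frame stop downstream of each start codon is examined, which is enough to tell a uORF that terminates in the 5'UTR from one that runs into the coding sequence in or out of frame; real annotation would also need indels, transcript coordinates and strand handling.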
Bibliographical note
Funding Information:
N.W. is supported by a Rosetrees and Stoneygate Imperial College Research Fellowship. This work was supported by the Wellcome Trust (107469/Z/15/Z), the Medical Research Council (UK), the NIHR Biomedical Research Unit in Cardiovascular Disease at Royal Brompton and Harefield NHS Foundation Trust and Imperial College London, the NIHR Imperial College Biomedical Research Centre, the Fondation Leducq (11 CVD-01), a Health Innovation Challenge Fund award from the Wellcome Trust and Department of Health, UK (HICF-R6-373), and by NIDDK U54DK105566 and NIGMS R01GM104371. D.G.E. and M.J.S. are funded by the NIHR Biomedical Research Centre Manchester (IS-BRC-1215-20007). L.C.F. is supported by the Swiss National Science Foundation (Advanced Postdoc.Mobility 177853). N.Q. is supported by the Imperial College Academic Health Science Centre. The results published here are in part based upon data: (1) generated by The Cancer Genome Atlas (TCGA), managed by the NCI and NHGRI (accession: phs000178.v10.p8; information about TCGA can be found at http://cancergenome.nih.gov); (2) generated by the Genotype-Tissue Expression Project (GTEx), managed by the NIH Common Fund and NHGRI (accession: phs000424.v7.p2); (3) generated by the Exome Sequencing Project, managed by NHLBI; and (4) generated by the Alzheimer’s Disease Sequencing Project (ADSP), managed by the NIA and NHGRI (accession: phs000572.v7.p4). The views expressed in this work are those of the authors and not necessarily those of any of the funders. We would like to thank Dr. Christopher M Yates for his statistical advice.
© 2020, The Author(s).
ASJC Scopus subject areas
- Physics and Astronomy (all)
- Chemistry (all)
- Biochemistry, Genetics and Molecular Biology (all)