Leveraging base-pair mammalian constraint to understand genetic variation and human disease
Patrick F. Sullivan, Jennifer R. S. Meadows, Steven Gazal, BaDoi N. Phan, Xue Li, Diane P. Genereux, Michael X. Dong, Matteo Bianchi, Gregory Andrews, Sharadha Sakthikumar, Jessika Nordin, Ananya Roy, Matthew J. Christmas, Voichita D. Marinescu, Chao Wang, Ola Wallerman, James Xue, Shuyang Yao, Quan Sun, Jin Szatkiewicz, Jia Wen, Laura M. Huckins, Alyssa Lawler, Kathleen C. Keough, Zhili Zheng, Jian Zeng, Naomi R. Wray, Yun Li, Jessica Johnson, Jiawen Chen, Zoonomia Consortium, Benedict Paten, Steven K. Reilly, Graham M. Hughes, Zhiping Weng, Katherine S. Pollard, Andreas R. Pfenning, Karin Forsberg-Nilsson, Elinor K. Karlsson, and Kerstin Lindblad-Toh
Science, published 28 Apr 2023
Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.
We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).
Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85% versus 3.26% expected by chance, P < 2.2 × 10⁻³⁰⁸). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10⁻¹⁶).
The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base.
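The abstract reports heritability enrichments as fold-changes without stating the formula. A common definition, used by stratified LD score regression, divides the share of SNP-heritability attributed to an annotation by the share of SNPs that fall in it. The minimal sketch below assumes that definition; the function name `h2_enrichment` and the example shares are hypothetical, not values from the study.

```python
def h2_enrichment(h2_share: float, snp_share: float) -> float:
    """Fold enrichment of SNP-heritability in an annotation.

    Assumes the stratified-LDSC-style definition:
        enrichment = (proportion of SNP-heritability in the annotation)
                     / (proportion of SNPs in the annotation)
    A value of 1.0 means the annotation carries exactly its "fair share".
    """
    if not 0.0 < snp_share <= 1.0:
        raise ValueError("snp_share must lie in (0, 1]")
    return h2_share / snp_share


# Hypothetical annotation: 4% of SNPs carrying 30% of SNP-heritability.
print(round(h2_enrichment(0.30, 0.04), 2))
```

Under this definition, an annotation holding 4% of SNPs but 30% of the heritability is 7.5-fold enriched, which is the scale of the constrained-region and nonsynonymous enrichments quoted above.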
Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.
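Functionally informed fine-mapping raises the prior probability that a variant is causal when it carries informative annotations such as constraint. The sketch below shows that idea in its simplest form, assuming a single causal variant per locus and Wakefield's approximate Bayes factor. It is not PolyFun, which estimates priors from many annotations genome-wide and passes them to multi-causal methods such as SuSiE or FINEMAP. The function names and the prior effect-size standard deviation (0.15) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def wakefield_log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Log approximate Bayes factor (association vs. none), after Wakefield.

    beta, se: marginal effect estimate and its standard error.
    prior_sd: assumed standard deviation of the true effect under H1.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def single_causal_pips(
    betas: Sequence[float],
    ses: Sequence[float],
    priors: Sequence[float],
    prior_sd: float = 0.15,
) -> List[float]:
    """Posterior inclusion probabilities assuming exactly one causal variant.

    priors are relative prior weights (e.g., larger for constrained sites):
    PIP_i = prior_i * ABF_i / sum_j (prior_j * ABF_j).
    """
    if not (len(betas) == len(ses) == len(priors)):
        raise ValueError("betas, ses and priors must have equal length")
    if any(p <= 0 for p in priors):
        raise ValueError("prior weights must be positive")
    log_terms = [
        math.log(p) + wakefield_log_abf(b, s, prior_sd)
        for b, s, p in zip(betas, ses, priors)
    ]
    top = max(log_terms)  # subtract the max (log-sum-exp) to avoid overflow
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [wt / total for wt in weights]
```

For two variants with identical summary statistics, prior weights of 3:1 (for instance, if only one of them sits at a constrained base) yield PIPs of 0.75 and 0.25, so the annotation alone breaks a tie that the association data cannot.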
Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.
Using evolutionary constraint in genomic studies of human diseases.
(A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10⁻¹⁶). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
The functional and evolutionary impacts of human-specific deletions in conserved elements
James R. Xue, Ava Mackay-Smith, Kousuke Mouri, Meilin Fernandez Garcia, Michael X. Dong, Jared F. Akers, Mark Noble, Xue Li, Zoonomia Consortium, Kerstin Lindblad-Toh, Elinor K. Karlsson, James P. Noonan, Terence D. Capellini, Kristen J. Brennand, Ryan Tewhey, Pardis C. Sabeti, and Steven K. Reilly
Science, published 28 Apr 2023
Deciphering the molecular and genetic changes that differentiate humans from our closest primate relatives is critical for understanding our origins. Although earlier studies have prioritized how newly gained genetic sequences or variations have contributed to evolutionary innovation, the role of sequence loss has been less appreciated. Alterations in evolutionarily conserved regions, which are enriched for biological function, may be especially likely to have phenotypic effects. We thus sought to identify and characterize sequences that have been conserved across evolution but were then, surprisingly, lost in all humans. These human-specific deletions in conserved regions (hCONDELs) may play an important role in uniquely human traits.
Sequencing advancements have identified millions of genetic changes between chimpanzee and human genomes; however, the functional impacts of the ~1 to 5% difference between our species are largely unknown. hCONDELs are one class of these predominantly noncoding sequence changes. Although large hCONDELs (>1 kb) have been identified previously, the vast majority of hCONDELs (95.7%) are small (<20 base pairs) and have not yet been functionally assessed. We adapted massively parallel reporter assays (MPRAs) to characterize the effects of thousands of these small hCONDELs and uncovered hundreds with functional effects. Understanding these effects offers insight into the mechanistic patterns driving evolution in the human genome.
We identified 10,032 hCONDELs by examining regions conserved across diverse vertebrate genomes and intersecting them with confidently annotated, human-specific fixed deletions. These hCONDELs are enriched for deletions of conserved sequences that originated in stem amniotes. Overlap with transcriptional, epigenomic, and phenotypic datasets consistently implicates neuronal and cognitive functions. We characterized these hCONDELs using MPRAs in six different human cell types, including induced pluripotent stem cell–derived neural progenitor cells, and found that 800 hCONDELs displayed species-specific regulatory effects. Although many hCONDELs perturb transcription factor–binding sites in active enhancers, we estimate that 30% create or improve binding sites, for both activators and repressors.
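In an MPRA, each oligo's regulatory activity is commonly summarized as the ratio of RNA to DNA barcode counts, and an hCONDEL's effect is then the difference between the human (deleted) and chimpanzee (ancestral) versions of the sequence. The sketch below computes that allelic log2 fold change, assuming simple library-depth normalization and a pseudocount of 1. The abstract does not describe the study's statistical model, and the function names and inputs here are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Counts = Tuple[float, float]  # (RNA barcode count, DNA barcode count)


def log2_activity(rna: float, dna: float, rna_depth: float, dna_depth: float,
                  pseudo: float = 1.0) -> float:
    """log2(RNA/DNA) for one oligo in one replicate, after depth normalization."""
    rna_norm = (rna + pseudo) / rna_depth
    dna_norm = (dna + pseudo) / dna_depth
    return math.log2(rna_norm / dna_norm)


def allelic_log2_fc(human: Sequence[Counts], chimp: Sequence[Counts],
                    depths: Sequence[Counts]) -> float:
    """Mean across replicates of human-minus-chimpanzee log2 activity.

    human, chimp: per-replicate (RNA, DNA) counts for the two alleles.
    depths: per-replicate (total RNA, total DNA) library sizes.
    Positive values mean the human (deleted) allele is more active.
    """
    if not human or not (len(human) == len(chimp) == len(depths)):
        raise ValueError("need matching, non-empty replicate lists")
    diffs = [
        log2_activity(hr, hd, rd, dd) - log2_activity(cr, cd, rd, dd)
        for (hr, hd), (cr, cd), (rd, dd) in zip(human, chimp, depths)
    ]
    return mean(diffs)
```

With this sign convention, a positive value corresponds to the gains in regulatory activity that many hCONDELs showed, and a negative value to the expected loss of function. A real analysis would also test whether the difference is significant across replicates.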
Some hCONDELs exhibit molecular functions that affect core neurodevelopmental genes. One hCONDEL removes a single base in an active enhancer in the neurogenesis gene HDAC5, and another deletes six bases in an alternative promoter of PPP2CA, a gene that regulates neuronal signaling. We deeply characterized an hCONDEL in a putative regulatory element of LOXL2, a gene that controls neuronal differentiation. Using genome engineering to reintroduce the conserved chimpanzee sequence into human cells, we confirmed that the human deletion alters transcriptional output of LOXL2. Single-cell RNA sequencing of these cells uncovered a cascade of myelination and synaptic function–related transcriptional changes induced by the hCONDEL.
Our identification of hundreds of hCONDELs with functional impacts reveals new molecular changes that may have shaped our unique biological lineage. These hCONDELs display predicted functions in a variety of biological systems but are especially enriched for function in neuronal tissue. Many hCONDELs induced gains of regulatory activity, a surprising discovery given that deletions of conserved bases are commonly thought to abrogate function. Our work provides a paradigm for the characterization of nucleotide changes shaping species-specific biology across humans or other animals.
Human-specific deletions that remove nucleotides from regions highly conserved in other animals (hCONDELs).
We assessed 10,032 hCONDELs across diverse, biologically relevant datasets and identified tissue-specific enrichment (top left). The regulatory impact of hCONDELs was characterized by comparing chimp and human sequences in MPRAs (bottom left). The ability of hCONDELs to either improve or perturb activating and repressing gene-regulatory elements was assessed (top right). The deleted chimpanzee sequence was reintroduced back into human cells, causing a cascade of transcriptional differences for an hCONDEL regulating LOXL2 (bottom right).
Conserved genomic sequences disrupted in humans may underlie uniquely human phenotypic traits. We identified and characterized 10,032 human-specific conserved deletions (hCONDELs). These short (average 2.56 base pairs) deletions are enriched for human brain functions across genetic, epigenomic, and transcriptomic datasets. Using massively parallel reporter assays in six cell types, we discovered 800 hCONDELs conferring significant differences in regulatory activity, half of which enhance rather than disrupt regulatory function. We highlight several hCONDELs with putative human-specific effects on brain development, including HDAC5, CPEB4, and PPP2CA. Reverting an hCONDEL to the ancestral sequence alters the expression of LOXL2 and developmental genes involved in myelination and synaptic function. Our data provide a rich resource to investigate the evolutionary mechanisms driving new traits in humans and other species.
Evolutionary constraint and innovation across hundreds of placental mammals
Matthew J. Christmas, Irene M. Kaplow, Diane P. Genereux, Michael X. Dong, Graham M. Hughes, Xue Li, Patrick F. Sullivan, Allyson G. Hindle, Gregory Andrews, Joel C. Armstrong, Matteo Bianchi, Ana M. Breit, Mark Diekhans, Cornelia Fanter, Nicole M. Foley, Daniel B. Goodman, Linda Goodman, Kathleen C. Keough, Bogdan Kirilenko, Amanda Kowalczyk, Colleen Lawless, Abigail L. Lind, Jennifer R. S. Meadows, Lucas R. Moreira, Ruby W. Redlich, Louise Ryan, Ross Swofford, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Ashley R. Brown, Joana Damas, Kaili Fan, John Gatesy, Jenna Grimshaw, Jeremy Johnson, Sergey V. Kozyrev, Alyssa J. Lawler, Voichita D. Marinescu, Kathleen M. Morrill, Austin Osmanski, Nicole S. Paulat, BaDoi N. Phan, Steven K. Reilly, Daniel E. Schäffer, Cynthia Steiner, Megan A. Supple, Aryn P. Wilder, Morgan E. Wirthlin, James R. Xue, Zoonomia Consortium, Bruce W. Birren, Steven Gazal, Robert M. Hubley, Klaus-Peter Koepfli, Tomas Marques-Bonet, Wynn K. Meyer, Martin Nweeia, Pardis C. Sabeti, Beth Shapiro, Arian F. A. Smit, Mark S. Springer, Emma C. Teeling, Zhiping Weng, Michael Hiller, Danielle L. Levesque, Harris A. Lewin, William J. Murphy, Arcadi Navarro, Benedict Paten, Katherine S. Pollard, David A. Ray, Irina Ruf, Oliver A. Ryder, Andreas R. Pfenning, Kerstin Lindblad-Toh, and Elinor K. Karlsson
Science, published 28 Apr 2023
A major challenge in genomics is discerning which bases among billions alter organismal phenotypes and affect health and disease risk. Evidence of past selective pressure on a base, whether highly conserved or fast evolving, is a marker of functional importance. Bases that are unchanged in all mammals may shape phenotypes that are essential for organismal health. Bases that are evolving quickly in some species, or changed only in species that share an adaptive trait, may shape phenotypes that support survival in specific niches. Identifying bases associated with exceptional capacity for cellular recovery, such as in species that hibernate, could inform therapeutic discovery.
The power and resolution of evolutionary analyses scale with the number and diversity of species compared. By analyzing the genomes of hundreds of placental mammals, we can detect which individual bases in the genome are exceptionally conserved (constrained) and likely to be functionally important in both coding and noncoding regions. By including species that represent all orders of placental mammals and aligning genomes with a method that does not require designating humans as the reference species, we can also explore unusual traits in other species.
Zoonomia’s mammalian comparative genomics resources are the most comprehensive and statistically well-powered produced to date, with a protein-coding alignment of 427 mammals and a whole-genome alignment of 240 placental mammals representing all orders. We estimate that at least 10.7% of the human genome is evolutionarily conserved relative to neutrally evolving repeats and identify about 101 million significantly constrained single bases (false discovery rate < 0.05). We cataloged 4552 ultraconserved elements at least 20 bases long that are identical in more than 98% of the 240 placental mammals.
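phyloP reports conservation at each base as a −log10 p-value (positive for conservation, negative for acceleration), so calling "significantly constrained" bases at a false discovery rate below 0.05 amounts to a multiple-testing correction over per-base p-values. The sketch below applies the Benjamini-Hochberg procedure, one standard way to control FDR, to a list of scores. Mapping non-positive scores to p = 1 is a simplifying assumption, the function names are hypothetical, and the study's exact thresholding may differ.

```python
from typing import List, Sequence


def phylop_to_pvalue(score: float) -> float:
    """Assumption: positive phyloP = -log10(p) for conservation; others map to p = 1."""
    return min(1.0, 10.0 ** (-score))


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a per-test flag: True where BH rejects the null at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is declared significant.
    flags = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank > k_max:
            break
        flags[idx] = True
    return flags


def constrained_bases(phylop_scores: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag bases whose conservation p-value survives BH correction at FDR alpha."""
    return benjamini_hochberg([phylop_to_pvalue(s) for s in phylop_scores], alpha)
```

Across roughly three billion bases the sorting step dominates, so a genome-scale version would typically work from a precomputed score histogram or a single derived phyloP cutoff rather than an in-memory list.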
Many constrained bases have no known function, illustrating the potential for discovery using evolutionary measures. Eighty percent are outside protein-coding exons, and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Constrained bases tend to vary less within human populations, which is consistent with purifying selection. Species threatened with extinction have few substitutions at constrained sites, possibly because severely deleterious alleles have been purged from their small populations.
By pairing Zoonomia’s genomic resources with phenotype annotations, we find genomic elements associated with phenotypes that differ between species, including olfaction, hibernation, brain size, and vocal learning. We associate genomic traits, such as the number of olfactory receptor genes, with physical phenotypes, such as the number of olfactory turbinals. By comparing hibernators and nonhibernators, we implicate genes involved in mitochondrial disorders, protection against heat stress, and longevity in this physiologically intriguing phenotype. Using a machine learning–based approach that predicts tissue-specific cis-regulatory activity in hundreds of species using data from just a few, we associate changes in noncoding sequence with traits for which humans are exceptional: brain size and vocal learning.
Large-scale comparative genomics opens new opportunities to explore how genomes evolved as mammals adapted to a wide range of ecological niches and to discover what is shared across species and what is distinctively human. High-quality data for consistently defined phenotypes are necessary to realize this potential. Through partnerships with researchers in other fields, comparative genomics can address questions in human health and basic biology while guiding efforts to protect the biodiversity that is essential to these discoveries.
Comparing genomes from 240 species to explore the evolution of placental mammals.
Our new phylogeny (black lines) has alternating gray and white shading, which distinguishes mammalian orders (labeled around the perimeter). Rings around the phylogeny annotate species phenotypes. Seven species with diverse traits are illustrated, with black lines marking their branch in the phylogeny. Sequence conservation across species is described at the top left.
IMAGE CREDIT: K. MORRILL
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth’s vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
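The ultraconserved-element criterion stated earlier for this paper (at least 20 bases long, identical in more than 98% of the 240 placental mammals) can be read as an alignment scan. The sketch below is one interpretation, assuming a reference row without gaps or N bases inside an element and requiring that more than 98% of species match the reference across the whole element. The greedy, non-overlapping search and the function name are illustrative choices, not the consortium's pipeline.

```python
from typing import List, Sequence, Tuple


def ultraconserved_elements(
    alignment: Sequence[str], min_len: int = 20, min_frac: float = 0.98
) -> List[Tuple[int, int]]:
    """Greedy scan for ultraconserved elements in a multiple alignment.

    alignment: equal-length rows, one per species; row 0 is the reference.
    An element [start, end) qualifies if it spans at least min_len columns and
    more than min_frac of all rows are identical to the reference over the
    entire element. Returns non-overlapping half-open intervals.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    ref = alignment[0]
    width = len(ref)
    elements: List[Tuple[int, int]] = []
    start = 0
    while start < width:
        matching = set(range(n_rows))  # rows identical to ref so far
        end = start
        while end < width and ref[end] not in "-N":
            still = {r for r in matching if alignment[r][end] == ref[end]}
            if len(still) / n_rows <= min_frac:
                break
            matching = still  # this set can only shrink as the element extends
            end += 1
        if end - start >= min_len:
            elements.append((start, end))
            start = end  # skip past the element to keep results non-overlapping
        else:
            start += 1
    return elements
```

Requiring whole-element identity is stricter than a per-column identity threshold, because two species that each differ at a different column would both count against the element.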