First Complete Human Genome Poised to Strengthen Genetic Analysis, NIST Study Shows

2022-03-31 National Institute of Standards and Technology (NIST)

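・The newly updated human genome fills long-standing gaps and fully spells out the more than 3 billion letters that make up our genetic code. A separate, related study now shows that this genome can serve as an accurate template that greatly improves DNA sequencing capabilities.
・The Telomere-to-Telomere (T2T) consortium, which completed the genome, tested how well the full genome supports DNA sequencing of thousands of people. A new paper published in Science reports that it corrects tens of thousands of errors produced by the previous genome sequence and is better suited to analyzing more than 200 medically important genes.
・The T2T genome could greatly advance research on genetic diseases, and in the future patients may benefit from more reliable diagnoses.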

<Related information>

A complete reference genome improves analysis of human genetic variation

SERGEY AGANEZOV, STEPHANIE M. YAN, DANIELA C. SOTO, MELANIE KIRSCHE, SAMANTHA ZARATE, PAVEL AVDEYEV, DYLAN J. TAYLOR, KISHWAR SHAFIN, ALAINA SHUMATE, CHUNLIN XIAO, JUSTIN WAGNER, JENNIFER MCDANIEL, NATHAN D. OLSON, MICHAEL E. G. SAURIA, MITCHELL R. VOLLGER, ARANG RHIE, MELISSA MEREDITH, SKYLAR MARTIN, JOYCE LEE, SERGEY KOREN, JEFFREY A. ROSENFELD, BENEDICT PATEN, RYAN LAYER, CHEN-SHAN CHIN, FRITZ J. SEDLAZECK, NANCY F. HANSEN, DANNY E. MILLER, ADAM M. PHILLIPPY, KAREN H. MIGA, RAJIV C. MCCOY, MEGAN Y. DENNIS, JUSTIN M. ZOOK, AND MICHAEL C. SCHATZ
Science, Published: 1 Apr 2022
DOI: 10.1126/science.abl3533

Abstract

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

For the past 20 years, the human reference genome (GRCh38) has served as the bedrock of human genetics and genomics (1–3). One of the central applications of the human reference genome, and of reference genomes in general, has been to serve as a substrate for clinical, comparative, and population genomic analyses. More than 1 million human genomes have been sequenced to study genetic diversity and clinical relationships, and nearly all of them have been analyzed by aligning the sequencing reads from the donors to the reference genome [e.g., (4–6)]. Even when donor genomes are assembled de novo, independent of any reference, the assembled sequences are almost always compared to a reference genome to characterize variation by leveraging deep catalogs of available annotations (7, 8). Consequently, human genetics and genomics benefit from the availability of a high-quality reference genome, ideally without gaps or errors that may obscure important variation and regulatory relationships.

The current human reference genome, GRCh38, is used for countless applications, with rich resources available to visualize and annotate the sequence across cell types and disease states (3, 9–12). However, despite decades of effort to construct and refine its sequence, the human reference genome still suffers from several major limitations that hinder comprehensive analysis. Most immediately, GRCh38 contains more than 100 million nucleotides that either remain entirely unresolved (currently represented as “N” characters), such as the p-arms of the acrocentric chromosomes, or are substituted with artificial models, such as the centromeric satellite arrays (13). Furthermore, GRCh38 possesses 11.5 Mbp of unplaced and unlocalized sequences that are represented separately from the primary chromosomes (3, 14). These sequences are difficult to study, and many genomic analyses exclude them to avoid identifying false variants and false regulatory relationships (6). Relatedly, artifacts such as an apparent imbalance between insertions and deletions (indels) have been attributed to systematic misassemblies in GRCh38 (15–17). Overall, these errors and omissions in GRCh38 introduce biases in genomic analyses, particularly in centromeres, satellites, and other complex regions.
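
To make the notion of unresolved sequence concrete, the sketch below counts "N" characters per record in FASTA-formatted text and reports the unresolved fraction. It is only an illustration and not the method used in the paper: the function names (`parse_fasta`, `unresolved_fraction`) and the toy records (`chrToy1`, `chrToy2`) are hypothetical, and a real reference such as GRCh38 would be streamed from a file rather than held in a list.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def unresolved_fraction(lines: Iterable[str]) -> Dict[str, float]:
    """Return, per record, the fraction of bases written as 'N' (unresolved)."""
    result: Dict[str, float] = {}
    for name, seq in parse_fasta(lines):
        n_count = seq.upper().count("N")
        result[name] = n_count / len(seq) if seq else 0.0
    return result


# Toy input (hypothetical records, not real chromosome data).
toy = [">chrToy1", "ACGTNNNNAC", "GTNN", ">chrToy2", "ACGTACGT"]
for record, fraction in unresolved_fraction(toy).items():
    print(f"{record}: {fraction:.3f} of bases unresolved")
```

In the toy input, `chrToy1` has 6 unresolved bases out of 14 and `chrToy2` has none. Applied to a real assembly, the same per-record tally would show which sequences carry the unresolved stretches described above.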
