Statistical Tool Finds ‘Gaps’ in DNA Data Sets Shouldn’t Be Ignored

2022-08-17

2022-08-15 North Carolina State University (NC State)

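A simple statistical test shows that, contrary to current practice, the "gaps" in the DNA and protein sequence alignments commonly used in evolutionary biology can provide important information about nucleotide substitution and amino acid replacement over time.
The researchers created a simple statistical test to assess whether gap locations are independent of the amino acid replacement process. They tested 1,390 different sequence alignments and found that, for roughly two-thirds of them, the usual assumption of independence between gap locations and amino acid replacement was rejected.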

<Related information>

Correlations between alignment gaps and nucleotide substitution or amino acid replacement

Tae-Kun Seo, Benjamin D. Redelings, and Jeffrey L. Thorne
Proceedings of the National Academy of Sciences Published: August 16, 2022
DOI: https://doi.org/10.1073/pnas.2204435119

Significance

We introduce a test of the null hypothesis that nucleotide substitution or amino acid replacement processes are independent of gap locations within sequence alignments. When applying this test to alignments that are informed by protein structure, the null is rejected about 2/3 of the time. This indicates that modifications are needed to the usual approach of ignoring gap locations when making evolutionary inferences. Additionally, we demonstrate that optimal alignments introduce spurious correlations between gap locations and nucleotide substitution patterns. Because these spurious correlations will not be eliminated by employing genomic-scale datasets, we emphasize the need for modifying the conventional approach of basing evolutionary inferences upon single optimal alignments.

Abstract

To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion–deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments that are inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.
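To make the idea of testing independence between gap locations and substitutions concrete, here is a minimal sketch of a generic permutation test on a single pairwise alignment. It is not the test proposed in the paper, whose details are not given in the text above. The function name `gap_proximity_test`, the `window` parameter, and the choice of statistic (mismatch rate in columns near a gap minus the rate in columns far from one) are all assumptions made for illustration. Shuffling labels across columns also ignores any spatial correlation along the sequence, so this should be read as a toy.

```python
import random

def gap_proximity_test(seq_a, seq_b, window=1, n_perm=2000, seed=0):
    """Toy permutation test (illustrative only, not the paper's method).

    Null hypothesis: whether an ungapped column mismatches is independent
    of whether it lies within `window` columns of a gap.
    Returns (observed_statistic, two_sided_p_value).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    n = len(seq_a)
    gap_cols = {i for i in range(n) if seq_a[i] == "-" or seq_b[i] == "-"}

    mismatch = []  # 1 if the two residues differ, else 0 (ungapped columns only)
    near = []      # True if the column is within `window` of a gapped column
    for i in range(n):
        if i in gap_cols:
            continue
        mismatch.append(1 if seq_a[i] != seq_b[i] else 0)
        near.append(any((i + d) in gap_cols for d in range(-window, window + 1)))

    n_near = sum(near)
    n_far = len(near) - n_near
    if n_near == 0 or n_far == 0:
        raise ValueError("need columns both near to and far from gaps")

    total_mismatch = sum(mismatch)

    def statistic(labels):
        # Mismatch rate near gaps minus mismatch rate far from gaps.
        near_m = sum(m for m, is_near in zip(mismatch, labels) if is_near)
        return near_m / n_near - (total_mismatch - near_m) / n_far

    observed = statistic(near)

    rng = random.Random(seed)
    labels = list(near)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between gap proximity and mismatch
        if abs(statistic(labels)) >= abs(observed) - 1e-12:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `gap_proximity_test("ACGTACGT--ACGTACGT", "ACGTACGAGGTCGTACGT")` compares the two columns flanking the gap with the remaining ungapped columns. A small p-value would suggest that, in this toy setting, mismatches cluster near the gap rather than being independent of it.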
