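Reproducible predictions from limited brain imaging data: transferring the "principal components" of large-scale brain imaging data to small-scale studies improves the reproducibility of brain-behavior prediction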

2026-06-12

2026-06-12 Tohoku University

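A research group at Tohoku University has proposed a new analysis method to mitigate the "reproducibility crisis" that has long troubled brain MRI research. Brain imaging data are ultra-high-dimensional, at roughly 164,000 dimensions, and are commonly compressed with principal component analysis (PCA); in studies with few participants, however, the principal component vectors become unstable and the reproducibility of results suffers. Using brain imaging data from 1,113 participants in the Human Connectome Project (HCP), the team showed quantitatively that principal component stability is markedly low in samples of fewer than 100 people, and that 94 to 97% of the variation in stability is explained by sample size. They then developed a method that applies stable principal component vectors computed from large-scale data to independent small-scale data, which significantly improved prediction accuracy for cognitive abilities and personality traits. They also found that there is an optimal number of principal components to use. The results indicate that publicly available large-scale brain imaging data could enable reliable brain-behavior prediction even in small studies, and the work is expected to contribute to improving reproducibility in neuroscience research.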

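Figure 1. Schematic of the new method. Applying statistically stable components extracted from large-scale data to statistically unstable small-scale data increases the statistical stability of the small-scale analysis.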
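The transfer idea can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline: the data generator, the toy sizes (200 features standing in for roughly 164,000), the choice to center the small sample with the large-sample mean, and all function names (`simulate`, `fit_pca_basis`, `project`, `out_of_sample_r`) are assumptions made for illustration. It derives a PCA basis once from a large sample and once from the small sample itself, then compares out-of-sample prediction in an independent small sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): 200 features stand in for the ~164,000 in the article.
N_FEATURES, N_LATENT, N_COMPONENTS = 200, 5, 5
loadings = rng.normal(size=(N_LATENT, N_FEATURES))  # shared "true" spatial patterns


def simulate(n_subjects):
    """Synthetic subjects: features driven by a few latent factors plus noise."""
    latent = rng.normal(size=(n_subjects, N_LATENT))
    X = latent @ loadings + rng.normal(scale=4.0, size=(n_subjects, N_FEATURES))
    y = latent[:, 0] + rng.normal(scale=0.5, size=n_subjects)  # toy "trait"
    return X, y


def fit_pca_basis(X, n_components):
    """Return (feature mean, principal axes of shape (n_components, n_features))."""
    mean = X.mean(axis=0)
    # Rows of Vt are the eigenvectors of the sample covariance matrix.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, components):
    """Center X with the given mean and project it onto the given axes."""
    return (X - mean) @ components.T


def out_of_sample_r(basis, X_train, y_train, X_test, y_test):
    """Fit least squares on training scores; return the test-set correlation."""
    mean, components = basis
    Z_train = project(X_train, mean, components)
    Z_test = project(X_test, mean, components)
    A_train = np.column_stack([np.ones(len(Z_train)), Z_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    y_pred = coef[0] + Z_test @ coef[1:]
    return float(np.corrcoef(y_pred, y_test)[0, 1])


X_large, _ = simulate(500)         # large reference sample, used for the basis only
X_train, y_train = simulate(100)   # small sample used to fit the predictor
X_test, y_test = simulate(100)     # independent small sample used for evaluation

basis_transferred = fit_pca_basis(X_large, N_COMPONENTS)  # large-sample eigenvectors
basis_within = fit_pca_basis(X_train, N_COMPONENTS)       # small-sample eigenvectors

r_transferred = out_of_sample_r(basis_transferred, X_train, y_train, X_test, y_test)
r_within = out_of_sample_r(basis_within, X_train, y_train, X_test, y_test)
print(f"transferred basis:   r = {r_transferred:.2f}")
print(f"within-sample basis: r = {r_within:.2f}")
```

The only difference between the two conditions is where the eigenvectors come from; the regression is fitted on the same small training sample in both cases. How large the gap is depends entirely on the synthetic noise level chosen here, so the printed values say nothing about effect sizes in real cortical thickness data.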

<Related information>

Large-sample PCA eigenvectors stabilize cortical thickness components and improve small sample brain behavior prediction

Zhang Yun Feng, Kenchi Hosokawa & Chihiro Hosoda
Scientific Reports Published: 25 May 2026
DOI: https://doi.org/10.1038/s41598-026-52800-4 Unedited version

Abstract

Reproducible brain-wide association studies remain challenging in structural MRI, in part because high-dimensional cortical measures yield unstable eigenspaces in small samples. Here, using cortical thickness data from the Human Connectome Project Young Adult cohort (N = 1,113), we examined how sample size influences the stability of principal component analysis (PCA) and whether eigenvectors derived from larger samples can improve brain-behavior prediction in independent small samples. PCA stability was quantified across overlapping and non-overlapping resampling schemes using cosine similarity and one-to-one Hungarian matching of components. PCA stability increased systematically with sample size: subsamples under 100 participants yielded few stable components, whereas larger subsamples produced dozens. Thus, reproducibility hinges not only on statistical power but on the stability of the representational basis: components learned in small subsamples are fragile, while eigenvectors from larger samples converge to stable, transferable axes. We then compared three prediction settings (500 vs. 500, 500 vs. 100, and 100 vs. 100) across 65 cognitive and personality traits using linear regression and machine-learning models. Transferring eigenvectors derived from larger samples to other, smaller samples consistently improved prediction relative to deriving PCA components within the same small sample, although absolute effect sizes remained modest. Prediction performance was highest at an intermediate dimensionality of approximately 30 principal components, indicating that increasing the number of retained components does not necessarily improve generalization. These findings identify PCA eigenspace stability as a key determinant of reproducible brain-behavior inference and suggest that reusing larger-sample PCA eigenvectors is a practical strategy for stabilizing feature extraction in resource-limited neuroimaging studies.
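The stability measure named in the abstract (cosine similarity with one-to-one Hungarian matching of components) might be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the synthetic data pool, the toy sizes, the use of absolute cosine similarity (because the sign of an eigenvector is arbitrary), the 0.8 cutoff for calling a component "stable", and the helper names `pca_components`, `matched_similarity`, and `count_stable` are all hypothetical. Only the non-overlapping resampling scheme is shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
N_FEATURES, N_LATENT, K = 200, 10, 10   # toy sizes (assumptions)
STABLE_THRESHOLD = 0.8                  # hypothetical cutoff, not taken from the paper

# Synthetic pool of 2000 subjects: latent factors of decreasing strength plus noise.
strengths = np.linspace(3.0, 1.0, N_LATENT)
loadings = rng.normal(size=(N_LATENT, N_FEATURES))
latent = rng.normal(size=(2000, N_LATENT)) * strengths
pool = latent @ loadings + rng.normal(scale=3.0, size=(2000, N_FEATURES))


def pca_components(X, k):
    """First k principal axes (rows) of X, computed via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]


def matched_similarity(A, B):
    """Match rows of A to rows of B one-to-one, maximizing total |cosine similarity|.

    Returns the matched similarities and, for each row of A, the index of its
    partner row in B.
    """
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = np.abs(A_unit @ B_unit.T)  # absolute value: eigenvector sign is arbitrary
    rows, cols = linear_sum_assignment(sim, maximize=True)  # Hungarian-type solver
    return sim[rows, cols], cols


def count_stable(n_subjects):
    """Count components that match across two non-overlapping subsamples."""
    idx = rng.permutation(len(pool))
    A = pca_components(pool[idx[:n_subjects]], K)
    B = pca_components(pool[idx[n_subjects:2 * n_subjects]], K)
    sims, _ = matched_similarity(A, B)
    return int((sims >= STABLE_THRESHOLD).sum())


stable_counts = {n: count_stable(n) for n in (50, 200, 800)}
for n, count in stable_counts.items():
    print(f"n = {n:4d}: {count} of {K} components above {STABLE_THRESHOLD}")
```

Matching is needed because components can swap order between subsamples when their eigenvalues are close, so comparing the i-th component of one subsample directly with the i-th of another would understate stability. A fuller analysis would repeat the resampling many times and average the counts rather than rely on a single split per sample size.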
