2026-06-12 Tohoku University

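Figure 1. Schematic of the new method. Applying statistically stable components extracted from a large dataset to a small dataset, whose own components are statistically unstable, makes the analysis of the small dataset more statistically stable.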
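The transfer idea in Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: random numbers stand in for cortical thickness and behavioral scores, and the 68-region layout, the choice of 30 components, centering the small sample with the large-sample mean, the 50/50 train/test split and ordinary least squares as the predictor are all hypothetical choices. The helper names (`pca_eigenvectors`, `project`, `heldout_r`) are illustrative only.

```python
import numpy as np


def pca_eigenvectors(X, k):
    """Column means and top-k PCA eigenvectors (p x k) of a data matrix X (n x p)."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the eigenvectors of the
    # sample covariance matrix, sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def project(X, mean, V):
    """Component scores of X on a fixed eigenbasis V (assumption: centered with `mean`)."""
    return (X - mean) @ V


def heldout_r(X_train, y_train, X_test, y_test, mean, V):
    """Fit OLS (with intercept) on component scores; return Pearson r on held-out data."""
    A_train = np.column_stack([np.ones(len(X_train)), project(X_train, mean, V)])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), project(X_test, mean, V)])
    return float(np.corrcoef(A_test @ coef, y_test)[0, 1])


# Hypothetical stand-ins for real data: 68 cortical regions, 30 components.
rng = np.random.default_rng(0)
p, k = 68, 30
X_large = rng.normal(size=(500, p))   # larger reference sample
X_small = rng.normal(size=(100, p))   # independent small sample
y_small = rng.normal(size=100)        # one behavioral score per participant
train, test = np.arange(50), np.arange(50, 100)

# Transfer: the eigenbasis is estimated once on the large sample and reused.
r_transfer = heldout_r(X_small[train], y_small[train], X_small[test], y_small[test],
                       *pca_eigenvectors(X_large, k))
# Baseline: the eigenbasis is re-estimated on the small training split only.
r_within = heldout_r(X_small[train], y_small[train], X_small[test], y_small[test],
                     *pca_eigenvectors(X_small[train], k))
print(f"transfer r = {r_transfer:.3f}, within-sample r = {r_within:.3f}")
```

With random stand-in data both correlations are near chance; the comparison only becomes informative with real thickness and behavior measures. The key design point is that in the transfer setting the small sample never influences the eigenbasis, so only the regression weights are estimated from the few available participants.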
<Related information>
- https://www.tohoku.ac.jp/japanese/2026/06/press20260612-01-brain.html
- https://www.nature.com/articles/s41598-026-52800-4
Large-sample PCA eigenvectors stabilize cortical thickness components and improve small sample brain behavior prediction
Zhang Yun Feng, Kenchi Hosokawa & Chihiro Hosoda
Scientific Reports, published: 25 May 2026
DOI: https://doi.org/10.1038/s41598-026-52800-4 (unedited version)
Abstract
Reproducible brain-wide association studies remain challenging in structural MRI, in part because high-dimensional cortical measures yield unstable eigenspaces in small samples. Here, using cortical thickness data from the Human Connectome Project Young Adult cohort (N = 1,113), we examined how sample size influences the stability of principal component analysis (PCA) and whether eigenvectors derived from larger samples can improve brain-behavior prediction in independent small samples. PCA stability was quantified across overlapping and non-overlapping resampling schemes using cosine similarity and one-to-one Hungarian matching of components. PCA stability increased systematically with sample size: subsamples of fewer than 100 participants yielded few stable components, whereas larger subsamples yielded dozens. Thus, reproducibility hinges not only on statistical power but also on the stability of the representational basis: components learned in small subsamples are fragile, while eigenvectors from larger samples converge to stable, transferable axes. We then compared three prediction settings (500 vs. 500, 500 vs. 100, and 100 vs. 100) across 65 cognitive and personality traits using linear regression and machine-learning models. Transferring eigenvectors derived from larger samples to other, smaller samples consistently improved prediction relative to deriving PCA components within the same small sample, although absolute effect sizes remained modest. Prediction performance was highest at an intermediate dimensionality of approximately 30 principal components, indicating that retaining more components does not necessarily improve generalization. These findings identify PCA eigenspace stability as a key determinant of reproducible brain-behavior inference and suggest that reusing larger-sample PCA eigenvectors is a practical strategy for stabilizing feature extraction in resource-limited neuroimaging studies.
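The stability measure described in the abstract (cosine similarity plus one-to-one Hungarian matching of components) can be illustrated with NumPy and SciPy's `linear_sum_assignment`, which solves the same assignment problem. This is a sketch under assumptions, not the published procedure: it covers only the non-overlapping resampling scheme, takes the absolute cosine because eigenvector signs are arbitrary, and the threshold of 0.9, the 20 subsample pairs and the synthetic data with decaying variances are hypothetical choices not given in the abstract. Helper names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def top_eigenvectors(X, k):
    """Top-k PCA eigenvectors (as columns, p x k) of the data matrix X (n x p)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k].T


def matched_similarities(V_a, V_b):
    """One-to-one match components of two eigenbases; return matched |cosine| values."""
    # Columns are unit-norm, so V_a.T @ V_b holds pairwise cosine similarities.
    # The absolute value discards the arbitrary sign of each eigenvector.
    sim = np.abs(V_a.T @ V_b)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return sim[rows, cols]


def count_stable(X, n_sub, k=30, threshold=0.9, n_pairs=20, seed=0):
    """Mean number of matched components with |cosine| above `threshold`,
    over pairs of non-overlapping random subsamples of size n_sub."""
    if 2 * n_sub > len(X):
        raise ValueError("two non-overlapping subsamples do not fit in X")
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_pairs):
        idx = rng.permutation(len(X))
        a, b = idx[:n_sub], idx[n_sub:2 * n_sub]
        sims = matched_similarities(top_eigenvectors(X[a], k), top_eigenvectors(X[b], k))
        counts.append(int(np.sum(sims > threshold)))
    return float(np.mean(counts))


# Hypothetical data: 1,113 participants x 68 regions with decaying variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(1113, 68)) * np.linspace(3.0, 0.5, 68)
for n in (50, 100, 200, 500):
    print(n, count_stable(X, n))
```

Matching before comparing matters because small subsamples often reorder or mix components with similar variance; comparing component 5 with component 5 would then understate stability, whereas the optimal one-to-one assignment pairs each axis with its best counterpart.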

