2026-05-21 Mount Sinai Health System (MSHS)
<Related information>
- https://www.mountsinai.org/about/newsroom/2026/researchers-develop-ai-model-that-maps-how-genes-work-together-in-human-cells
- https://www.cell.com/patterns/fulltext/S2666-3899(26)00074-7
GSFM: A gene set foundation model pre-trained on a massive collection of diverse gene sets
Daniel J.B. Clarke ∙ Giacomo B. Marino ∙ Avi Ma’ayan
Patterns. Published: May 21, 2026
DOI: https://doi.org/10.1016/j.patter.2026.101565

Highlights
- GSFM enables gene function prediction using any annotated gene set as input
- The GSFM website serves gene pages with function predictions for all human genes
- Simple model architecture was trained on a massive and diverse collection of gene sets
- GSFM is also used to predict protein interactions and perform enrichment analysis
Summary
Trained on massive datasets, foundation models produce embeddings used for many applications. We created a gene set foundation model (GSFM) trained on a massive collection of unlabeled gene sets from Rummagene and RummaGEO. Rummagene extracts gene sets from supplemental materials of publications, and RummaGEO hosts gene sets computed from published transcriptomics studies. Several GSFM architectures were benchmarked for their ability to predict gene function, gene-disease associations, and protein-protein interactions as well as to perform gene set enrichment analysis. Gene function predictions were compared with other models and evaluated using labeled gene sets from the Gene Ontology and KEGG pathways, the GWAS Catalog, and ChEA. The best GSFM architecture is a denoising autoencoder trained on multi-hot-encoded gene sets. This GSFM achieves better performance than the other models. Gene-focused landing pages were created to serve GSFM gene function predictions for all human genes. These landing pages are served on a dedicated platform that also provides GSFM gene set augmentation and GSFM gene set enrichment analysis.
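
To make the phrase "denoising autoencoder trained on multi-hot-encoded gene sets" concrete, the following is a minimal sketch of how such training inputs could be prepared. It is not the authors' code: the function names, the toy gene sets, and the choice of masking noise (randomly dropping genes from the input) are all assumptions for illustration, since the summary does not state the corruption scheme or the network details.

```python
import random

def build_vocabulary(gene_sets):
    """Map each gene symbol to a fixed column index (sorted for determinism)."""
    genes = sorted({gene for gene_set in gene_sets for gene in gene_set})
    return {gene: index for index, gene in enumerate(genes)}

def multi_hot(gene_set, vocab):
    """Encode a gene set as a 0/1 vector with one position per vocabulary gene."""
    vec = [0] * len(vocab)
    for gene in gene_set:
        if gene in vocab:
            vec[vocab[gene]] = 1
    return vec

def corrupt(vec, drop_prob, rng):
    """Masking noise (assumed here): drop each present gene with probability drop_prob."""
    return [0 if (x == 1 and rng.random() < drop_prob) else x for x in vec]

def training_pair(gene_set, vocab, drop_prob=0.5, seed=0):
    """Return (noisy input, clean target) for one gene set."""
    rng = random.Random(seed)
    clean = multi_hot(gene_set, vocab)
    noisy = corrupt(clean, drop_prob, rng)
    return noisy, clean

# Toy, hypothetical gene sets standing in for Rummagene / RummaGEO entries.
gene_sets = [{"TP53", "MDM2", "CDKN1A"}, {"STAT3", "JAK2", "TP53"}]
vocab = build_vocabulary(gene_sets)
noisy, clean = training_pair(gene_sets[0], vocab)
```

A denoising autoencoder would then be trained to reconstruct `clean` from `noisy`. Under that setup, genes that receive high reconstruction scores but are absent from the input are natural candidates for tasks such as gene set augmentation, though how GSFM derives its predictions is described in the paper rather than in this summary.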

