2026-05-09 Hefei Institutes of Physical Science (HFIPS)

BCRInsight Model Framework (Image by ZHAO Hailong)
<Related information>
- https://english.hf.cas.cn/nr/rn/202605/t20260509_1158638.html
- https://academic.oup.com/bib/article/27/2/bbag154/8653665
BCRInsight: an antibody language model to decode biological signals from BCR sequences
Hailong Zhao, Shang Lou, Xuhua Li, Yiyang Gao, Wenjing Cao, Hongcang Gu, Fan Zhang
Briefings in Bioinformatics, Published: 14 April 2026
DOI: https://doi.org/10.1093/bib/bbag154
Abstract
The B-cell receptor (BCR) repertoire encodes not only antigen-binding specificity but also intrinsic signatures reflecting B-cell functional states and differentiation trajectories. Deciphering the intricate sequence semantics embedded within these repertoires is pivotal for elucidating immune dynamics and expediting antibody discovery. Although single-cell sequencing provides high-resolution insights, its scalability and cost remain major obstacles, leaving population-level repertoire data underexploited. Furthermore, conventional bioinformatics approaches struggle to model the high-order, non-linear semantic dependencies inherent in antibody sequences. To address these challenges, we present BCRInsight, an antibody-specific pretrained language model that integrates a Transformer architecture with phenotype-aware contrastive learning. Pretrained on 80 million human BCR sequences, BCRInsight learns biologically meaningful contextual representations that encode subtle signatures of B-cell activation, maturation, and clonal evolution. Extensive benchmarking demonstrates that BCRInsight achieves state-of-the-art performance across multiple downstream tasks, particularly in paratope prediction. Further evaluation on diverse single-cell immune cohorts, including healthy, neoplastic, and viral infection states, reveals cross-scenario robustness and superior generalization relative to existing methods. Notably, attention-based analyses show that high-attention regions correspond closely to physical antigen-contact residues, highlighting emergent structural interpretability derived solely from self-supervised learning. Collectively, BCRInsight establishes a new paradigm for decoding the “language” of antibodies, offering a scalable and interpretable framework for computational immunology and rational antibody engineering.

