Foundational Models in Biology: When hammering nails with a microscope makes sense
O. Mezhenskyi¹, D. Kravchuk¹
¹ Bogomoletz Institute of Physiology NASU
DOI: https://doi.org/10.15407/fz71.05.075

Abstract
Foundational models (FMs), large pre-trained neural architectures, are transforming modern biology by providing universal representations learned from massive, heterogeneous, and often unlabeled datasets. Unlike classical task-specific machine-learning models, FMs can be fine-tuned for genomics, cheminformatics, bioimaging, and physiological signal analysis with only small amounts of labeled data. This mini-review summarizes key applications of DNABERT, MolBERT, DiffDock, and Segment Anything, highlighting their advantages in accuracy, generalizability, and multimodal integration. We also outline the potential of FMs in physiology and neurophysiology, where they may unify signals from patch-clamp recordings, microelectrode measurements, and calcium imaging into a single analytical framework.
Keywords: foundational models; machine learning; DNABERT; MolBERT; DiffDock; Segment Anything; bioinformatics; neurophysiology; patch-clamp; image analysis