bioRxiv. 2026 Jan 07. pii: 2026.01.06.698060. [Epub ahead of print]
Recent years have seen rapid growth in single-cell foundation models (scFMs), raising expectations for transformative advances in genomic data analysis. However, their adoption has been hindered by inconsistent performance across datasets, fragmented software ecosystems, high technical barriers, and the lack of best practices established through systematic, reproducible benchmarks. Here we present a unified, extensible, and fully automated computational framework that standardizes the execution, evaluation, and extension of diverse scFMs. The framework harmonizes software environments, eliminates manual configuration, and enables large-scale, reproducible evaluation across heterogeneous datasets and training regimes. Leveraging this infrastructure, we systematically benchmark thirteen foundation models alongside classical baselines across more than fifty datasets under zero-shot, few-shot, and fine-tuning settings. We show that pretrained embeddings capture biologically meaningful structure and provide clear advantages in low-label and transfer-learning scenarios, while the classical PCA approach remains competitive, or even preferable, in others. Together, this work lowers technical barriers, delivers best practices, and establishes a transparent and reproducible standard for community-wide evaluation, accelerating the rigorous development and adoption of scFMs.