bims-micpro 2025-08-10 papers

BMC Methods. 2025; 2(1): 16

ShortStop: a machine learning framework for microprotein discovery.

Brendan Miller, Eduardo Vieira de Souza, Victor J Pai, Hosung Kim, Joan M Vaughan, Calvin J Lau, Jolene K Diedrich, Alan Saghatelian.

Background: The human genome contains over 3 million small open reading frames (smORFs, ≤ 150 codons). Ribosome profiling and proteogenomics transformed our understanding of these sequences by showing that thousands are actively translated, and hundreds produce detectable peptides by mass spectrometry. However, the random arrangement of codons across the 3-gigabase human genome naturally generates smORFs by chance, suggesting many may represent translational noise or regulatory elements rather than functional proteins. This is supported by the fact that most translating smORFs occur in upstream open reading frames (uORFs), which typically regulate translation of canonical coding sequences rather than encode bioactive microproteins. As interest grows in uncovering biologically meaningful microproteins, a key challenge remains: distinguishing functional smORFs from non-functional or regulatory translation products. Although empirical methods such as individual microprotein studies or large-scale screens can help, these approaches are time-consuming, expensive, and come with technical limitations. New complementary strategies are needed.
Methods: To address this challenge, we developed ShortStop, a computational framework based on the idea that not all translating smORFs produce functional proteins, but the ones that do may resemble experimentally characterized microproteins. ShortStop classifies smORFs into two reference groups: Swiss-Prot Analog Microproteins (SAMs), which resemble known microproteins, and PRISMs (Physicochemically Resembling In Silico Microproteins), which are synthetic sequences designed to match the composition of translating smORFs but lacking sequence order or evolutionary selection, and therefore serving as a proxy for non-functional peptides. This two-class system enables machine learning to help prioritize smORFs for downstream study.
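The abstract describes PRISMs as sequences that match the composition of translating smORFs but lack sequence order. It does not say how they are generated, so the following is only a minimal sketch of one simple way to build such a composition-matched decoy: shuffling the residues of an observed peptide. The function name, the example peptide, and the use of a plain shuffle are assumptions made for illustration, not the ShortStop implementation.

```python
import random
from collections import Counter


def shuffle_preserving_composition(peptide: str, rng: random.Random) -> str:
    """Return a decoy with the same residues as `peptide` in random order.

    Assumption: a plain shuffle stands in for whatever procedure ShortStop
    actually uses to build PRISMs; the abstract does not specify it.
    """
    residues = list(peptide)
    rng.shuffle(residues)  # destroys sequence order, keeps composition
    return "".join(residues)


# Hypothetical microprotein-like sequence, used only for illustration.
observed = "MKTLLVAGSWEERPLK"
rng = random.Random(0)  # fixed seed so the sketch is reproducible
decoy = shuffle_preserving_composition(observed, rng)

# The decoy keeps the amino-acid composition and length of the input.
assert Counter(decoy) == Counter(observed)
assert len(decoy) == len(observed)
```

Because only the order changes, any difference a classifier learns between such decoys and real microproteins cannot come from amino-acid composition alone.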
Results: ShortStop achieved high precision (90-94%), recall (87-96%), and F1 scores (90-93%) across all classes. When applied to a published dataset of translating smORFs, ShortStop classified about 8% as candidates with biochemical properties resembling Swiss-Prot microproteins (SAMs). The remaining 92% resembled in silico generated sequences (PRISMs), representing noncanonical proteins, non-functional peptides, or regulatory translation events. SAMs showed lower C-terminal hydrophobicity (linked to reduced proteasomal degradation) and greater N-terminal hydrophilicity at neutral pH, suggesting improved solubility and intracellular stability. ShortStop also identified microproteins overlooked by other methods, including one encoded by an upstream overlapping smORF in the StAR gene, which was detectable in human cells and steroid-producing tissues. In a clinical lung cancer dataset, ShortStop uncovered differentially expressed microprotein candidates, several of which were validated by mass spectrometry.
Discussion: ShortStop addresses a key gap in microprotein research: the lack of scalable tools to characterize microproteins and of standardized negative training data for machine learning models of microproteins. By providing a classification framework rooted in biochemical features, ShortStop offers a practical solution for targeting smORFs in functional studies, benchmarking new discovery tools, and advancing microprotein research.
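For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and F1 are computed for one class from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute per-class precision, recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for a single class (not from the study).
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it is pulled toward whichever of the two is lower.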
Supplementary Information: The online version contains supplementary material available at 10.1186/s44330-025-00037-4.

Keywords: Cancer; De Novo genes; Machine learning; Microprotein; Peptides; Proteogenomics; Ribosome profiling; Small open reading frame; Steroidogenic acute regulatory protein

DOI: https://doi.org/10.1186/s44330-025-00037-4

Proc Natl Acad Sci U S A. 2025 Aug 12. 122(32): e2506534122

CRISPR-Cas9 screening reveals microproteins regulating adipocyte proliferation and lipid metabolism.

Victor J Pai, Huanqi Shan, Cynthia J Donaldson, Joan M Vaughan, Eduardo V De Souza, Carolyn O'Connor, Michelle Liem, Antonio F M Pinto, Jolene Diedrich, Alan Saghatelian.

Small open reading frames (smORFs) encode microproteins that play crucial roles in various biological processes, yet their functions in adipocyte biology remain largely unexplored. In a previous study, we identified thousands of smORFs in white and brown adipocytes derived from the stromal vascular fraction of mice using ribosome profiling. Here, we expand on this work by identifying additional smORFs related to adipocytes using the in vitro 3T3-L1 preadipocyte model. To systematically investigate the functional relevance of these smORFs, we designed a custom CRISPR/Cas9 single guide RNA (sgRNA) library and screened for smORFs influencing adipocyte proliferation and differentiation. Through a dropout screen and fluorescence-activated cell sorting of lipid droplets, we identified dozens of smORFs that regulate either cell proliferation or lipid accumulation. The smORFs in the 5'- and 3'-untranslated regions of functional genes (i.e., upstream smORFs (uORFs) and downstream smORFs (dORFs)) can exert activity through cis-regulatory effects on the main ORF of these messenger RNAs (mRNAs), such as uORFs of MDM2 that impact proliferation. However, other smORFs, especially those from mRNAs with no other ORFs, point to a functional microprotein. Indeed, we tested a candidate, smORF 1183 from the long noncoding RNA 923011K14Rik, and demonstrated that the microprotein regulates adipocyte differentiation. These findings highlight the potential of CRISPR/Cas9-based screening to uncover functional smORFs and provide a framework for further exploration of microproteins in adipocyte biology and metabolic regulation.

Keywords: CRISPR; adipogenesis; lipid; microprotein; smORF

DOI: https://doi.org/10.1073/pnas.2506534122

FASEB J. 2025 Aug 15. 39(15): e70924

Ribo-Seq Analysis-Based Elucidation of the Dynamic Translation Landscape of Yak Ovarian Tissues in Different Reproductive Stages.

Liyan Hu, Shaoke Guo, Mengli Cao, Lin Xiong, Ziqiang Ding, Yandong Kang, Ben Zhang, Bao Cai, Jie Pei, Xian Guo.

Translation plays a critical regulatory role in follicular development, ovulation, and corpus luteum formation and degeneration in the ovaries. To better understand the molecular mechanisms of reproductive regulation at the translation level in yaks, the present study analyzed gene expression changes in the ovarian tissues of yaks in different reproductive stages by integrating ribosome profiling with RNA sequencing data. The small open reading frames (sORFs) of the ovarian tissue were characterized, and the relationship between the sequence features of the targeted genes and their translation efficiency was examined. The results showed that over 80% of genes in the two groups exhibited inconsistent changes in their expression at the transcription and translation levels; this finding indicated that the changes in gene expression at both levels were not merely synergistic. The pathway enrichment analysis revealed that these differentially expressed genes were enriched in various pathways, including PI3K-Akt, MAPK, calcium signaling, and ovarian steroidogenesis. Further investigations showed that some genes related to ovarian function, including PALB2, BMP7, PIK3R2, and WNT2B, displayed inconsistent changes in their expression at the transcription and translation levels and exhibited dynamic changes in translation activity. Additionally, we identified 66 predicted translatable sORFs and assessed the impact of upstream ORFs on the translation efficiency of downstream major ORFs. The present study systematically revealed, for the first time, the characteristics of gene expression at the translational level in yak ovarian tissues in different reproductive stages and provided a new perspective for in-depth understanding of the physiological mechanisms of yak reproduction.
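Translation efficiency is commonly estimated as the ratio of ribosome-footprint abundance to mRNA abundance for a gene, and "inconsistent" changes are those where the transcriptional and translational fold changes point in different directions. The sketch below illustrates both ideas under stated assumptions: the pseudocount, the log2 fold-change threshold of 1.0, and all function names are hypothetical and do not reproduce the study's pipeline.

```python
def translation_efficiency(ribo_abundance: float, rna_abundance: float,
                           pseudocount: float = 1.0) -> float:
    """Ratio of ribosome-footprint to mRNA abundance for one gene.

    Assumption: both inputs are already normalized (e.g. TPM); the
    pseudocount avoids division by zero for unexpressed transcripts.
    """
    return (ribo_abundance + pseudocount) / (rna_abundance + pseudocount)


def direction(log2_fold_change: float, threshold: float = 1.0) -> str:
    """Label a log2 fold change as 'up', 'down', or 'unchanged'."""
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "unchanged"


def is_discordant(rna_log2_fc: float, ribo_log2_fc: float,
                  threshold: float = 1.0) -> bool:
    """True when RNA-level and footprint-level changes disagree."""
    return direction(rna_log2_fc, threshold) != direction(ribo_log2_fc, threshold)


# Hypothetical gene: mRNA rises strongly while footprints barely change.
print(is_discordant(rna_log2_fc=2.0, ribo_log2_fc=0.1))
```

Counting the genes for which `is_discordant` is true, relative to all changed genes, gives a rough analogue of the "over 80%" figure quoted in the abstract, although the actual cutoffs used there are not stated.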

Keywords: ovarian development; ribosome profiling; sORF; translation regulation; yak

DOI: https://doi.org/10.1096/fj.202500646RR

Int J Biol Macromol. 2025 Aug 04. pii: S0141-8130(25)07076-X. [Epub ahead of print] 321(Pt 3): 146519

Ribosome profiling sequencing reveals translational dynamics during yak testicular development.

Shaoke Guo, Mengli Cao, Xingdong Wang, Ziqiang Ding, Yandong Kang, Liyan Hu, Ben Zhang, Jie Pei, Xian Guo.

Translation regulation plays a crucial role in testicular development and spermatogenesis, but its dynamic mechanism has not yet been elucidated. This study integrated ribosome profiling sequencing (Ribo-seq) with transcriptome data to analyze the translation landscape of yak (Bos grunniens) testes at 6 months (Y6M), 18 months (Y18M), and 4 years (Y4Y) of age. The results revealed that the ribosome footprint characteristics of yaks were consistent with those of other mammals. The genes differentially translated during sexual maturation were significantly enriched in the meiotic cell cycle, PI3K-Akt, and Notch signaling pathways. From Y6M to Y18M, most of the genes with altered translation efficiency (TE) showed opposite trends in transcription and TE and were potentially involved in protein ubiquitination. From Y18M to Y4Y, translationally altered genes lacked transcriptional changes but were associated with acetyltransferase and phosphotransferase activity. PPI analysis identified stage-specific regulatory genes: COL1A2/MEIOB/SYCP3 (Y6M to Y18M) and STAT1/ITGB5/ERBB2 (Y18M to Y4Y). Additionally, we identified 106 predicted translatable small open reading frames (sORFs), which included annotations for 58 known coding proteins and 1 long non-coding RNA. Sequence feature analysis revealed that higher translation efficiency correlates with longer uORF length, lower GC content, shorter CDS length, and higher NMEF. In conclusion, the results provide new insights into the dynamic regulation of gene translation during testicular development and spermatogenesis, which is highly significant for enhancing yak reproductive performance.
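The sequence-feature result above relates per-gene features such as GC content to translation efficiency. As a hedged illustration of that kind of analysis, the sketch below computes GC content and a Pearson correlation in plain Python. The sequences and efficiency values are invented for the example, and the study may well have used a different statistic or tool.

```python
import math


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples.

    Assumes both samples have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical coding sequences and translation efficiencies.
sequences = ["ATGAAATTTTAA", "ATGGCATTCTAA", "ATGGCGGCCTGA"]
efficiencies = [2.4, 1.6, 0.7]

gc_values = [gc_content(seq) for seq in sequences]
# A negative coefficient means higher GC content goes with lower efficiency.
print(pearson(gc_values, efficiencies))
```

A negative coefficient in such an analysis would be in line with the reported association between lower GC content and higher translation efficiency.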

Keywords: ORFs; Ribo-seq; Testicular development; Translation regulation; Yak

DOI: https://doi.org/10.1016/j.ijbiomac.2025.146519