bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2026–02–08
five papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Biochemistry. 2026 Feb 04.
      Thousands of recently discovered microproteins represent a new frontier in the search for functional and disease-causing genes. Though shorter than canonical proteins, some microproteins contain signal peptides and are predicted to produce secreted peptides. However, whether any microprotein-derived secreted peptides possess biological activity remains underexplored. Here, we screen a small library of secreted peptides from the microproteome by measuring signaling downstream of GPCRs. This approach identified several cAMP-stimulating peptides, including a secreted peptide from the "non-coding" HLA complex P5 RNA (HCP5). The HCP5-secreted peptide (HCP5-SP) is encoded by a small open reading frame embedded in the HCP5 mRNA. In vitro assays with synthetic HCP5-SP and HCP5-SP analogs validated its cAMP-stimulating activity and showed that the wild-type C-terminal sequence is required for activity. Furthermore, HCP5-SP promotes the proliferation of HEK293T cells, suggesting an alternative mechanism that might explain some of the cancer biology associated with HCP5 mRNA. In summary, this work establishes a workflow for the preliminary identification of bioactive microproteins and demonstrates that the vast, largely untapped microproteome is a source of novel bioactive endogenous peptides.
    DOI:  https://doi.org/10.1021/acs.biochem.5c00764
  2. iScience. 2026 Feb 20. 29(2): 114585
      Small proteins (SPs, ≤50 aa) are often overlooked in genomics. We conducted the first systematic analysis of prokaryotic SPs across the full ocean-depth gradient. From 433,311 short open reading frames (sORFs) predicted from 71 western Pacific metagenomes, we identified 193,281 SP clusters. Filtering yielded 75,581 prevalent SPs, including 4,307 high-confidence clusters (RfSPs). Notably, 87.09% of RfSPs lacked non-marine homologs, and ∼70% contained domains of unknown function. While most (65.57%) were phylum-specific, twelve were distributed across ≥5 phyla, and some were prophage-associated. Geographically, twenty-three core RfSPs were present at all sites. Co-occurrence analysis revealed that interacting RfSPs typically originated from the same or adjacent depth zones. Finally, we confirmed transcription of 8.20% of RfSP clusters in deep-sea metatranscriptomes. The zone-specific transcription of certain RfSPs suggests adaptive functions, such as stress response and molecular chaperoning, in distinct marine environments. Our study identifies SPs as a critical strategy for prokaryotic adaptation to deep-sea stressors.
    Keywords:  Aquatic biology; Microbiology; Oceanography
    DOI:  https://doi.org/10.1016/j.isci.2025.114585
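      To make the sORF size criterion concrete, here is a minimal Python sketch of a forward-strand scan that reports ATG-initiated ORFs whose product falls within the ≤50 aa SP cutoff. It is not the authors' metagenomic pipeline: the function name `find_sorfs`, the `min_aa` floor of 10 aa, the ATG-only start rule, and the single-strand scan are all simplifying assumptions for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 50, min_aa: int = 10) -> List[Tuple[int, int, int]]:
    """Scan the forward strand for ATG-initiated ORFs encoding min_aa..max_aa residues.

    Hypothetical helper. Returns (start, end, aa_length): start is the 0-based index
    of the ATG, end is the exclusive index just after the stop codon, and aa_length
    counts codons from ATG up to (not including) the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon to the first in-frame stop.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop left in this frame, so the ORF is incomplete
            aa_len = (j - i) // 3
            if min_aa <= aa_len <= max_aa:
                hits.append((i, j + 3, aa_len))
            # Resume after the stop: keep only the most upstream ATG per stop codon.
            i = j + 3
    return sorted(hits)
```

      Keeping only the most upstream ATG for each stop codon mirrors a common longest-ORF convention; practical prokaryotic gene callers also consider alternative start codons and the reverse strand.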
  3. Synth Syst Biotechnol. 2026 Jun;12 383-392
      Cellulase plays an irreplaceable role in biomanufacturing that uses plant biomass as feedstock. However, improving fungal cellulase production by manipulating upstream open reading frames (uORFs) in the 5'-untranslated regions (5'-UTRs) of cellulase genes has received comparatively little attention. This study aimed to screen uORFs in the 5'-UTRs of cellulase genes in Penicillium oxalicum, identify functional uORFs in the 5'-UTR of the eg1 gene, which encodes a key endo-β-1,4-glucanase (EG) in P. oxalicum, and enhance fungal cellulase production through uORF modifications. Of the 25 cellulase genes examined in P. oxalicum strain HP7-1, 23 contained uORFs in their 5'-UTRs. Seven uORFs were annotated in the 5'-UTR of eg1. A uORF-green fluorescent protein (GFP) reporter system showed that uORF1 and uORF3 inhibited, whereas uORF7 enhanced, GFP abundance. Overexpressing eg1 carrying uORF1 or uORF3 variants in which the uORF start codon was mutated to AAG led to significant average increases of 91.7% and 62.1%, respectively, in carboxymethyl cellulase production after 4 days of induction compared with the starting strain ΔPoxKu70. Real-time quantitative reverse transcription-polymerase chain reaction, mRNA stability measurements, and in vitro translation experiments collectively revealed that these three uORFs influence the mRNA stability of the downstream main ORF (mORF), but not its translation efficiency. These findings highlight the critical role of uORFs in regulating gene expression during fungal enzyme biosynthesis and offer a valuable alternative strategy for improving enzyme production.
    Keywords:  Endo-β-1,4-glucanase; Gene editing; Penicillium oxalicum; uORF
    DOI:  https://doi.org/10.1016/j.synbio.2025.12.016
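      The uORF screen and start-codon knockout described above can be sketched in Python as follows, assuming a transcript string and a known 0-based main ORF start. The helpers `find_uorfs` and `mutate_start_to_aag` are hypothetical, consider only ATG starts, and label a uORF "contained" if its stop codon ends before the main start codon and "overlapping" otherwise (this simplification also lumps in-frame N-terminal extensions and ORFs with no stop into "overlapping").

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript: str, cds_start: int) -> List[Tuple[int, Optional[int], str]]:
    """Hypothetical uORF annotator: (start, stop_end or None, kind) per 5'-UTR ATG."""
    seq = transcript.upper()
    uorfs = []
    # Only start codons lying entirely within the 5'-UTR count as uORF starts.
    for i in range(max(0, cds_start - 2)):
        if seq[i:i + 3] != "ATG":
            continue
        stop_end = None
        j = i + 3
        while j + 3 <= len(seq):
            if seq[j:j + 3] in STOP_CODONS:
                stop_end = j + 3  # exclusive end, just past the stop codon
                break
            j += 3
        contained = stop_end is not None and stop_end <= cds_start
        uorfs.append((i, stop_end, "contained" if contained else "overlapping"))
    return uorfs


def mutate_start_to_aag(transcript: str, start: int) -> str:
    """Hypothetical helper: replace the ATG at `start` with AAG (T -> A at position 2)."""
    if transcript[start:start + 3].upper() != "ATG":
        raise ValueError(f"no ATG at position {start}")
    return transcript[:start] + "AAG" + transcript[start + 3:]
```

      Usage: for a 13-nt 5'-UTR "CCATGAAATAGCC" followed by a short CDS, `find_uorfs` reports one contained uORF starting at position 2; after `mutate_start_to_aag(tx, 2)`, no uORF remains, which is the in silico analogue of the AAG start-codon mutation.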
  4. Front Bioinform. 2025 ;5 1676149
       Introduction: Translation initiation and termination are critical regulatory checkpoints in protein synthesis, yet accurate computational prediction of their sites remains challenging due to training data biases and the complexity of full-length transcripts.
    Methods: To address these limitations, we present TRANSAID (TRANSlation AI for Detection), a novel deep learning framework that accurately and simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences. TRANSAID's hierarchical architecture efficiently processes long transcripts, capturing both local motifs and long-range dependencies. Crucially, the model was trained on a human transcriptome dataset that was rigorously partitioned at the gene level to prevent data leakage and included both protein-coding (NM) and non-coding (NR) transcripts.
    Results: This mixed-training strategy enables TRANSAID to achieve high fidelity, correctly identifying 73.61% of NR transcripts as non-coding. Performance is further enhanced by an integrated biological scoring system, improving "perfect ORF prediction" for coding sequences to 94.94% and "correct non-coding prediction" to 82.00%. The human-trained model demonstrates remarkable cross-species applicability, maintaining high accuracy on organisms from mammals to yeast. Beyond annotation, TRANSAID serves as a powerful discovery tool for novel coding events. When applied to long-read sequencing data, it accurately identified previously unannotated protein isoforms validated by mass spectrometry (76.28% validation rate). Furthermore, homology searches of high-scoring ORFs predicted within NR transcripts suggest a strong potential for identifying cryptic translation events.
    Discussion: As a fully documented open-source tool with a user-friendly web server, TRANSAID provides a powerful and accessible resource for improving transcriptome annotation and proteomic discovery.
    Keywords:  cross-species analysis; deep learning; integrated scoring system; open reading frame; transcriptome annotation; translation site prediction
    DOI:  https://doi.org/10.3389/fbinf.2025.1676149
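      As an illustration of how per-position TIS and TTS scores could be turned into a single ORF call, the sketch below pairs each ATG with its first in-frame stop codon and keeps the highest-scoring pair, calling the transcript non-coding if no pair exceeds a threshold. This is not TRANSAID's architecture or its integrated biological scoring system: the names `decode_orf` and `is_perfect`, the product score, the 0.5 threshold, and the exact-coordinate definition of a "perfect" prediction are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def decode_orf(seq: str, tis: Sequence[float], tts: Sequence[float],
               threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (start, stop) of the best ATG..stop pair, or None for "non-coding".

    tis[i]: assumed probability that position i is the first base of a start codon.
    tts[j]: assumed probability that position j is the first base of a stop codon.
    """
    seq = seq.upper()
    best, best_score = None, threshold
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # A valid ORF ends at the first in-frame stop codon after the start.
        j = i + 3
        while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
            j += 3
        if j + 3 > len(seq):
            continue  # no in-frame stop: incomplete ORF, skip it
        score = tis[i] * tts[j]
        if score > best_score:
            best, best_score = (i, j), score
    return best


def is_perfect(pred: Optional[Tuple[int, int]], annotated: Tuple[int, int]) -> bool:
    """Assumed metric: both start and stop coordinates match the annotation exactly."""
    return pred is not None and pred == annotated
```

      Tying the stop to the first in-frame stop codon enforces a biologically valid ORF even when the TTS scores alone would favor a downstream position; the threshold is what lets the same decoder return a non-coding call for NR-like transcripts.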
  5. J Genet Genomics. 2026 Jan 28. pii: S1673-8527(26)00023-8. [Epub ahead of print]
      Muscle growth and development are fundamental biological processes with significant implications for both human health and livestock production. Although circular RNAs (circRNAs) have long been regarded as noncoding RNAs, recent studies suggest that some circRNAs possess protein-coding potential. However, the biological roles and mechanisms of circRNA-encoded proteins remain poorly understood. Here, we identify circARHGAP10 as a protein-coding circRNA in cattle skeletal muscle that encodes a 202-amino acid protein, ARHGAP10-202aa, through an internal ribosome entry site (IRES)-dependent mechanism. ARHGAP10-202aa expression is confirmed by in vitro translation, immunodetection with a specific antibody, and Western blotting analysis. Functional assays reveal that ARHGAP10-202aa interacts with myosin light chain 6 (MYL6) to promote myoblast differentiation. Moreover, in vivo overexpression of ARHGAP10-202aa significantly enhances MYL6 expression and accelerates the regeneration of injured tibialis anterior muscle in mice. These findings not only expand our understanding of the role of circRNAs in muscle biology but also underscore the functional significance of circRNA-encoded proteins in muscle recovery and regeneration.
    Keywords:  ARHGAP10-202aa; CircRNA; Muscle differentiation; Myogenesis; Translation
    DOI:  https://doi.org/10.1016/j.jgg.2026.01.008
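      Because a circRNA has no ends, an ORF can read through the backsplice junction and continue into sequence upstream of its start codon. The following Python sketch follows codons around a circular sequence from a given start position. The function `circular_orf` is hypothetical, treats the junction as lying between the last and first base of the input string, and ignores how initiation happens (IRES-dependent or otherwise); it only models ORF geometry.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orf(circ: str, start: int) -> Optional[Tuple[int, bool]]:
    """Read codons from `start` around a circular sequence until an in-frame stop.

    Returns (orf_length_nt_including_stop, crosses_junction), or None if no stop
    is ever reached (a potential rolling-circle ORF).
    """
    seq = circ.upper()
    n = len(seq)
    # Position and frame repeat after lcm(n, 3) nucleotides, which bounds the search.
    limit = n * 3 if n % 3 else n
    pos = 0  # nucleotides read so far
    while pos < limit:
        codon = "".join(seq[(start + pos + k) % n] for k in range(3))
        pos += 3
        if codon in STOP_CODONS:
            # The ORF occupies linear positions start..start+pos-1; past n-1 it wrapped.
            return pos, start + pos > n
    return None
```

      For example, in the 8-nt toy circle "AAGGCATG" an ORF starting at the ATG at position 5 wraps through the junction and stops after 12 nt. By the same arithmetic, a 202-aa product such as ARHGAP10-202aa corresponds to 606 coding nucleotides plus a 3-nt stop codon.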