bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2026–01–11
eight papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Expert Opin Drug Discov. 2026 Jan 04. 1-28
      INTRODUCTION: Small open reading frame-encoded peptides (SEPs) are short peptides translated from small open reading frames (sORFs) that were previously overlooked in genome annotations. SEPs are small, typically fewer than 100 amino acids, and some are as short as a dozen amino acids. Recent studies have revealed their widespread presence across plants, animals, and microorganisms, as well as their diverse biological functions and potential applications.
    AREAS COVERED: This review introduces the characteristics and biogenesis of SEPs, the processes and methods for their identification and validation, and their functional roles and target sites, highlighting the significant potential of SEPs in biological research and therapeutic development. Relevant literature was identified on PubMed (2010-2025) by searching for 'SEP,' 'sORF,' and 'Microprotein.'
    EXPERT OPINION: The revolutionary advances in high-throughput omics technologies - particularly mass spectrometry and ribosome profiling - combined with computational prediction methods such as machine learning, have enabled the discovery of an increasing number of SEPs. Research on SEPs is currently in a phase of rapid development, which suggests that the field of peptide drugs may gain many promising molecular candidates.
    Keywords:  SEP; Small open reading frame-encoded peptides; drug discovery; function; target site
    DOI:  https://doi.org/10.1080/17460441.2025.2603517
  2. Plant J. 2026 Jan;125(1): e70663
      Seed germination is crucial for agricultural reproduction. A deep understanding of this process can secure healthy growth at the early phases of plant development and therefore yield. Recent research indicates that germination is a complex process involving translational regulation. A large group of seed-stored mRNAs together with newly synthesized transcripts are regulated by post-transcriptional mechanisms and selectively translated at different stages to support the germination process. To investigate the mechanism of translational control, we performed ribosome profiling on mRNAs of distinct physiological stages during Arabidopsis thaliana seed germination. The presence of ribosome association on mRNAs with three-nucleotide periodicity indicates their capacity for translation. Dry seeds, in which translation is on hold, are characterized by a unique ribosome association landscape with a higher ribosome association at the 5' and 3' UTR, compared with physiological stages that show active translation. Start codon-specific stalling of ribosomes in dry seeds is associated with an adenine-enriched sequence motif. Throughout germination, codons encoding glycine, aspartate, tyrosine, and proline are the most frequent ribosome pausing sites. Moreover, the non-coding ribosome-associated RNAs that we identified are indeed translated, as was revealed by investigating total seed proteome data. Seed-specific upstream open reading frames (uORFs) have been identified that may play a role in translational regulation of early seed germination. Altogether, we present a first ribosome profiling analysis across seed germination that illuminates various regulatory mechanisms that potentially contribute to the seed's survival strategy.
    Keywords:  Arabidopsis thaliana; long non‐coding RNAs; ribosome profiling; seed germination; translational control; upstream open reading frame (uORF)
    DOI:  https://doi.org/10.1111/tpj.70663
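The three-nucleotide periodicity mentioned in entry 2 can be illustrated with a minimal Python sketch. It assumes footprints have already been reduced to P-site offsets measured from the first nucleotide of the start codon; the function name `frame_fractions` and this input format are hypothetical and are not taken from the paper.

```python
from collections import Counter


def frame_fractions(psite_offsets):
    """Return the fraction of footprints in reading frames 0, 1 and 2.

    psite_offsets: iterable of integer P-site positions (in nucleotides),
    0-based relative to the first nucleotide of the start codon.
    This input format is an assumption made for illustration.
    """
    # Python's % always returns a non-negative value for a positive modulus,
    # so offsets upstream of the start codon (negative) are handled as well.
    counts = Counter(offset % 3 for offset in psite_offsets)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no footprints supplied")
    return [counts[frame] / total for frame in range(3)]


if __name__ == "__main__":
    # Toy data: four footprints in frame 0, one each in frames 1 and 2.
    print(frame_fractions([0, 3, 6, 9, 1, 2]))
```

A strong excess of footprints in a single frame is the kind of signal usually read as evidence of active translation; real analyses also calibrate P-site offsets per read length, which this sketch leaves out.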
  3. FEBS Lett. 2026 Jan 09.
      Mitochondrial protein Slm35 is linked to TOR1 signaling, mitophagy, and stress response in Saccharomyces cerevisiae. Nonetheless, little is known about its regulation or how it affects stress adaptation. In this work, we identified stress-related transcription factor binding sites and two upstream open reading frames (uORFs) in the 5'-UTR of SLM35. Using transcriptional reporters, we showed that the transcription factor Gis1 represses SLM35 transcription; however, Slm35 protein levels increased under oxidative stress and in early stationary phase, suggesting post-transcriptional regulation. Site-directed mutagenesis revealed that one uORF negatively regulates translation, with its disruption leading to altered Slm35 levels and a reproducible increase in mitophagy flux. These findings reveal multilayered control of SLM35 expression and underscore the role of uORF-mediated translation in mitochondrial stress responses. Impact statement This study shows that SLM35, encoding a mitochondrial protein, is controlled through multiple regulatory layers, combining transcriptional repression by stress-responsive factors with uORF-mediated translational regulation. By linking these mechanisms to mitophagy, the work provides new insight into mitochondrial quality control under stress.
    Keywords:  SLM35; Saccharomyces cerevisiae; gene expression; mitochondria; mitophagy; stress‐response; upstream open reading frame
    DOI:  https://doi.org/10.1002/1873-3468.70269
  4. J Hum Genet. 2026 Jan 05.
      The importance of 5'-untranslated region (5'-UTR) variants in genetic diseases has become increasingly recognized. However, systematic frameworks for interpreting their pathogenic mechanisms remain underdeveloped. We performed genome sequencing (GS) or reanalyzed exome sequencing (ES) data from patients with neurodevelopmental disorders in whom no pathogenic variants had previously been identified, and searched for variants affecting upstream open reading frames (uORFs) in the 5'-UTR using UTRannotator, a tool for annotating 5'-UTR variants. We identified one patient with a maternally inherited single nucleotide duplication upstream of ATRX (c.-138dup), which is predicted to result in the formation of an out-of-frame uORF overlapping the coding sequence (CDS). The patient exhibited the core features of ATRX-related disorders. RNA sequencing of urine-derived cells (UDCs) revealed reduced ATRX expression in the patient. In luciferase reporter assays, the wild-type ATRX 5'-UTR significantly increased luciferase activity relative to the parental pGL3-promoter vector, whereas the mutant 5'-UTR significantly decreased it, suggesting that the c.-138dup variant may disrupt an enhancer-like regulatory element and impair translation. We also identified another patient with a de novo single nucleotide variant upstream of POU3F3 (c.-303C>A), which introduces a novel uORF overlapping the CDS in-frame. This patient showed phenotypes consistent with POU3F3-related disorder. Although immunoblotting using UDCs revealed no elongated POU3F3 proteins, the luciferase assay showed reduced activity with the mutant 5'-UTR compared to the wild-type. Our study demonstrates that integrating GS or ES with UTRannotator is useful for identifying candidate 5'-UTR variants; however, the potential impact of predicted non-coding variants still requires careful experimental evaluation.
    DOI:  https://doi.org/10.1038/s10038-025-01446-7
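Entry 4 distinguishes an out-of-frame uORF overlapping the CDS from an in-frame one. A minimal Python sketch of that distinction follows. It is not UTRannotator's implementation; the function name `classify_upstream_atg`, the labels it returns, and the toy sequences are all hypothetical, and the input is assumed to be a DNA-alphabet transcript with known 0-based positions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_upstream_atg(transcript, atg_pos, cds_start):
    """Classify the ORF opened by an upstream ATG relative to the main CDS.

    transcript: transcript sequence (DNA alphabet), 5'-UTR followed by CDS.
    atg_pos:    0-based index of the upstream ATG (must lie in the 5'-UTR).
    cds_start:  0-based index of the annotated start codon.
    Returns "upstream", "overlapping_in_frame" or "overlapping_out_of_frame".
    """
    seq = transcript.upper()
    if not 0 <= atg_pos < cds_start:
        raise ValueError("atg_pos must lie upstream of cds_start")
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at atg_pos")

    # Walk codon by codon from the upstream ATG to the first stop codon.
    for i in range(atg_pos + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i + 3 <= cds_start:
                # The ORF ends before the main start codon.
                return "upstream"
            break

    # The ORF runs into (or past) the CDS; the frame decides the label.
    if (cds_start - atg_pos) % 3 == 0:
        return "overlapping_in_frame"
    return "overlapping_out_of_frame"


if __name__ == "__main__":
    print(classify_upstream_atg("ATGAAATAACCATGGCCTAA", 0, 11))
    print(classify_upstream_atg("ATGAAAATGGCCTAA", 0, 6))
    print(classify_upstream_atg("ATGAAAAATGGCCTAA", 0, 7))
```

An in-frame overlapping ORF would be expected to produce an N-terminally extended protein, which is why the abstract reports looking for elongated POU3F3 by immunoblotting. An out-of-frame one would instead be expected to compete with translation of the main CDS.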
  5. Nat Commun. 2026 Jan 06.
      Noncoding regions in eukaryotes are extensively expressed and represent a significant source of novel microproteins, some of which become fixed as de novo genes. However, the structural properties of these unevolved products and the features driving their fixation remain poorly understood. Particularly, the influence of nucleotide composition (GC content) on their structural properties and evolutionary trajectories is still unclear. Here, we predict the foldability and sequence properties of millions of microproteins potentially encoded in the noncoding open reading frames (ORFs) of 3,379 eukaryotic genomes with GC contents ranging from 18% to 79%. Depending on GC content, these microproteins exhibit distinct structural properties, suggesting different cellular impacts if non-genic regions are pervasively expressed. Using phylostratigraphy, de novo gene search, and ancestral sequence reconstruction, we trace the evolution of several hundred de novo proteins across 22 organisms with varying GC contents. We show that de novo genes preferentially emerge from GC-rich ORFs with folding potential, revealing that the interplay between GC content and foldability - rooted in the structure of the genetic code - shapes the emergence of novel genes.
    DOI:  https://doi.org/10.1038/s41467-025-68022-7
  6. Exp Mol Med. 2026 Jan 08.
      Accumulating evidence has revealed noncoding RNAs (ncRNAs) as versatile regulators in skeletal muscle development, extending beyond their canonical roles as nontranslating transcripts. Recent advancements in proteomics and translatomics have demonstrated that ncRNAs containing cryptic open reading frames can encode peptides/proteins. Here we systematically evaluate computational tools and databases for predicting ncRNA-encoded products, dissect the molecular mechanisms underlying their translation and synthesize the current landscape of ncRNA-derived peptides/proteins identified in skeletal muscle across species. We further discuss their emerging roles in myogenesis and potential clinical implications for muscle-related disorders. By highlighting the dual functionality of ncRNAs as both regulatory RNAs and peptide/protein precursors, this work provides a comprehensive resource for understanding the expanding complexity of skeletal muscle development and proposes novel therapeutic targets for muscle diseases.
    DOI:  https://doi.org/10.1038/s12276-025-01610-1
  7. Genomics. 2026 Jan 05. pii: S0888-7543(26)00002-9. [Epub ahead of print] 111194
      Skeletal muscle development is crucial for goat meat production. While most research focuses on transcriptional regulation, translational control is often overlooked. This study integrated transcriptomic data to analyze the translational landscape during myogenic differentiation of goat skeletal muscle satellite cells (SMSCs). We found that differentiation pathways were activated at both the transcriptional and translational levels, with enhancement at translation. Furthermore, we identified 25 novel lncORFs and 36 circORFs with coding potential. Among these, LncORF32653 and LncORF98488 encoded micropeptides that promote SMSC proliferation and differentiation. We also identified circUSP25, encoding circUSP25-177aa, which inhibited proliferation but promoted differentiation. Thus, lncORF32653-53aa, lncORF98488-98aa, and circUSP25-177aa are key regulators of myogenesis, revealing the potential of RNAs annotated as non-coding to encode functional micropeptides.
    Keywords:  Goat skeletal muscle satellite cells; Micropeptides; Translation landscape; circUSP25; lncRNA
    DOI:  https://doi.org/10.1016/j.ygeno.2026.111194
  8. Cancer Cell Int. 2026 Jan 07.
      Long non-coding RNAs (lncRNAs) are broad-spectrum cellular transcripts that can directly act as RNA regulators and/or partly encode functional peptides (lncRNA-encoded peptides, LRPs) in cancer cells. Recently, cancer LRPs have been found to be involved in cancer cell variability and proliferation, thus gaining widespread attention for their potential in cancer diagnosis, prognosis and therapy. Because structure determines function, the structural diversity of LRPs underlies their functional variation in cancers. Drawing on the 6135 cancer LRPs listed in the SPENCER database and 24 SPENCER-unlisted cancer LRPs reported in previous studies, this article reviews recent advances in cancer LRPs, analyzes their amino acid compositions, and undertakes in silico evaluations to assess their structural and functional attributes. These LRPs are enriched in the amino acids Glu, Leu, and Ser and depleted in Cys, His, and Trp, and many of them are rich in secondary or tertiary structures. Like mRNA-encoded peptides, these structure-rich cancer LRPs have a wide range of functions, including anti-cancer, cell-penetrating, anti-inflammatory, and antibacterial activities. Notably, the two sets of anticancer values for these LRPs (predicted by AntiCP 2.0 and PreTP-Stack) both correlated positively with total charge content and negatively with metal-binding amino acid content. The increasing amount of data and analysis on cancer LRPs, as reported here, offers opportunities to enhance practical cancer diagnosis and treatment, and to overcome remaining research challenges for cancer LRPs.
    Keywords:  Amino acid compositions; Cancer diagnosis and treatment; Cancer lncRNA-encoded peptides (cancer LRPs); Long non-coding RNAs (lncRNAs); Structures and functions
    DOI:  https://doi.org/10.1186/s12935-025-04158-2