bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2024‒10‒20
seven papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Mol Cell Proteomics. 2024 Oct 15. pii: S1535-9476(24)00150-6. [Epub ahead of print] 100860
      sORF-encoded peptides (SEPs) are proteins of fewer than 100 amino acids encoded by small open reading frames (sORFs), and they play important roles in diverse biological processes. Analysis of known SEPs has shown that non-canonical initiation codons are relatively common among them. However, current analysis of SEP sequences relies mainly on bioinformatic prediction, most of which assumes AUG as the start site, an assumption that may not hold for SEPs. Here, chemical labeling was used to systematically analyze the N-terminal sequences of SEPs and thereby define their start sites accurately. In a comparison of labeling chemistries, dimethylation and guanidinylation proved more efficient than acetylation. For SEP enrichment, acetonitrile (ACN) precipitation and heat precipitation performed better. As an N-terminal peptide enrichment material, Hexadhexaldehyde was superior to CNBr-activated agarose and NHS-activated agarose. Combining these methods, we identified 128 SEPs with 131 N-terminal sequences. Two-thirds of these are novel N-terminal sequences, most of which begin between residues 11 and 31 of the original sequence. Some of the novel N-termini resulted from proteolysis or signal peptide removal. For some SEPs, the translation start site was reassigned to a non-AUG start codon. One novel start codon was validated using GFP-tag vectors. These results demonstrate that chemical labeling approaches can help identify the start codons of sORFs and the true N-termini of their encoded peptides, improving the characterization of SEPs.
    Keywords:  N-terminomics; chemical labeling; sORF-encoded peptides; signal peptides; start codons
    DOI:  https://doi.org/10.1016/j.mcpro.2024.100860
  2. Trends Genet. 2024 Oct 14. pii: S0168-9525(24)00212-9. [Epub ahead of print]
      Small proteins are ubiquitous in all kingdoms of life. MicroProteins, initially characterized as small proteins with protein interaction domains that enable them to interact with larger multidomain proteins, frequently modulate the function of these proteins. The study of these small proteins has improved our understanding of protein regulation. Beyond sequence homology, sequence-divergent small proteins can act as microProtein mimics by binding structurally related proteins. Moreover, a multitude of other small proteins encoded by short open reading frames (sORFs), as well as peptides derived from diverse sources such as long noncoding RNAs (lncRNAs) and miRNAs, contribute to a variety of biological processes. Small proteins thus offer promising avenues for bioengineering that could revolutionize crop performance and reduce reliance on agrochemicals in future agriculture.
    Keywords:  lncRNA; microProteins; sORFs; transcription factor
    DOI:  https://doi.org/10.1016/j.tig.2024.09.004
  3. Cells. 2024 Oct 02. pii: 1645. [Epub ahead of print] 13(19):
      Recently developed experimental and computational approaches to identify putative coding small ORFs (smORFs) in genomes have revealed thousands of smORFs localized within coding and non-coding RNAs. They can be translated into smORF peptides or microproteins, which are defined as fewer than 100 amino acids in length. The identification of so many potential biological regulators poses a major challenge, notably for elucidating the in vivo functions of these microproteins. Since the emergence of this field, Drosophila has proved to be a valuable model for studying the biological functions of microproteins in vivo. In this review, we outline how the smORF field emerged and the nomenclature used in this domain. We summarize the technical challenges associated with identifying putative coding smORFs in the genome and the corresponding translated microproteins. Finally, we describe recent findings on Pri, one of the best-studied smORF peptides, and on other microproteins studied so far in Drosophila. These studies highlight the diverse roles that microproteins can fulfil in the regulation of various molecular targets involved in distinct cellular processes during animal development and physiology. Given the recent emergence of the microprotein field and the associated discoveries, the microproteome represents an exquisite source of potentially bioactive molecules, whose in vivo biological functions can be explored in the Drosophila model.
    Keywords:  Drosophila; development; microproteins; peptides; pri; smORF; small ORF; tal
    DOI:  https://doi.org/10.3390/cells13191645
  4. Interdiscip Sci. 2024 Oct 14.
      Primary microRNAs (pri-miRNAs) have been found to contain translatable small open reading frames (sORFs) that can encode peptides as independent elements. Studies have shown that these sORFs are important in regulating the expression of biological traits. Existing methods for predicting the coding potential of sORFs frequently overlook these sORFs or treat them as negative samples, impeding the identification of additional translatable sORFs in pri-miRNAs. To address this, a novel method named misORFPred is proposed. Specifically, an enhanced scalable k-mer (ESKmer), which integrates both compositional information within a sequence and distance information between sequences, is designed to extract nucleotide sequence features. After feature selection, the optimal features are combined with several machine learning classifiers to construct an ensemble model, in which a newly devised dynamic ensemble voting strategy (DEVS) dynamically adjusts the weights of the base classifiers and adaptively selects the optimal base classifiers for each unlabeled sample. Cross-validation results suggest that ESKmer and DEVS are essential for this classification task and boost model performance. Independent testing results indicate that misORFPred outperforms state-of-the-art methods. Furthermore, we applied misORFPred to the genomes of various plant species and performed a thorough analysis of the predictions. Taken together, misORFPred is a powerful tool for identifying translatable sORFs in plant pri-miRNAs and can provide high-confidence candidates for subsequent biological experiments.
    Keywords:  Ensemble voting; Pri-miRNAs; k-mer; sORFs
    DOI:  https://doi.org/10.1007/s12539-024-00661-8
  5. Genome Biol. 2024 Oct 14. 25(1): 268
      BACKGROUND: Pervasive translation is a widespread phenomenon that plays a critical role in the emergence of novel microproteins, but the diversity of translation patterns contributing to their generation remains unclear. Based on 54 ribosome profiling (Ribo-Seq) datasets, we investigated the yeast Ribo-Seq landscape using a representation framework that allows the comprehensive inventory and classification of the entire diversity of Ribo-Seq signals, including non-canonical ones.
    RESULTS: We show that whereas coding regions occupy specific areas of the Ribo-Seq landscape, noncoding regions encompass a wide diversity of Ribo-Seq signals and populate the entire landscape. Our results show that pervasive translation can, nevertheless, be associated with high specificity, with 1055 noncoding ORFs exhibiting canonical Ribo-Seq signals. Using mass spectrometry under standard conditions or with proteasome inhibition, together with an in-house analysis protocol, we report 239 microproteins originating from noncoding ORFs that display canonical as well as non-canonical Ribo-Seq signals. Each condition yields dozens of additional microprotein candidates with comparable translation properties, suggesting a larger population of volatile microproteins that are challenging to detect. Our findings suggest that non-canonical translation signals may harbor valuable information and underscore the significance of considering them in proteogenomic studies. Finally, we show that the translation outcome of a noncoding ORF is primarily determined by the initiating codon and the codon distribution in its two alternative frames, rather than by features indicative of functionality.
    CONCLUSION: Our results enable us to propose a topology of a species' Ribo-Seq landscape, opening the way to comparative analyses of this translation landscape under different conditions.
    Keywords:  De novo coding products; Genome evolution; Non-canonical translation signals; Noncoding genome; Pervasive translation
    DOI:  https://doi.org/10.1186/s13059-024-03403-7
  6. Brief Bioinform. 2024 Sep 23. pii: bbae510. [Epub ahead of print] 25(6):
      Advances in peptidomics have uncovered numerous small open reading frames with coding potential and shown that some of the micropeptides they encode are closely related to human cancer. However, systematic analysis and integration of these micropeptides, from sequence to structure and function, remain largely undeveloped. Here, we built a workflow for the collection and analysis of proteomic data, transcriptomic data, and clinical outcomes for cancer-associated micropeptides using publicly available datasets from large cohorts. We initially identified 19 586 novel micropeptides by reanalyzing proteomic profile data from 3753 samples across 8 cancer types. Further quantitative analysis of these micropeptides, along with associated clinical data, identified 3065 that were dysregulated in cancer, 370 of which showed a strong association with prognosis. Moreover, we employed a deep learning framework to construct a micropeptide-protein interaction network for further bioinformatic analysis, revealing that micropeptides are involved in multiple biological processes as bioactive molecules. Taken together, our atlas (HMPA) provides a benchmark for high-throughput prediction and functional exploration of micropeptides, providing new insights into their biological mechanisms in cancer. HMPA is freely available at http://hmpa.zju.edu.cn.
    Keywords:  functional annotation; mass spectrometry; micropeptide database; nonclassical peptidome; structure prediction
    DOI:  https://doi.org/10.1093/bib/bbae510
  7. Mol Oncol. 2024 Oct 17.
      Glioblastoma (GB), the most common and most aggressive malignant primary brain tumor, demonstrates intrinsic resistance to current therapies, resulting in poor clinical outcomes. Cancer progression can be partially attributed to the deregulation of protein translation mechanisms that drive cancer cell growth. In this study, we present the translatome landscape of GB as a valuable data resource. Eight patient-derived GB sphere cultures (GSCs) were analyzed using ribosome profiling and messenger RNA (mRNA) sequencing. We investigated inter-cell-line differences through differential expression analysis at both the translatome and transcriptome levels. Translational changes were assessed 30 and 60 min after radiotherapy. The translation of non-coding RNAs (ncRNAs) was validated using in-house and public mass spectrometry (MS) data, whereas RNA expression was confirmed by quantitative PCR (qPCR). Our findings demonstrate that ribosome sequencing provides more detailed information than MS or transcriptional analyses. Transcriptional similarities among GSCs correlate with translational similarities, aligning with previously defined subtypes such as proneural and mesenchymal. Additionally, we identified a broad spectrum of open reading frame types in both coding and non-coding mRNA regions, including long non-coding RNAs (lncRNAs) and pseudogenes undergoing active translation. Translation of ncRNAs into peptides was independently confirmed by in-house data and external MS data. We also observed that, in response to radiotherapy, histones are translationally downregulated and splicing factors translationally upregulated. These data offer new insights into genome-wide protein synthesis and identify translationally regulated genes and alternative translation initiation sites in GB under normal and radiotherapeutic conditions, providing a rich resource for GB research. Further functional validation of differentially expressed genes after radiotherapy is needed. Understanding translational control in GB can reveal mechanistic insights and identify currently unknown biomarkers, ultimately enhancing the diagnosis and treatment of this aggressive brain cancer.
    Keywords:  glioblastoma; non‐coding RNA; radioresistance; radiotherapy; translatome
    DOI:  https://doi.org/10.1002/1878-0261.13743