bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023–08–27
five papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Proteomics. 2023 Aug 21. e2100211
      Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
    Keywords:  genome; mass spectrometry - LC-MS/MS; microprotein; sORF; structure
    DOI:  https://doi.org/10.1002/pmic.202100211
  2. Cell Rep. 2023 Aug 24. pii: S2211-1247(23)01006-9. [Epub ahead of print]42(9): 112995
      Investigation of translation in rare cell types or subcellular contexts is challenging due to large input requirements for standard approaches. Here, we present "nanoRibo-seq," an optimized approach using 10^2- to 10^3-fold less input material than bulk approaches. nanoRibo-seq exhibits rigorous quality control features consistent with quantification of ribosome protected fragments with as few as 1,000 cells. We compare translatomes of two closely related cortical neuron subtypes, callosal projection neurons (CPN) and subcerebral projection neurons (SCPN), during their early postnatal development. We find that, while translational efficiency is highly correlated between CPN and SCPN, several dozen mRNAs are differentially translated. We further examine upstream open reading frame (uORF) translation and identify that mRNAs involved in synapse organization and axon development are highly enriched for uORF translation in both subtypes. nanoRibo-seq enables investigation of translational regulation of rare cell types in vivo and offers a flexible approach for globally quantifying translation from limited input material.
    Keywords:  CP: Neuroscience; Ribo-seq; callosal projection neurons; cortical development; mRNA translation; molecular controls over neuronal diversity; ribosome; ribosome profiling; subcerebral projection neurons; translational regulation; upstream open reading frame (uORF)
    DOI:  https://doi.org/10.1016/j.celrep.2023.112995
  3. Biochem Biophys Res Commun. 2023 Aug 18. pii: S0006-291X(23)00983-X. [Epub ahead of print]678 68-77
      Circular RNAs (circRNAs) are a unique class of non-coding RNAs and were originally thought to have no protein-coding potential due to their lack of a 5' cap and 3' poly(A) tail. However, recent studies have challenged this notion and revealed that some circRNAs have protein-coding potential. They have emerged as a key area of interest in cancer and neurodegeneration research as recent studies have identified several circRNAs that can produce functional proteins with important roles in cancer progression. The protein-coding potential of circRNAs is determined by the presence of an open reading frame (ORF) within the circular structure that can encode a protein. In some cases, the ORF can be translated into a functional protein despite the lack of traditional mRNA features. While the protein-coding potential of most circRNAs remains unclear, several studies have identified specific circRNAs that can produce functional proteins. Understanding the protein-coding potential of circRNAs is important for unravelling their biological functions and potential roles in disease. Our review provides comprehensive coverage of recent advances in the field of circRNA protein-coding capacity and its impact on the pathogenesis and progression of cancer and neurodegenerative diseases.
    DOI:  https://doi.org/10.1016/j.bbrc.2023.08.037
  4. Proteomics. 2023 Aug 23. e2200421
      Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
    Keywords:  bioinformatics; bottom-up proteomics; databases; mass spectrometry; protein identification; proteogenomics; top-down proteomics
    DOI:  https://doi.org/10.1002/pmic.202200421
  5. Genes (Basel). 2023 Aug 17. pii: 1637. [Epub ahead of print]14(8):
      Advances in next-generation sequencing methodologies have facilitated the assembly of an ever-increasing number of genomes. Gene annotations are typically conducted via specialized software, but the most accurate results require additional manual curation that incorporates insights derived from functional and bioinformatic analyses (e.g., transcriptomics, proteomics, and phylogenetics). In this study, we improved the annotation of the Leishmania donovani (strain HU3) genome using publicly available data from the deep sequencing of ribosome-protected mRNA fragments (Ribo-Seq). As a result of this analysis, we uncovered 70 previously non-annotated protein-coding genes and improved the annotation of around 600 genes. Additionally, we present evidence for small upstream open reading frames (uORFs) in a significant number of transcripts, indicating their potential role in the translational regulation of gene expression. The bioinformatics pipelines developed for these analyses can be used to improve the genome annotations of other organisms for which Ribo-Seq data are available. The improvements provided by these studies will bring us closer to the ultimate goal of a complete and accurately annotated L. donovani genome and will enhance future transcriptomics, proteomics, and genetics studies.
    Keywords:  Leishmania; Ribo-seq; genome; ribosome profiling; transcriptome; uORFs
    DOI:  https://doi.org/10.3390/genes14081637