bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2021‒06‒20
three papers selected by
Thomas Martinez
Salk Institute for Biological Studies


  1. Nucleic Acids Res. 2021 Jun 14. pii: gkab477. [Epub ahead of print]
      Emerging evidence places small proteins (≤50 amino acids) more centrally in physiological processes. Yet, their functional identification and the systematic genome annotation of their cognate small open-reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling or Ribo-Seq (deep sequencing of ribosome-protected fragments) enables detection of actively translated open-reading frames (ORFs) and empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple ORF identifiers that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. However, they have difficulties evaluating prokaryotic genomes due to their unique architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation, non-canonical initiation). Here, we present a new algorithm, smORFer, which detects putative smORFs in prokaryotic organisms with high accuracy. The unique feature of smORFer is its integrated approach: it considers structural features of the genetic sequence along with in-frame translation and uses a Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is executed in a modular way, and depending on the data available for a particular organism, different modules can be selected for the smORF search.
    DOI:  https://doi.org/10.1093/nar/gkab477
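      As an illustration of the 3-nt periodicity idea mentioned in the abstract above, the following is a minimal sketch and not smORFer's actual implementation. It assumes per-nucleotide read counts along a candidate ORF and scores the strength of the period-3 Fourier component; the function names, the normalization and the example count vectors are hypothetical choices made for this sketch.

```python
import cmath


def power_at_period(counts, period=3):
    """Squared magnitude of the discrete Fourier component at the given period."""
    omega = 2 * cmath.pi / period
    component = sum(c * cmath.exp(-1j * omega * pos) for pos, c in enumerate(counts))
    return abs(component) ** 2


def periodicity_score(counts, period=3):
    """Period-3 power divided by its maximum possible value (total reads squared).

    For non-negative counts the result lies between 0 and 1: values near 1 mean
    that reads pile up in a single frame, values near 0 mean no frame preference.
    """
    total = sum(counts)
    if total == 0:
        return 0.0  # no coverage, no evidence of translation
    return power_at_period(counts, period) / total ** 2


# Hypothetical coverage profiles over a 60-nt candidate ORF.
in_frame = [10, 0, 0] * 20  # every read falls on the first codon position
flat = [5, 5, 5] * 20       # reads spread evenly over all three positions

print(periodicity_score(in_frame))  # close to 1
print(periodicity_score(flat))      # close to 0
```

      A score of this kind only captures frame preference; the abstract states that smORFer additionally considers structural features of the sequence, which this sketch does not attempt to model.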
  2. Proteomics. 2021 Jun 19. e2100008
      The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of short open reading frame-encoded peptides (SEPs) has become a focal point for research. Here we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of posttranslational modifications and exploration of functional aspects of the SEPs by means of proteomics technologies, are described. These methods do not exclusively cover the analytics of SEPs but simultaneously include the low molecular weight proteome and, moreover, can also be used for the proteome-wide analysis of proteolytic processing events.
    Keywords:  LC-MS/MS; peptidomics; sample preparation; top-down proteomics; prefractionation; post-translational modification analysis; mass spectrometry
    DOI:  https://doi.org/10.1002/pmic.202100008
  3. Mol Cell Proteomics. 2021 Jun 12. pii: S1535-9476(21)00081-5. [Epub ahead of print] 100109
      Many small open reading frames (smORFs) embedded in lncRNA transcripts have been shown to encode biologically functional polypeptides (smORF-encoded polypeptides, SEPs) in different organisms. Despite significant advances in genomics, bioinformatics and proteomics that largely enabled the discovery of novel SEPs, their identification across different biological samples is still hampered by their poor predictability, diminutive size and low relative abundance. Here, we take advantage of NONCODE, a repository containing the most complete collection and annotation of lncRNA transcripts from different species, to build a novel database that attempts to maximize a collection of SEPs from human and mouse lncRNA transcripts. In order to further improve SEP discovery, we implemented two effective and complementary polypeptide enrichment strategies, a 30 kDa MWCO filter and a C8 SPE column. These combined strategies enabled us to discover 353 SEPs from 8 human cell lines and 409 SEPs from 3 mouse cell lines and 8 mouse tissues. Importantly, nineteen of the identified SEPs were then verified through in-vitro expression, immunoblotting, parallel reaction monitoring (PRM) and synthetic peptides. Subsequent bioinformatic analysis revealed that some of the physical and chemical properties of these novel SEPs, including amino acid composition and codon usage, are different from those commonly found in canonical proteins. Intriguingly, nearly 65% of the identified SEPs were found to be initiated with non-AUG start codons. Overall, the strategy presented in this study encompasses an efficient workflow that enabled us to identify 762 novel SEPs across multiple cell lines and tissues, which probably represents the largest number of SEPs detected by mass spectrometry reported to date. These novel SEPs might not only provide new clues for the annotation of noncoding elements in the genome but can also serve as a valuable resource for the functional characterization of individual SEPs.
    Keywords:  Long noncoding RNA (lncRNA); NONCODE database; enrichment; mass spectrometry; smORF encoded polypeptides (SEPs)
    DOI:  https://doi.org/10.1016/j.mcpro.2021.100109