bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2020‒09‒27
four papers selected by
Thomas Martinez
Salk Institute for Biological Studies


  1. J Proteomics. 2020 Sep 16. pii: S1874-3919(20)30356-0. [Epub ahead of print] 103988
      Short open reading frame-encoded peptides (SEP) represent a widely undiscovered part of the proteome. The detailed analysis of SEP has, despite inherent limitations such as incomplete sequence coverage, challenges encountered with protein inference, the identification of posttranslational modifications and the assignment of potential N- and C-terminal truncations, predominantly been assessed using bottom-up proteomic workflows. The use of top-down based proteomic workflows is capable of providing an unparalleled level of characterization information, which is of increased importance in the case of alternatively encoded protein products. However, top-down based analysis is not without its own limitations, for which efficient separation prior to MS analysis is a major issue. We established a sample preparation approach for the combined bottom-up and top-down proteomic analysis of SEP. Key improvements were made by the application of solid phase extraction (SPE), which supported enrichment of proteins below ca. 20 kDa, followed by 2D-LC-MS top-down analysis encompassing both HCD and EThcD ion activation. Bottom-up experiments were used to support and confirm top-down data interpretation. This strategy allowed for the top-down characterization of 36 proteoforms mapping to 12 SEP from the archaeon Methanosarcina mazei strain Gö1, with the concurrent detection and identification of several posttranslational modifications in SEP. BIOLOGICAL SIGNIFICANCE: Small or short open reading frames (sORF) have been widely neglected in genome research in the past. With their increasing discovery, the question about the presence and molecular function of their translation products, the short open reading frame-encoded peptides (SEP), arises. As these small proteins are usually below the 10 kDa range, the number of peptides identifiable by bottom-up proteomics is limited, which hampers both the identification and the recognition of potential posttranslational modifications. The presented top-down approach allowed for the detection of full-length SEP, as well as of terminally truncated proteoforms, and further enabled the identification of disulfide bonds in these small proteins. This demonstrates that this yet widely undiscovered part of the proteome undergoes the same modifications as classical proteins, which is an essential step for future understanding of the biological functions of these molecules.
    Keywords:  Disulfide; Microprotein; Short open reading frame; Small open reading frame; Terminomics; Top down
    DOI:  https://doi.org/10.1016/j.jprot.2020.103988
  2. Noncoding RNA. 2020 Sep 23. pii: E41. [Epub ahead of print] 6(4):
      Non-coding RNAs (ncRNAs) are essential players in many cellular processes, from normal development to oncogenic transformation. Initially, ncRNAs were defined as transcripts that lacked an open reading frame (ORF). However, multiple lines of evidence suggest that certain ncRNAs encode small peptides of fewer than 100 amino acids. The sequences encoding these peptides are known as small open reading frames (smORFs), many initiating with the traditional AUG start codon but terminating with atypical stop codons, suggesting a different biogenesis. The ncRNA-encoded peptides (ncPEPs) are gradually becoming appreciated as a new class of functional molecules that contribute to diverse cellular processes and are deregulated in different diseases, contributing to pathogenesis. As multiple publications have identified unique ncPEPs, we appreciated the need for assembling a new web resource that could gather information about these functional ncPEPs. We developed FuncPEP, a new database of functional ncRNA-encoded peptides, containing all experimentally validated and functionally characterized ncPEPs. Currently, FuncPEP includes a comprehensive annotation of 112 functional ncPEPs and specific details regarding the ncRNA transcripts that encode these peptides. We believe that FuncPEP will serve as a platform for further deciphering the biologic significance and medical use of ncPEPs.
    Keywords:  long non-coding RNAs; micropeptides; ncRNA translation; ncRNA-encoded peptides; non-coding RNAs; small open reading frames; small peptides
    DOI:  https://doi.org/10.3390/ncrna6040041
  3. Genome Res. 2020 Sep 24.
      Translation initiation is a key step determining protein synthesis. Studies have uncovered a number of alternative translation initiation sites (TISs) in mammalian mRNAs and showed their roles in reshaping the proteome. However, the extent to which alternative TISs affect gene expression across plants remains largely unclear. Here, by profiling initiating ribosome positions, we globally identified in vivo TISs in tomato and Arabidopsis and found thousands of genes with more than one TIS. Of the identified TISs, >19% and >20% were located at unannotated AUG and non-AUG sites, respectively. CUG and ACG were the most frequently observed codons at non-AUG TISs, a phenomenon also found in mammals. In addition, although alternative TISs were usually found in both orthologous genes, the TIS sequences were not conserved, suggesting the conservation of alternative initiation mechanisms but flexibility in using TISs. Unlike upstream AUG TISs, the presence of upstream non-AUG TISs was not correlated with the translational repression of main open reading frames, a pattern observed across plants. Also, the generation of proteins with diverse N-terminal regions through the use of alternative TISs contributes to differential subcellular localization, as mutating alternative TISs resulted in the loss of organelle localization. Our findings uncovered the hidden coding potential of plant genomes and, importantly, the constraint and flexibility of translational initiation mechanisms in the regulation of gene expression across plant species.
    DOI:  https://doi.org/10.1101/gr.261834.120
  4. Nucleic Acids Res. 2020 Sep 21. pii: gkaa704. [Epub ahead of print]
      Circular RNAs (circRNAs) encompass a widespread and conserved class of RNAs, which are generated by back-splicing of downstream 5' to upstream 3' splice sites. CircRNAs are tissue-specific and have been implicated in diseases including cancer. They can function as sponges for microRNAs (miRNAs) or RNA binding proteins (RBPs), for example. Moreover, some contain open reading frames (ORFs) and might be translated. The functional relevance of such peptides, however, remains largely elusive. Here, we report that the ORF of circZNF609 is efficiently translated when expressed from a circZNF609 overexpression construct. However, endogenous proteins could not be detected. Moreover, initiation of circZNF609 translation is independent of m6A-generating enzyme METTL3 or RNA sequence elements such as internal ribosome entry sites (IRESs). Surprisingly, a comprehensive mutational analysis revealed that deletion constructs, which are deficient in producing circZNF609, still generate the observed protein products. This suggests that the apparent circZNF609 translation originates from trans-splicing by-products of the overexpression plasmids and underlines that circRNA overexpression constructs need to be evaluated carefully, particularly when functional studies are performed.
    DOI:  https://doi.org/10.1093/nar/gkaa704