bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2020‒05‒31
seven papers selected by
Thomas Martinez
Salk Institute for Biological Studies


  1. Nat Commun. 2020 May 27. 11(1): 2523
    Whiffin N, Karczewski KJ, Zhang X, Chothani S, Smith MJ, Evans DG, Roberts AM, Quaife NM, Schafer S, Rackham O, Alföldi J, O'Donnell-Luria AH, Francioli LC, Cook SA, Barton PJR, MacArthur DG, Ware JS.
      Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes.
    DOI:  https://doi.org/10.1038/s41467-019-10717-9
  2. Genome Biol. 2020 May 29. 21(1): 128
    Patraquim P, Mumtaz MAS, Pueyo JI, Aspden JL, Couso JP.
      BACKGROUND: Ribosomal profiling has revealed the translation of thousands of sequences outside annotated protein-coding genes, including small open reading frames of less than 100 codons, and the translational regulation of many genes. Here we present an improved version of Poly-Ribo-Seq and apply it to Drosophila melanogaster embryos to extend the catalog of in vivo translated small ORFs, and to reveal the translational regulation of both small and canonical ORFs from mRNAs across embryogenesis.
    RESULTS: We obtain highly correlated samples across five embryonic stages, with nearly 500 million putative ribosomal footprints mapped to mRNAs, and compare them to existing Ribo-Seq and proteomic data. Our analysis reveals, for the first time in Drosophila, footprints mapping to codons in a phased pattern, the hallmark of productive translation. We propose a simple binomial probability metric to ascertain translation probability. Our results also reveal reproducible ribosomal binding apparently not resulting in productive translation. This non-productive ribosomal binding seems to be especially prevalent amongst upstream short ORFs located in the 5' mRNA leaders, and amongst canonical ORFs during the activation of the zygotic translatome at the maternal-to-zygotic transition.
    CONCLUSIONS: We suggest that this non-productive ribosomal binding might be due to cis-regulatory ribosomal binding and to defective ribosomal scanning of ORFs outside periods of productive translation. Our results are compatible with the main function of upstream short ORFs being to buffer the translation of canonical ORFs; and show that, in general, small ORFs in mRNAs display markers compatible with an evolutionary transitory state towards full coding function.
    Keywords:  Maternal to zygotic transition; Non-canonical translation; Poly-Ribo-Seq; Regulation of translation; Ribosomal binding; Ribosomal profiling; sORFs; smORFs; uORFs
    DOI:  https://doi.org/10.1186/s13059-020-02011-5
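    Editorial sketch (not taken from the paper): the abstract mentions a binomial probability metric for translation but does not define it. One plausible reading, offered purely as an assumption, is a one-sided binomial test of whether footprints fall in the ORF's own reading frame more often than the 1/3 expected without phasing. The function names and the null probability of 1/3 below are illustrative choices, not the authors' method.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 1 / 3) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def framing_p_value(frame_counts):
    """One-sided p-value that frame-0 footprints exceed the unphased expectation.

    frame_counts is a hypothetical (frame0, frame1, frame2) tuple of footprint
    counts for one ORF; assumption: unphased reads land in each frame with
    probability 1/3.
    """
    n = sum(frame_counts)
    if n == 0:
        return 1.0  # no footprints, so no evidence of phasing
    return binomial_tail(frame_counts[0], n)
```

    Under these assumptions, a small p-value for an ORF with most footprints in frame 0 would be consistent with phased, productive translation, while reproducible but unphased binding would give a large one.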
  3. J Proteome Res. 2020 May 24.
    Cao X, Khitun A, Na Z, Dumitrescu DG, Kubica M, Olatunji E, Slavoff SA.
      Ribosome profiling and mass spectrometry have revealed thousands of small and alternative open reading frames (sm/alt-ORFs) that are translated into polypeptides variously termed microproteins and alt-proteins in mammalian cells. Some micro-/alt-proteins exhibit stress-, cell type- and/or tissue-specific expression, and understanding this regulated expression will be critical to elucidating their functions. While differential translation has been inferred by ribosome profiling, quantitative mass spectrometry-based proteomics is needed for direct measurement of microprotein and alt-protein expression between samples and conditions. However, while label-free quantitative proteomics has been applied to detect stress-dependent expression of bacterial microproteins, this approach has not yet been demonstrated for analysis of differential expression of unannotated ORFs in the more complex human proteome. Here, we present global micro-/alt-protein quantitation in two human leukemia cell lines, K562 and MOLT4. We identify 12 unannotated proteins that are differentially expressed in these cell lines. The expression of six micro/alt-proteins was validated biochemically, and two were found to localize to the nucleus. Thus, we demonstrate that label-free comparative proteomics enables quantitation of micro-/alt-protein expression between human cell lines. We anticipate that this workflow will enable discovery of regulated sm/alt-ORF products across many biological conditions in human cells.
    DOI:  https://doi.org/10.1021/acs.jproteome.0c00254
  4. Mol Plant. 2020 May 20. pii: S1674-2052(20)30147-7. [Epub ahead of print]
    Wang S, Tian L, Liu H, Li X, Zhang J, Chen X, Jia X, Zheng X, Wu S, Chen Y, Yan J, Wu L.
      Non-conventional peptides (NCPs), which include small open reading frame-encoded peptides, play critical roles in fundamental biological processes. Here we developed an integrated peptidogenomic pipeline using high-throughput mass spectra to probe a customized six-frame translation database and applied it to large-scale identification of NCPs in plants. Altogether, 1,993 and 1,860 NCPs were unambiguously identified in maize and Arabidopsis, respectively. The NCPs showed distinct characteristics compared to conventional peptides (CPs) and were derived from introns, 3'UTRs, 5'UTRs, junctions and intergenic regions. These results revealed that translation events in unannotated transcripts occurred more broadly than previously thought. In addition, maize NCPs were found to be enriched within regions associated with phenotypic variations and domestication selection, indicating their potential function in plant genetic regulations of complex traits and evolution. Summarily, this study provides an unbiased and global view of plant NCPs. The identification of large-scale NCPs in both monocot and dicot plants reveals that a much larger portion of the plant genome can be translated to biologically functional molecules, which has important implications in functional genomic studies. The present study also provides a useful resource for the characterization of more hidden NCPs in other plants.
    Keywords:  mass spectrometry; non-conventional peptides; peptidogenomics; plants; six-frame translation; small open reading frames
    DOI:  https://doi.org/10.1016/j.molp.2020.05.012
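    Editorial sketch (not taken from the paper): six-frame translation, the database-building step named in the abstract, translates a nucleotide sequence in three forward and three reverse-complement reading frames. The minimal version below assumes the standard genetic code and plain A/C/G/T input; the function names are illustrative, and the authors' customized pipeline is not described here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3 (stops shown as '*')."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames
```

    In a peptidogenomic search, each frame would typically be split at stop codons and the resulting peptides written to a database for matching against mass spectra; that step is omitted here.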
  5. J Hepatol. 2020 May 24. pii: S0168-8278(20)30347-0. [Epub ahead of print]
    Pang Y, Liu Z, Han H, Wang B, Li W, Mao C, Liu S.
      BACKGROUND & AIMS: A substantial proportion of non-coding RNAs (ncRNAs) with small open reading frames (smORFs) are indeed translated to short peptides. It is unclear where and how short peptides promote hepatocellular carcinoma (HCC) development.
    METHODS: We performed RNA-immunoprecipitation followed by high-throughput sequencing (RIP-seq) assay with an antibody against ribosomal protein S6 (RPS6) on four cancer cell lines. Focusing on one lncRNA, LINC00998, we used qPCR and public databases to evaluate its expression level in HCC patients. Special vectors were constructed to confirm its coding potential. We also explored the function and mechanism of LINC00998-encoded peptide in tumor growth and metastasis.
    RESULTS: We discovered lots of lncRNAs binding to RPS6 in cancer cells. One of these lncRNAs, LINC00998, encoded one small endogenous peptide, termed SMIM30. SMIM30, rather than the RNA itself, promoted the HCC tumorigenesis by modulating cell proliferation and migration and its level was correlated with the poor survival rate of HCC patients. Furthermore, SMIM30 was transcribed by c-Myc and then drove the membrane anchoring of non-receptor tyrosine kinases-SRC/YES1. Moreover, the downstream MAPK signaling pathway was activated by SRC/YES1.
    CONCLUSIONS: Our results not only unravel a new mechanism of HCC tumorigenesis promoted by ncRNA-encoded peptides, but also suggest that the peptides can serve as a new target for HCC cancer therapy and a new biomarker for HCC diagnosis and prognosis.
    Keywords:  HCC; LINC00998; Peptide; SMIM30; ncRNAs
    DOI:  https://doi.org/10.1016/j.jhep.2020.05.028
  6. BMC Microbiol. 2020 May 24. 20(1): 130
    Eckert I, Weinberg Z.
      BACKGROUND: RNAs perform many functions in addition to supplying coding templates, such as binding proteins. RNA-protein interactions are important in multiple processes in all domains of life, and the discovery of additional protein-binding RNAs expands the scope for studying such interactions. To find such RNAs, we exploited a form of ribosomal regulation. Ribosome biosynthesis must be tightly regulated to ensure that concentrations of rRNAs and ribosomal proteins (r-proteins) match. One regulatory mechanism is a ribosomal leader (r-leader), which is a domain in the 5' UTR of an mRNA whose genes encode r-proteins. When the concentration of one of these r-proteins is high, the protein binds the r-leader in its own mRNA, reducing gene expression and thus protein concentrations. To date, 35 types of r-leaders have been validated or predicted.
    RESULTS: By analyzing additional conserved RNA structures on a multi-genome scale, we identified 20 novel r-leader structures. Surprisingly, these included new r-leaders in the highly studied organisms Escherichia coli and Bacillus subtilis. Our results reveal several cases where multiple unrelated RNA structures likely bind the same r-protein ligand, and uncover previously unknown r-protein ligands. Each r-leader consistently occurs upstream of r-protein genes, suggesting a regulatory function. That the predicted r-leaders function as RNAs is supported by evolutionary correlations in the nucleotide sequences that are characteristic of a conserved RNA secondary structure. The r-leader predictions are also consistent with the locations of experimentally determined transcription start sites.
    CONCLUSIONS: This work increases the number of known or predicted r-leader structures by more than 50%, providing additional opportunities to study structural and evolutionary aspects of RNA-protein interactions. These results provide a starting point for detailed experimental studies.
    Keywords:  Bioinformatics; Comparative genomics; RNA-protein interaction; Ribosomal leader; cis-regulatory RNA
    DOI:  https://doi.org/10.1186/s12866-020-01823-6
  7. Clin Transl Oncol. 2020 May 24.
    Shi Y, Jia X, Xu J.
      Circular RNAs (circRNAs) have been considered a special class of non-coding RNAs without 5' caps and 3' tails which are covalently closed RNA molecules generated by back splicing of mRNA. For a long time, circRNAs have been considered to be directly involved in various biological processes as functional RNA. In recent years, a variety of circRNAs have been found to have translational functions, and the resultant peptides also play biological roles in the emergence and progression of human disease. The discovery of these circRNAs and their encoded peptides has enriched genomics, helped us to study the causes of diseases, and promoted the development of biotechnology. The purpose of this review is to summarize the research progress of the detection methods, translation initiation mechanism, as well as functional mechanism of peptides encoded by circRNAs, with the goal of providing the directions for the discovery of biomarkers for diagnosis, prognosis, and therapeutic targets for human disease.
    Keywords:  Circular RNA; Functions; Peptides; Translation
    DOI:  https://doi.org/10.1007/s12094-020-02371-1