bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2022–12–25
four papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Cell Rep. 2022 Dec 20. pii: S2211-1247(22)01696-5. [Epub ahead of print] 41(12): 111808
      Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
    Keywords:  CP: Molecular biology; de novo genes; evolution; evolutionary genomics; functional genomics; human; human novel genes; micropeptides; microproteins; noncanonical ORFs; small ORFs
    DOI:  https://doi.org/10.1016/j.celrep.2022.111808
  2. Nat Commun. 2022 Dec 23. 13(1): 7910
      The synthesis of most proteins begins at AUG codons, yet a small number of non-AUG initiated proteoforms are also known. Here we analyse a large number of publicly available Ribo-seq datasets to identify novel, previously uncharacterised non-AUG proteoforms using the Trips-Viz implementation of a novel algorithm for detecting translated ORFs. In parallel, we analyse a genomic alignment of 120 mammals to identify evidence of protein-coding evolution in sequences encoding potential extensions. Unexpectedly, we find that the number of non-AUG proteoforms identified with ribosome profiling data greatly exceeds the number with strong phylogenetic support, suggesting their recent evolution. Our study argues that the protein-coding potential of the human genome greatly exceeds that detectable through comparative genomics and exposes the existence of multiple proteins encoded by the same genomic loci.
    DOI:  https://doi.org/10.1038/s41467-022-35595-6
  3. Cancers (Basel). 2022 Dec 07. pii: 6031. [Epub ahead of print] 14(24):
      Recent technological advances have facilitated the detection of numerous non-canonical human peptides derived from regulatory regions of mRNAs, long non-coding RNAs, and other cryptic transcripts. In this review, we first give an overview of the classification of these novel peptides and summarize recent improvements in their annotation and detection by ribosome profiling, mass spectrometry, and individual experimental analysis. A large fraction of the novel peptides originates from translation at upstream open reading frames (uORFs) that are located within the transcript leader sequence of regular mRNA. In humans, uORF-encoded peptides (uPeptides) have been detected in both healthy and malignantly transformed cells and emerge as important regulators in cellular and immunological pathways. In the second part of the review, we focus on various functional implications of uPeptides. As uPeptides frequently act at the interface between translational regulation and individual peptide function, we describe the mechanistic modes of translational regulation through ribosome stalling, their involvement in cellular programs through protein interaction and complex formation, and their role within the human leukocyte antigen (HLA)-associated immunopeptidome as HLA uLigands. We delineate how malignant transformation may lead to the formation of novel uORFs, uPeptides, or HLA uLigands and explain their potential implication in tumor biology. Ultimately, we speculate on a potential use of uPeptides as peptide drugs and discuss how uPeptides and HLA uLigands may facilitate translational inhibition of oncogenic protein messages and immunotherapeutic approaches in cancer therapy.
    Keywords:  HLA uLigands; cancer; immunotherapy; non-canonical peptides; translation; uORFs; uPeptides
    DOI:  https://doi.org/10.3390/cancers14246031
  4. Nucleic Acids Res. 2022 Dec 22. pii: gkac1175. [Epub ahead of print]
      During initiation, the ribosome is tasked to efficiently recognize open reading frames (ORFs) for accurate and fast translation of mRNAs. A critical step is start codon recognition, which is modulated by initiation factors, mRNA structure, a Shine-Dalgarno (SD) sequence and the start codon itself. Within the Escherichia coli genome, we identified more than 50 annotated initiation sites harboring AUGUG or GUGUG sequence motifs that provide two canonical start codons, AUG and GUG, in immediate proximity. As these sites may challenge start codon recognition, we studied whether and how the ribosome is accurately guided to the designated ORF, with a special focus on the SD sequence as well as adenine at the fourth coding sequence position (A4). By in vitro and in vivo experiments, we characterized key requirements for unambiguous start codon recognition, but also discovered initiation sites that lead to the translation of both overlapping reading frames. Our findings corroborate the existence of an ambiguous translation initiation mechanism, implicating a multitude of so far unrecognized ORFs and translation products in bacteria.
    DOI:  https://doi.org/10.1093/nar/gkac1175
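Items 1 and 3 both rest on locating short ORFs, and uORFs in particular, within transcripts. As a rough illustration only (not the method of either paper), the Python sketch below scans the forward strand of a sequence for ATG-initiated ORFs and labels them. The helper names `find_orfs` and `classify`, the 100-codon sORF cutoff, and the rule that a uORF is any ORF starting before an annotated CDS are all simplifying assumptions made for this example.

```python
# Minimal sketch (illustrative assumptions, not any paper's pipeline):
# scan the forward strand for ORFs and label short ones (sORFs) and those
# that start in the 5' leader upstream of an annotated CDS (uORFs).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=("ATG",), min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is 0-based inclusive; end is exclusive and includes the stop codon.
    The first start codon seen after the previous stop opens the ORF, so each
    stop yields at most one (the longest) ORF in its frame.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in start_codons:
                    start = i
            elif codon in STOP_CODONS:
                sense_codons = (i - start) // 3  # codons before the stop
                if sense_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


def classify(orfs, cds_start, max_aa=100):
    """Label each ORF; the 100-aa cutoff is a common but arbitrary convention."""
    labelled = []
    for start, end, _frame in orfs:
        aa_len = (end - start) // 3 - 1  # exclude the stop codon
        labels = []
        if aa_len <= max_aa:
            labels.append("sORF")
        if start < cds_start:  # assumption: starts inside the 5' leader
            labels.append("uORF")
        labelled.append((start, end, aa_len, labels))
    return labelled


if __name__ == "__main__":
    # Toy transcript: a 2-codon uORF (ATG AAA TAG) in the leader, "CDS" at 13.
    toy = "GGATGAAATAGCCATGCCCGGGTAA"
    for orf in classify(find_orfs(toy), cds_start=13):
        print(orf)
```

On the toy transcript the leader ORF is labelled both sORF and uORF, and the toy "CDS" is also labelled sORF simply because it is short. Sequence alone cannot show that an ORF is translated, which is why the work summarized above relies on ribosome profiling, mass spectrometry and fitness or functional data.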
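Item 2 concerns non-AUG proteoforms, including proteins extended at their N-terminus by initiation upstream of the annotated AUG. The sketch below is a hypothetical, sequence-only illustration of what such a candidate extension looks like; it is not the Trips-Viz algorithm. The helper name `upstream_extensions` and the choice to treat all nine codons that differ from ATG at a single position (near-cognate codons) as possible starts are assumptions.

```python
# Minimal sketch (an illustration under stated assumptions, not Trips-Viz):
# list in-frame near-cognate codons upstream of an annotated AUG that could
# initiate an N-terminally extended proteoform.

# The nine codons that differ from ATG at exactly one position.
NEAR_COGNATE = {"CTG", "GTG", "TTG", "AAG", "ACG", "AGG", "ATA", "ATC", "ATT"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_extensions(seq, cds_start, starts=NEAR_COGNATE):
    """Return (position, codon, extra_codons) for candidate in-frame starts.

    Walks 5' from the annotated start codon in steps of 3 and stops at the
    first in-frame stop codon, since initiation upstream of it would terminate
    before reaching the annotated CDS.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    i = cds_start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in starts:
            # extra_codons = number of codons the extension adds at the N-terminus
            hits.append((i, codon, (cds_start - i) // 3))
        i -= 3
    return hits


if __name__ == "__main__":
    # Toy leader: in-frame TAG, CTG, AAA, GTG, CCC, then the annotated ATG at 15.
    print(upstream_extensions("TAGCTGAAAGTGCCCATGGCC", cds_start=15))
```

In the toy leader the in-frame TAG at position 0 caps how far any extension can reach. Whether a candidate like this is actually used, or is conserved across mammals, is exactly what the paper tests with Ribo-seq and phylogenetic data; this kind of scan only enumerates possibilities.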
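Item 4 describes E. coli initiation sites carrying AUGUG or GUGUG motifs, where two canonical start codons sit 2 nt apart and can open overlapping reading frames. The Python sketch below finds those motifs and reports the ORF each candidate start would open. The function names are hypothetical, and the code only describes the sequence: per the abstract, which start the ribosome actually selects depends on factors such as the SD sequence, initiation factors and mRNA structure, none of which are modelled here.

```python
# Minimal sketch: locate AUGUG / GUGUG motifs and report the ORF each
# overlapping start would open. Purely sequence-based; it does not predict
# which start codon the ribosome uses.

import re

STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_end(seq, start):
    """End (exclusive, including the stop) of the ORF opened at start, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None  # no in-frame stop before the sequence ends


def dual_start_sites(seq):
    """Return one ((pos, codon, end), (pos, codon, end)) pair per motif.

    The first start (AUG or GUG) is at pos and the GUG at pos + 2, so the two
    candidates are offset by 2 nt and always lie in different reading frames.
    """
    seq = seq.upper().replace("T", "U")
    sites = []
    # A zero-width lookahead lets overlapping motifs (e.g. GUGUGUG) all match.
    for m in re.finditer(r"(?=[AG]UGUG)", seq):
        p = m.start()
        first = (p, seq[p:p + 3], orf_end(seq, p))
        second = (p + 2, seq[p + 2:p + 5], orf_end(seq, p + 2))
        sites.append((first, second))
    return sites


if __name__ == "__main__":
    # AUG frame: AUG UGC UAG (stop); GUG frame: GUG CUA GCC UAA (stop).
    print(dual_start_sites("AUGUGCUAGCCUAA"))
```

The toy sequence was built so that both frames terminate, giving a 2-codon product from the AUG and a 3-codon product from the GUG; real dual-initiation sites would, as the study reports, be recognized only if the ribosome actually initiates at both.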