bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2024–12–01
eight papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Trends Genet. 2024 Nov 26. pii: S0168-9525(24)00263-4. [Epub ahead of print]
      Hundreds of thousands of small open reading frames (smORFs) of less than 100 codons exist in every genome, especially in long noncoding RNAs (lncRNAs) and in the 5' leaders of mRNAs. smORFs are often discarded as nonfunctional, but ribosomal profiling (RiboSeq) reveals that thousands are translated, while characterised smORF functions have risen from anecdotal to identifiable trends: smORFs can either have a cis-noncoding regulatory function (involving low translation of nonfunctional peptides) or full coding function mediated by robustly translated peptides, often having cellular and physiological roles as membrane-associated regulators of canonical proteins. The evolutionary context reveals that many smORFs represent new genes emerging de novo from noncoding sequences. We suggest a mechanism for this process, where cis-noncoding smORF functions provide niches for the subsequent evolution of full peptide functions.
    Keywords:  lncRNAs; microproteins; short ORFs; smORF-encoded peptides; small ORFs
    DOI:  https://doi.org/10.1016/j.tig.2024.10.012
  2. Prog Neurobiol. 2024 Nov 23. pii: S0301-0082(24)00130-8. [Epub ahead of print] 243: 102694
      Short open reading frames (sORFs) are frequently overlooked because of their historical classification as non-coding elements or dismissed as "transcriptional noise". However, advanced genomic and proteomic technologies have allowed for screening and validating sORF-encoded peptides, revealing their fundamental regulatory roles in cellular processes and sparking a growing interest in microprotein biology. In neuroscience, microproteins serve as neurotransmitters in signal transmission and regulate metabolism and emotions, exerting pivotal effects on neurological conditions such as nerve injury, neurogenic tumors, inflammation, and neurodegenerative diseases. This review summarizes the origins, characteristics, classifications, and functions of microproteins, focusing on their molecular mechanisms in neurological disorders. Potential applications, future perspectives, and challenges are discussed.
    Keywords:  Microproteins; Neurological diseases; SORF-encoded peptides; Short open reading frame
    DOI:  https://doi.org/10.1016/j.pneurobio.2024.102694
  3. bioRxiv. 2024 Nov 15. pii: 2024.11.14.623419. [Epub ahead of print]
      The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.
    DOI:  https://doi.org/10.1101/2024.11.14.623419
  4. Cells. 2024 Nov 18. pii: 1903. [Epub ahead of print] 13(22)
      Insulin resistance, stem cell dysfunction, and muscle fiber dystrophy are all age-related events in skeletal muscle (SKM). However, age-related changes in insulin isoforms and insulin receptors in myogenic progenitor satellite cells have not been studied. Since SKM is an extra-pancreatic tissue that does not express mature insulin, we investigated the levels of insulin receptors (INSRs) and a novel human insulin upstream open reading frame (INSU) at the mRNA, protein, and anatomical levels in biopsied SKM samples from Baltimore Longitudinal Study of Aging (BLSA) participants aged 27-89 years (yrs). Using RT-qPCR and the MS-based selected reaction monitoring (SRM) assay, we found that the levels of INSR and INSU mRNAs and proteins were positively correlated with participant age. We applied RNAscope fluorescence in situ hybridization (FISH) and immunofluorescence (IF) to SKM cryosections and found that INSR and INSU were co-localized with PAX7-labeled satellite cells, with enhanced expression in SKM sections from an 89-yr-old participant compared with a 27-yr-old participant. We hypothesized that the SKM aging process might induce compensatory upregulation of INSR and re-expression of INSU, which might be beneficial in early embryogenesis and have deleterious effects on proliferative and myogenic satellite cells with advanced age.
    Keywords:  INSR; insulin; isoforms; satellite cells; skeletal muscle
    DOI:  https://doi.org/10.3390/cells13221903
  5. Biochem Pharmacol. 2024 Nov 24. pii: S0006-2952(24)00652-X. [Epub ahead of print] 231: 116652
      The peptides encoded by long noncoding RNAs (lncRNAs) have been shown to participate in cancer pathogenesis. In this study, lncRNA LINC00944 was validated to encode an endogenous 102-amino acid (aa) small peptide (named LINC00944 peptide). Functionally, LINC00944 peptide exerted an anti-growth effect in melanoma cells in vitro. Mechanistically, LINC00944 peptide interacted with the E1A binding protein p400 (EP400)/c-MYC complex. LINC00944 peptide also inhibited c-MYC protein expression. Furthermore, LINC00944 peptide repressed the transcriptional activity of MYC by reducing the EP400-MYC interaction, thereby reducing the levels of fatty acid metabolism- and glucose metabolism-related proteins. Our findings uncovered that LINC00944 peptide might be a promising adjuvant therapeutic agent for melanoma. Implications: This study provided the first evidence that LINC00944-encoded peptide played a critical role in the growth of melanoma cells.
    Keywords:  LINC00944; Melanoma; Small peptides; lncRNAs
    DOI:  https://doi.org/10.1016/j.bcp.2024.116652
  6. BMC Biol. 2024 Nov 26. 22(1): 273
      BACKGROUND: Accurate and comprehensive genomic annotation, including the full list of protein-coding genes, is vital for understanding the molecular mechanisms of human biology. We have previously shown that the genome contains a multitude of yet hidden functional exons and transcripts, some of which might represent novel mRNAs. These results resonate with those from other groups and strongly argue that two decades after the completion of the first draft of the human genome sequence, the current annotation of human genes and transcripts remains far from being complete.
      RESULTS: Using a targeted RNA enrichment technique, we showed that one of the novel functional exons previously discovered by us and currently annotated as part of a long non-coding RNA is actually a part of a novel protein-coding gene, InSETG-4, which encodes a novel human protein with no known homologs or motifs. We found that InSETG-4 is induced by various DNA-damaging agents across multiple cell types and therefore might represent a novel component of DNA damage response. Despite its low abundance in bulk cell populations, InSETG-4 exhibited expression restricted to a small fraction of cells, as demonstrated by the amplification-based single-molecule fluorescence in situ hybridization (asmFISH) analysis.
      CONCLUSIONS: This study argues that yet undiscovered human protein-coding genes exist and provides an example of how targeted RNA enrichment techniques can help to fill this major gap in our knowledge of the information encoded in the human genome.
    Keywords:  DNA damage response; Genomic “dark matter”; Mass spectrometry; Nanopore sequencing; Novel gene; Novel protein; Rapid amplification of cDNA ends; Single-cell analysis; Single-molecule fluorescence in situ hybridization; Targeted RNA enrichment
    DOI:  https://doi.org/10.1186/s12915-024-02069-8
  7. Pharmaceutics. 2024 Nov 20. pii: 1486. [Epub ahead of print] 16(11)
      Recent technological advancements, including computer-assisted drug discovery, gene-editing techniques, and high-throughput screening approaches, have greatly expanded the palette of methods for the discovery of peptides available to researchers. These emerging strategies, driven by recent advances in bioinformatics and multi-omics, have significantly improved the efficiency of peptide drug discovery when compared with traditional in vitro and in vivo methods, cutting costs and improving their reliability. An added benefit of peptide-based drugs is the ability to precisely target protein-protein interactions, which are normally a particularly challenging aspect of drug discovery. Another recent breakthrough in this field is targeted protein degradation through proteolysis-targeting chimeras. These revolutionary compounds represent a noteworthy advancement over traditional small-molecule inhibitors due to their unique mechanism of action, which allows for the degradation of specific proteins with unprecedented specificity. The inclusion of a peptide as a protein-of-interest-targeting moiety allows for improved versatility and the possibility of targeting otherwise undruggable proteins. In this review, we discuss various novel wet-lab and computational multi-omic methods for peptide drug discovery, provide an overview of therapeutic agents discovered through these cutting-edge techniques, and discuss the potential for the therapeutic delivery of peptide-based drugs.
    Keywords:  PROTACs; micropeptides; multi-omics; peptide drug delivery; peptide drug design; peptide drugs; targeted protein degradation
    DOI:  https://doi.org/10.3390/pharmaceutics16111486
  8. Nat Comput Sci. 2024 Nov 27.
      Human essential proteins (HEPs) are indispensable for individual viability and development. However, experimental methods to identify HEPs are often costly, time consuming and labor intensive. In addition, existing computational methods predict HEPs only at the cell line level, but HEPs vary across living humans, cell lines and animal models. Here we develop a sequence-based deep learning model, Protein Importance Calculator (PIC), by fine-tuning a pretrained protein language model. PIC not only substantially outperforms existing methods for predicting HEPs but also provides comprehensive prediction results across three levels: human, cell line and mouse. Furthermore, we define the protein essential score, derived from PIC, to quantify human protein essentiality and validate its effectiveness by a series of biological analyses. We also demonstrate the biomedical value of the protein essential score by identifying potential prognostic biomarkers for breast cancer and quantifying the essentiality of 617,462 human microproteins.
    DOI:  https://doi.org/10.1038/s43588-024-00733-1