bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023-11-05
eight papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Biochem Biophys Res Commun. 2023 Oct 13. pii: S0006-291X(23)01104-X. [Epub ahead of print] 684: 149040
      In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions in the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and in long non-coding RNAs (lncRNAs). These sORF-encoded proteins (SEPs) are often <150 amino acids long and show poor evolutionary conservation. A subset of them has been functionally characterized and shown to play an important role in fundamental biological processes, including cardiac and muscle function, DNA repair, and embryonic development, as well as in various human diseases. How many novel protein-coding regions exist in the human genome and what fraction of them are functionally important remain unknown. In this review, we discuss current progress in unraveling SEPs, the approaches used for their identification, the limitations of those approaches, and the reliability of the resulting identifications. We also discuss functionally characterized SEPs and their involvement in various biological processes and diseases. Lastly, we provide insights into their distinctive features compared to canonical proteins and the challenges associated with annotating them in protein reference databases.
    Keywords:  Non-coding RNAs; Novel proteins; Protein-coding potential; SEPs
    DOI:  https://doi.org/10.1016/j.bbrc.2023.09.068
  2. Microbiol Spectr. 2023 Nov 01. e0281123
      In the past decade, small open reading frames (sORFs) coding for proteins fewer than 70 amino acids (aa) in length have become a focus of research. sORFs and the corresponding small proteins have recently been identified in all three domains of life. However, the majority of small proteins remain functionally uncharacterized. While several bacterial small proteins have already been described, the number of identified and functionally characterized small proteins in archaea is still limited. In this study, we have discovered that the small protein 36 (sP36), which consists of only 61 aa, plays a critical role in regulating nitrogen metabolism in Methanosarcina mazei. The absence of sP36 significantly delays the growth of M. mazei when transitioning from nitrogen limitation to nitrogen sufficiency, as compared to the wild type. Through our in vivo experiments, we have observed that during nitrogen limitation, sP36 is dispersed throughout the cytoplasm; however, upon shifting the cells to nitrogen sufficiency, it relocates to the cytoplasmic membrane. Furthermore, an in vitro biochemical analysis clearly showed that sP36 interacts with high affinity with the ammonium transporter AmtB1, which is present in the cytoplasmic membrane during nitrogen limitation, as well as with the PII-like protein GlnK1. Moreover, the in vivo interaction of GlnK1 with AmtB1 following a nitrogen upshift requires the presence of sP36. Based on our findings, we propose that in response to an ammonium upshift, sP36 targets the ammonium transporter AmtB1 and inhibits its activity by mediating the interaction with GlnK1. IMPORTANCE Small proteins containing fewer than 70 amino acids, which were previously disregarded due to computational prediction and biochemical detection challenges, have gained increased attention in the scientific community in recent years. However, the number of functionally characterized small proteins, especially in archaea, is still limited. Here, by using biochemical and genetic approaches, we demonstrate a crucial role of the small protein sP36 in the nitrogen metabolism of M. mazei, in which it modulates the ammonium transporter AmtB1 according to nitrogen availability. This modulation might represent an ancient archaeal mechanism of AmtB1 inhibition, in contrast to the well-studied uridylylation-dependent regulation in bacteria.
    Keywords:  ammonium transport; archaea; membrane proteins; pII protein; protein regulation; small protein
    DOI:  https://doi.org/10.1128/spectrum.02811-23
  3. bioRxiv. 2023 Oct 16. pii: 2023.10.16.562581. [Epub ahead of print]
      Expansions of CAG trinucleotide repeats cause several rare neurodegenerative diseases. The disease-causing repeats are translated in multiple reading frames, without an identifiable initiation codon. The molecular mechanism of this repeat-associated non-AUG (RAN) translation is not known. We find that expanded CAG repeats create new splice acceptor sites. Splicing of proximal donors to the repeats produces unexpected repeat-containing transcripts. Upon splicing, depending on the sequences surrounding the donor, CAG repeats may become embedded in AUG-initiated open reading frames. Canonical AUG-initiated translation of these aberrant RNAs accounts for proteins that are attributed to RAN translation. Disruption of the relevant splice donors or the in-frame AUG initiation codons is sufficient to abrogate RAN translation. Our findings provide a molecular explanation for the abnormal translation products observed in CAG trinucleotide repeat expansion disorders and add to the repertoire of mechanisms by which repeat expansion mutations disrupt cellular functions.
    DOI:  https://doi.org/10.1101/2023.10.16.562581
  4. J Immunother Cancer. 2023 Oct; 11(10): e007073
      Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means for identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA, and ribosome sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advancements in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
    Keywords:  Antigens, Neoplasm; Computational Biology; Immunity
    DOI:  https://doi.org/10.1136/jitc-2023-007073
  5. mSystems. 2023 Nov 01. e0103723
      The ability to respond to acidic environments is crucial for neutralophilic bacteria. Escherichia coli has a well-characterized regulatory network that triggers a multitude of defense mechanisms to counteract excess protons. Nevertheless, systematic studies of the transcriptional and translational reprogramming of E. coli in response to different degrees of acid stress have not yet been performed. Here, we used ribosome profiling and RNA sequencing to compare the response of E. coli (pH 7.6) to sudden mild (pH 5.8) and severe near-lethal (pH 4.4) acid stress conditions that mimic passage through the gastrointestinal tract. We uncovered new differentially regulated genes and pathways, key transcriptional regulators, and 18 novel acid-induced candidate small open reading frames. By using machine learning and leveraging large compendia of publicly available E. coli expression data, we were able to distinguish between the response to acid stress and general stress. These results expand the acid resistance network and provide new insights into the fine-tuned response of E. coli to mild and severe acid stress. IMPORTANCE Bacteria react very differently to survive in acidic environments, such as the human gastrointestinal tract. Escherichia coli is one of the extremely acid-resistant bacteria and has a variety of acid-defense mechanisms. Here, we provide the first genome-wide overview of the adaptations of E. coli K-12 to mild and severe acid stress at both the transcriptional and translational levels. Using ribosome profiling and RNA sequencing, we uncover novel adaptations to different degrees of acidity, including previously hidden stress-induced small proteins and novel key transcription factors for acid defense, and report mRNAs with pH-dependent differential translation efficiency. In addition, we distinguish between acid-specific adaptations and general stress response mechanisms using denoising autoencoders. This workflow represents a powerful approach that takes advantage of next-generation sequencing techniques and machine learning to systematically analyze bacterial stress responses.
    Keywords:  RNA-Seq; acid resistance; machine learning; ribosome profiling; small proteins; transcription factor
    DOI:  https://doi.org/10.1128/msystems.01037-23
  6. PLoS Genet. 2023 Oct 30. 19(10): e1011004
      The last decade has witnessed the emergence of the abundant family of smORF peptides, encoded by small ORFs (<100 codons), whose biological functions remain largely unexplored. Bioinformatic analyses here identify hundreds of putative smORF peptides expressed in Drosophila imaginal leg discs. Through a functional screen in the leg, we found smORF peptides involved in morphogenesis, including the pioneer smORF peptides Pri. Since we had identified its target Ubr3 in the epidermis and pri was known to control leg development through poorly understood mechanisms, we investigated the role of Ubr3 in mediating pri function in the leg. We found that pri plays several roles during leg development, both in patterning and in cell survival. During the larval stage, pri activates tarsal transcriptional programs and the Notch and EGFR signaling pathways independently of Ubr3, whereas at the larval-pupal transition, Pri peptides cooperate with Ubr3 to ensure cell survival and leg morphogenesis. Our results highlight Ubr3-dependent and -independent functions of Pri peptides and their pleiotropy. Moreover, we reveal that the smORF peptide family is a reservoir of overlooked developmental regulators, displaying distinct molecular functions and orchestrating leg development.
    DOI:  https://doi.org/10.1371/journal.pgen.1011004
  7. Nucleic Acids Res. 2023 Oct 28. pii: gkad814. [Epub ahead of print]
      Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns unannotated regions that may contain missing CDS genes. StORF-Reporter consists of two parts. The first extracts the unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show that this methodology recovers genes missing from canonical genome annotations. We inspect the results for the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
    DOI:  https://doi.org/10.1093/nar/gkad814
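      The two-step idea described in the StORF-Reporter abstract above (extract unannotated regions, then look for stop-to-stop ORFs inside them) can be illustrated with a minimal sketch. This is not the tool's code or API: the function names `unannotated_regions` and `find_storfs`, the `min_len` and `min_codons` thresholds, the half-open coordinate convention, and the restriction to the three forward-strand frames are all assumptions made for illustration.

```python
# Illustrative sketch only; names, thresholds and coordinate conventions
# are hypothetical and are not taken from StORF-Reporter itself.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons


def unannotated_regions(genome_length, annotated, min_len=30):
    """Return half-open (start, end) gaps not covered by annotated intervals."""
    regions = []
    cursor = 0  # end of the furthest annotation seen so far
    for start, end in sorted(annotated):
        if start - cursor >= min_len:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if genome_length - cursor >= min_len:
        regions.append((cursor, genome_length))
    return regions


def find_storfs(seq, min_codons=10):
    """Find stop-to-stop ORFs in the three forward frames of seq.

    Each hit is (start, end, frame): start is the first base after the
    upstream stop codon, end is one past the downstream stop codon.
    The reverse strand is deliberately ignored in this sketch.
    """
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop_end = None  # no upstream stop seen yet in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if prev_stop_end is not None:
                    n_codons = (i - prev_stop_end) // 3
                    if n_codons >= min_codons:
                        storfs.append((prev_stop_end, i + 3, frame))
                prev_stop_end = i + 3
    return storfs


# Toy usage: gaps in a 100-bp "genome", then StORFs in a made-up sequence.
gaps = unannotated_regions(100, [(10, 40), (60, 90)], min_len=15)
toy_region = "TAA" + "GCT" * 12 + "TAG" + "GCT" * 3 + "TGA"
hits = find_storfs(toy_region, min_codons=10)
```

      Because the search is anchored on stop codons rather than on a start codon, a candidate is reported even when the gene uses an alternative start codon, which is the class of omission the abstract highlights.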
  8. Cell Death Dis. 2023 Oct 30. 14(10): 708
      Lymph node metastasis (LNM) is the predominant route of gastric cancer dissemination and usually leads to tumor progression and a dismal prognosis. Although exosomal lncRNAs have been reported to be involved in tumor development, whether secreted lncRNAs can encode peptides in recipient cells remains unknown. Here, we identified an exosomal lncRNA (lncAKR1C2) that was clinically correlated with lymph node metastasis in gastric cancer in a VEGFC-independent manner. Exo-lncAKR1C2 secreted from gastric cancer cells was demonstrated to enhance tube formation and migration of lymphatic endothelial cells, and to facilitate lymphangiogenesis and lymphatic metastasis in vivo. By comparing the metabolic characteristics of LN metastases and primary foci, we found that LN metastases of gastric cancer displayed higher lipid metabolic activity. Moreover, exo-lncAKR1C2 encodes a microprotein (pep-AKR1C2) in lymphatic endothelial cells and promotes CPT1A expression by regulating YAP phosphorylation, leading to enhanced fatty acid oxidation (FAO) and ATP production. These findings highlight a novel mechanism of LNM and suggest that the microprotein encoded by exosomal lncAKR1C2 could serve as a therapeutic target for advanced gastric cancer.
    DOI:  https://doi.org/10.1038/s41419-023-06220-1