bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023‒10‒15
five papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. Front Genet. 2023;14:1264606
      Circular RNA (circRNA) is a special class of noncoding RNA molecules and the latest research hotspot in the field of RNA. CircRNA molecules have a closed-loop structure that is resistant to RNA exonucleases, which makes their expression more stable. Previous studies have shown that circRNA molecules are rich in microRNA (miRNA) binding sites and act as miRNA sponges in cells. By interacting with miRNAs associated with tumors and other diseases, circRNAs play an important regulatory role. However, circRNAs have recently been found to have small open reading frames that enable them to encode peptides/proteins. These proteins have been reported to play an important role in the mechanism of regulation of a variety of diseases and have great potential in the diagnosis and treatment of diseases. In this review, we summarize the mechanism of action of the newly discovered circRNA-coding proteins since 2022 and briefly describe their research process. In addition, we also discuss the prediction model of the functional sites and encoded proteins of circRNAs, which provides a potential idea for future research on circRNAs.
    Keywords:  circular RNA; disease progression; encoded protein; osteosarcoma pathology; regulatory mechanism
    DOI:  https://doi.org/10.3389/fgene.2023.1264606
  2. bioRxiv. 2023 Sep 28. pii: 2023.09.26.559641. [Epub ahead of print]
      Unveiling the complete proteome of viruses is crucial to our understanding of the viral life cycle and interaction with the host. We developed Massively Parallel Ribosome Profiling (MPRP) to experimentally determine open reading frames (ORFs) in 20,170 designed oligonucleotides across 679 human-associated viral genomes. We identified 5,381 ORFs, including 4,208 non-canonical ORFs, and show successful detection of both annotated coding sequences (CDSs) and reported non-canonical ORFs. By examining immunopeptidome datasets of infected cells, we found class I human leukocyte antigen (HLA-I) peptides originating from non-canonical ORFs identified through MPRP. By inspecting ribosome occupancies on the 5'UTR and CDS regions of annotated viral genes, we identified hundreds of upstream ORFs (uORFs) that negatively regulate the synthesis of canonical viral proteins. The unprecedented source of viral ORFs across a wide range of viral families, including highly pathogenic viruses, expands the repertoire of vaccine targets and exposes new cis-regulatory sequences in viral genomes.
    DOI:  https://doi.org/10.1101/2023.09.26.559641
  3. Biochemistry. 2023 Oct 09.
      Over the past decade, advances in genomics have identified thousands of additional protein-coding small open reading frames (smORFs) missed by traditional gene finding approaches. These smORFs encode peptides and small proteins, commonly termed micropeptides or microproteins. Several of these newly discovered microproteins have biological functions and operate through interactions with proteins and protein complexes within the cell. CYREN1 is a characterized microprotein that regulates double-strand break repair in mammalian cells through interaction with the Ku70/80 heterodimer. Ku70/80 binds to and stabilizes double-strand breaks and recruits the machinery needed for nonhomologous end joining (NHEJ) repair. In this study, we examined the biochemical properties of CYREN1 to better understand and explain its cellular protein interactions. Our findings support that CYREN1 is an intrinsically disordered microprotein and this disordered structure allows it to enrich several proteins, including a newly discovered interaction with SF3B1 via a distinct short linear motif (SLiM) on CYREN1. Since many microproteins are predicted to be disordered, CYREN1 is an exemplar of how microproteins interact with other proteins and reveals an unknown scaffolding function of this microprotein that may link NHEJ and splicing.
    DOI:  https://doi.org/10.1021/acs.biochem.3c00397
  4. bioRxiv. 2023 Oct 02. pii: 2023.09.27.559809. [Epub ahead of print]
      There has been a dramatic increase in the identification of non-canonical translation and a significant expansion of the protein-coding genome and proteome. Among the strategies used to identify novel small ORFs (smORFs), ribosome profiling (Ribo-Seq) is the gold standard for the annotation of novel coding sequences by reporting on smORF translation. In Ribo-Seq, ribosome-protected footprints (RPFs) that map to multiple sites in the genome are computationally removed since they cannot unambiguously be assigned to a specific genomic location, or to a specific transcript in the case of multiple isoforms. Furthermore, RPFs necessarily result in short (25-34 nucleotides) reads, increasing the chance of ambiguous and multi-mapping alignments, such that smORFs that reside in these regions cannot be identified by Ribo-Seq. Here, we show that the inclusion of proteogenomics to create a Ribosome Profiling and Proteogenomics Pipeline (RP3) bypasses this limitation to identify a group of microprotein-encoding smORFs that are missed by current Ribo-Seq pipelines. Moreover, we show that the microproteins identified by RP3 have different sequence compositions from the ones identified by Ribo-Seq-only pipelines, which can affect proteomics identification. In aggregate, the development of RP3 maximizes the detection and confidence of protein-encoding smORFs and microproteins.
    DOI:  https://doi.org/10.1101/2023.09.27.559809
  5. Nucleic Acids Res. 2023 Oct 12. pii: gkad824. [Epub ahead of print]
      Advancements in mass spectrometry (MS)-based proteomics have greatly facilitated the large-scale quantification of proteins and microproteins, thereby revealing altered signalling pathways across many different cancer types. However, specialized and comprehensive resources are lacking for cancer proteomics. Here, we describe CancerProteome (http://bio-bigdata.hrbmu.edu.cn/CancerProteome), which functionally deciphers and visualizes the proteome landscape in cancer. We manually curated and re-analyzed publicly available MS-based quantification and post-translational modification (PTM) proteomes, including 7406 samples from 21 different cancer types, and also examined protein abundances and PTM levels in 31 120 proteins and 4111 microproteins. Six major analytical modules were developed with a view to describe protein contributions to carcinogenesis using proteome analysis, including conventional analyses of quantitative and the PTM proteome, functional enrichment, protein-protein associations by integrating known interactions with co-expression signatures, drug sensitivity and clinical relevance analyses. Moreover, protein abundances, which correlated with corresponding transcript or PTM levels, were evaluated. CancerProteome is convenient as it allows users to access specific proteins/microproteins of interest using quick searches or query options to generate multiple visualization results. In summary, CancerProteome is an important resource, which functionally deciphers the cancer proteome landscape and provides a novel insight for the identification of tumor protein markers in cancer.
    DOI:  https://doi.org/10.1093/nar/gkad824