bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023–10–29
six papers selected by
Thomas Farid Martínez, University of California, Irvine



  1. STAR Protoc. 2023 Oct 23. pii: S2666-1667(23)00616-0. [Epub ahead of print]4(4): 102649
      Small open reading frame (smORF)-encoded microproteins, proteins containing less than 100-150 amino acids, are an emerging class of functional biomolecules. Here, we present a protocol for identifying translated smORFs in mammalian systems genome wide. We describe steps for generation of ribosome profiling (Ribo-seq) data, in silico translation of a transcriptome assembly to create an ORF database, and computational analysis of Ribo-seq to score individual smORFs for translation. Identification of translated smORFs is the first step to studying the functions of microproteins. For complete details on the use and execution of this protocol, please refer to Martinez et al. (1).
    Keywords:  Bioinformatics; Cell Biology; Cell Culture; Gene Expression; Genomics; Molecular Biology; RNAseq; Sequence Analysis; Sequencing
    DOI:  https://doi.org/10.1016/j.xpro.2023.102649
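    The "in silico translation of a transcriptome assembly to create an ORF database" step in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' protocol or software: the function name find_smorfs, the min_aa and max_aa parameters, and the simplifying choices (ATG start codons only, forward strand only, first in-frame ATG per stop codon) are all assumptions made for illustration.

```python
# Toy three-frame ORF scan over a single transcript sequence.
# Assumptions (not from the protocol): ATG-only starts, forward strand only,
# and the first in-frame ATG upstream of each stop defines the ORF.

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_smorfs(transcript, max_aa=150, min_aa=10):
    """Return (start, end, peptide) tuples for ORFs of min_aa..max_aa residues.

    start and end are 0-based, end-exclusive nucleotide coordinates on the
    transcript; end includes the stop codon. ORFs without a stop are skipped.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None   # position of the currently open start codon, if any
        peptide = []
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            aa = CODON_TABLE.get(codon, "X")  # "X" for ambiguous bases
            if start is None:
                if codon == "ATG":
                    start = pos
                    peptide = ["M"]
            elif aa == "*":
                if min_aa <= len(peptide) <= max_aa:
                    orfs.append((start, pos + 3, "".join(peptide)))
                start = None
            else:
                peptide.append(aa)
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical toy transcript, not real data.
    toy = "CCATGAAAAAAAAATAAG"
    for orf in find_smorfs(toy, min_aa=2):
        print(orf)
```

    In a real workflow, the resulting ORF coordinates would then be scored against Ribo-seq read coverage; the scoring method is not specified in the abstract and is not sketched here.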
  2. Trends Genet. 2023 Oct 23. pii: S0168-9525(23)00236-6. [Epub ahead of print]
      Thousands of small proteins, called microproteins, are encoded in small open reading frames (smORFs) throughout the genome. Despite assumptions that these proteins would be too small to properly fold and function, a recent study by Chen et al. identifies the surprisingly complex roles of one such microprotein.
    DOI:  https://doi.org/10.1016/j.tig.2023.10.008
  3. Cell Rep. 2023 Oct 26. pii: S2211-1247(23)01323-2. [Epub ahead of print]42(11): 113311
      Short polypeptides encoded by small open reading frames (smORFs) are ubiquitously found in eukaryotic genomes and are important regulators of physiology, development, and mitochondrial processes. Here, we focus on a subset of 298 smORFs that are evolutionarily conserved between Drosophila melanogaster and humans. Many of these smORFs are conserved broadly in the bilaterian lineage, and ∼182 are conserved in plants. We observe remarkably heterogeneous spatial and temporal expression patterns of smORF transcripts, indicating widespread tissue-specific and stage-specific mitochondrial architectures. In addition, an analysis of annotated functional domains reveals a predicted enrichment of smORF polypeptides localizing to mitochondria. We conduct an embryonic ribosome profiling experiment and find support for translation of 137 of these smORFs during embryogenesis. We further embark on functional characterization using CRISPR knockout/activation, RNAi knockdown, and cDNA overexpression, revealing diverse phenotypes. This study underscores the importance of identifying smORF function in disease and phenotypic diversity.
    Keywords:  CP: Genomics; CRISPR; Drosophila; gene function; gene knockout; peptide; ribosome profiling; smORF
    DOI:  https://doi.org/10.1016/j.celrep.2023.113311
  4. Plant Physiol. 2023 Oct 25. pii: kiad572. [Epub ahead of print]
      Arabidopsis (Arabidopsis thaliana) ecotype Col-0 has plastid and mitochondrial genomes encoding over 100 proteins. Public databases (e.g., Araport11) have redundancy and discrepancies in gene identifiers for these organelle-encoded proteins. RNA editing results in changes to specific amino acid residues or creation of start and stop codons for many of these proteins, but the impact of RNA editing at the protein level is largely unexplored due to the complexities of detection. Here, we assembled the non-redundant set of identifiers, their correct protein sequences, and 452 predicted non-synonymous editing sites of which 56 are edited at lower frequency. We then determined accumulation of edited and/or unedited proteoforms by searching ∼259 million raw tandem mass spectrometry spectra from ProteomeXchange, which is part of PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/). We identified all mitochondrial proteins and all except three plastid-encoded proteins (NdhG/Ndh6, PsbM, Rps16), but no proteins predicted from the four open reading frames were identified. We suggest that Rps16 and three of the open reading frames are pseudogenes. Detection frequencies for each edit site and type of edit (e.g., S to L/F) were determined at the protein level, cross-referenced against the metadata (e.g., tissue), and evaluated for technical detection challenges. We detected 167 predicted edit sites at the proteome level. Minor frequency sites were edited at low frequency at the protein level except for cytochrome C biogenesis 382 at residue 124 (Ccb382-124). Major frequency sites (>50% editing of RNA) only accumulated in edited form (>98-100% edited) at the protein level, with the exception of Rpl5-22. We conclude that RNA editing for major editing sites is required for stable protein accumulation.
    DOI:  https://doi.org/10.1093/plphys/kiad572
  5. Methods. 2023 Oct 25. pii: S1046-2023(23)00173-1. [Epub ahead of print]
      Recent advancements in omics technologies have unveiled a hitherto unknown group of short polypeptides called microproteins (miPs). Despite their size, accumulating evidence has demonstrated that miPs exert varied and potent biological functions. They act in paracrine, juxtacrine, and endocrine fashion, maintaining cellular physiology and driving diseases. The present study focuses on biochemical and biophysical analysis and characterization of twenty-four human miPs using distinct computational methods, including RIDAO, AlphaFold2, D2P2, FuzDrop, STRING, and Emboss Pep wheel. miPs often lack well-defined tertiary structures and may harbor intrinsically disordered regions (IDRs) that play pivotal roles in cellular functions. Our analyses define the physicochemical properties of an essential subset of miPs, elucidating their structural characteristics and demonstrating their propensity for driving or participating in liquid-liquid phase separation (LLPS) and intracellular condensate formation. Notably, miPs such as NoBody and pTUNAR revealed a high propensity for LLPS, implicating their potential involvement in forming membrane-less organelles (MLOs) during intracellular LLPS and condensate formation. The results of our study indicate that miPs have functionally profound implications in cellular compartmentalization and signaling processes essential for regulating normal cellular functions. Taken together, our methodological approach explains and highlights the biological importance of these miPs, providing a deeper understanding of the unusual structural landscape and functionality of these newly defined small proteins. Understanding their functions and biological behavior will aid in developing targeted therapies for diseases that involve miPs.
    Keywords:  Cellular Processes and Diseases; Intrinsically Disordered Regions; Liquid-Liquid Phase Separation; Membrane-less Organelles; Short open reading frames; microProteins or microPeptides
    DOI:  https://doi.org/10.1016/j.ymeth.2023.10.009
  6. Clin Transl Med. 2023 Oct;13(10): e1451
      BACKGROUND: Circular RNAs (circRNAs) play a significant role in the initiation and progression of various cancers, including hepatocellular carcinoma (HCC). Circular syntaxin 6 (circSTX6, also known as hsa_circ_0007905) has been identified as a microRNA (miRNA) sponge in pancreatic adenocarcinoma. However, its full range of functions in terms of protein scaffold and translation remains largely unexplored in the context of HCC.
    METHODS: The expression of circSTX6 and its encoded protein was examined in HCC tumour tissues. N6-methyladenosine (m6A) on circSTX6 was verified and quantified by methylated RNA immunoprecipitation (Me-RIP), RIP and dual luciferase reporter assays. The biological functions of circSTX6 and its encoded protein in HCC were clarified by in vitro and in vivo experiments. Mechanistically, the interaction between circSTX6 and heterogeneous nuclear ribonucleoprotein D (HNRNPD) was investigated by RNA pull-down, RIP and fluorescence in situ hybridization (FISH)/IF. The regulatory effects of circSTX6 and HNRNPD on activating transcription factor 3 (ATF3) mRNA were determined by mRNA stability and RIP assays. Furthermore, the presence of circSTX6-encoded protein was verified by mass spectrometry.
    RESULTS: CircSTX6 and its encoded 144 amino acid polypeptide, circSTX6-144aa, were highly expressed in HCC tumour tissues and served as independent risk factors for overall survival in HCC patients. The expression of circSTX6 was regulated by METTL14 in an m6A-dependent manner. Functionally, circSTX6 accelerated HCC proliferation and tumourigenicity and reinforced tumour metastasis in vitro and in vivo. Mechanistically, circSTX6 acted as a sponge for HNRNPD protein, facilitating its binding to ATF3 mRNA, consequently promoting ATF3 mRNA decay. Meanwhile, circSTX6-144aa promoted HCC proliferation, migration and invasion independent of circSTX6 itself.
    CONCLUSION: Collectively, our study reveals that m6A-modified circSTX6 drives malignancy in HCC through the HNRNPD/ATF3 axis, while its encoded circSTX6-144aa contributes to HCC progression independent of circSTX6. CircSTX6 and its encoded protein hold promise as potential biomarkers and therapeutic targets in HCC.
    Keywords:  N6-methyladenosine modification; RNA-binding protein; circRNA; hepatocellular carcinoma; mRNA decay; protein encoding
    DOI:  https://doi.org/10.1002/ctm2.1451