bims-micpro Biomed News
on Discovery and characterization of microproteins
Issue of 2023‒04‒02
six papers selected by
Thomas Farid Martínez
University of California, Irvine

  1. Front Cell Dev Biol. 2023;11:1117454
      Recent advances in proteogenomic techniques and bioinformatic pipelines have permitted the detection of thousands of translated small Open Reading Frames (smORFs), which contain fewer than 100 codons, in eukaryotic genomes. Hundreds of these actively translated smORFs display conserved sequence, structure and evolutionary signatures indicating that the translated peptides could fulfil important biological roles. Despite their abundance, only tens of smORF genes have been fully characterised; these act mainly as regulators of canonical proteins involved in essential cellular processes. Importantly, some of these smORFs display conserved functions, with their mutations being associated with pathogenesis. Thus, investigating smORF roles in Drosophila will not only expand our understanding of their functions but may also have an impact on human health. Here we describe the function of a novel and essential Drosophila smORF gene named purriato (prto). prto belongs to an ancient gene family whose members have expanded throughout the Protostomia clade. prto encodes a transmembrane peptide which is localized in endo-lysosomes and perinuclear and plasma membranes. prto is dynamically expressed in mesodermal tissues and imaginal discs. Targeted prto knockdown (KD) in these organs results in changes in nuclear morphology and endo-lysosomal distributions correlating with the loss of sarcomeric homeostasis in muscles and reduction of mitosis in wing discs. Consequently, prto KD mutants display severe reduction of motility, and shorter wings. Finally, our genetic interaction experiments show that prto function is closely associated with the CASA pathway, a conserved mechanism involved in turnover of mis-folded proteins and linked to muscle dystrophies and neurodegenerative diseases. Thus, this study shows the relevance of smORFs in regulating important cellular functions and supports the systematic characterisation of this class of genes to understand their functions and evolution.
    Keywords:  Drosophila; cell proliferation; constitutive assisted selective autophagy (CASA); proteostasis; sarcomerogenesis; smORF peptides; wing imaginal disc
  2. Mol Oncol. 2023 Mar 25.
      Currently, the knowledge of long non-coding RNA (lncRNA)-encoded peptides is quite lacking in esophageal squamous cell carcinoma (ESCC). In this study, we simultaneously identified six lncRNA open reading frames (ORFs) with peptide-coding abilities including lysine-specific demethylase 4A antisense RNA 1 (KDM4A-AS1) ORF by combining weighted gene co-expression network analysis (WGCNA) for ESCC clinical samples, ribosome footprints, ORF prediction, mass spectrometry (MS) identification, and western blotting. KDM4A-AS1 ORF-encoded peptide reduced ESCC cell viability and migratory ability. Co-immunoprecipitation and MS analysis revealed that KDM4A-AS1-encoded peptide specifically bound with 103 proteins in ESCC cells, and enrichment analysis suggested that peptide-bound proteins were related to fatty acid metabolism and redox process. Cell and molecular experiments demonstrated that KDM4A-AS1-encoded peptide inhibited stearoyl-CoA desaturase and fatty acid synthase expression, increased reactive oxygen species level, and reduced mitochondrial membrane potential in ESCC cells. In summary, multiple lncRNAs with translation potential were simultaneously identified by combining multiple approaches in ESCC, providing novel identification strategies for lncRNA-encoded peptides. Moreover, lncRNA KDM4A-AS1-encoded peptide weakened ESCC cell viability and migratory capacity and functioned in fatty acid metabolism and redox process.
    Keywords:  Esophageal squamous cell carcinoma; KDM4A-AS1; lncRNAs; mass spectrometry; peptide
  3. Brief Bioinform. 2023 Mar 17. pii: bbad101. [Epub ahead of print]
      Small open reading frames (smORFs) encoding proteins of fewer than 100 amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. Based on a comprehensive analysis of known prokaryotic small ORFs, we have developed the ProsmORF-pred resource which uses a machine learning (ML)-based method for prediction of smORFs in the prokaryotic genome sequences. ProsmORF-pred consists of two ML models, one for initiation site recognition in nucleic acid sequences upstream of putative start codons and the other uses translated amino acid sequences to decipher functional protein-like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (>100 aa) in the same genome while the ML model for identification of protein-like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking of ProsmORF-pred reveals that its performance is comparable to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. Its performance is distinctly superior to other tools like PRODIGAL and RANSEPS for prediction of newly identified smORFs which have a length range of 10-30 aa, where prediction of smORFs has been a major challenge. Apart from identification of smORFs in genomic sequences, ProsmORF-pred can also aid in functional annotation of the predicted smORFs based on sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. ProsmORF-pred along with its backend database ProsmORFDB is available as a user-friendly web server (
    Keywords:  ORF prediction; Random Forest; functional annotation; genome annotation; machine learning; small ORFs
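      As a rough illustration of the candidate-generation step that any smORF predictor needs before scoring, the sketch below enumerates start-to-stop ORFs encoding 10 to 99 residues on both strands. It is not ProsmORF-pred's code: the function names, the choice of ATG/GTG/TTG as start codons, and the length bounds are assumptions made for this example, and no ML scoring is attempted.

```python
# Illustrative sketch only; not the ProsmORF-pred implementation.
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of prokaryotic start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_smorfs(seq, min_aa=10, max_aa=99):
    """List candidate smORFs as (strand, start, end, nt_seq).

    Coordinates are 0-based and relative to the scanned strand; `end` is
    exclusive and includes the stop codon. For each stop codon only the
    most upstream in-frame start is reported.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # open a new candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3  # residues, stop excluded
                    if min_aa <= n_aa <= max_aa:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and keep scanning
    return hits
```

      A real pipeline would pass each candidate on to downstream models, for example one scoring the region upstream of the start codon and one scoring the translated peptide, as the abstract describes.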
  4. bioRxiv. 2023 Mar 25. pii: 2023.03.23.533704. [Epub ahead of print]
      ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
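      The idea of choosing, among all ORFs in a transcript, the one most similar to an annotated protein can be sketched in a few lines. This is a toy stand-in and not ORFanage's algorithm: it scans only the forward strand, uses Python's `difflib.SequenceMatcher` ratio as a crude similarity score, and the function names `candidate_orfs` and `best_orf` are hypothetical.

```python
# Toy sketch only; not the ORFanage algorithm or API.
from difflib import SequenceMatcher

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(zip(
    (a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO))


def translate(nt):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def candidate_orfs(transcript):
    """Yield (start, end, protein) for each ATG-to-stop ORF, forward strand."""
    t = transcript.upper()
    for start in range(len(t) - 2):
        if t[start:start + 3] != "ATG":
            continue
        protein = translate(t[start:])
        stop = protein.find("*")
        if stop > 0:  # keep only ORFs closed by an in-frame stop codon
            yield start, start + 3 * (stop + 1), protein[:stop]


def best_orf(transcript, reference_protein):
    """Return (score, start, end, protein) for the ORF most similar to the
    reference protein, or None if the transcript contains no complete ORF."""
    scored = [
        (SequenceMatcher(None, p, reference_protein, autojunk=False).ratio(),
         s, e, p)
        for s, e, p in candidate_orfs(transcript)
    ]
    return max(scored, key=lambda hit: hit[0], default=None)
```

      For example, `best_orf("CCATGAAATAAGATGGCTGCTGGTTGACC", "MAAG")` prefers the second ORF (encoding MAAG) over the first, shorter one (encoding MK).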
  5. Cancers (Basel). 2023 Mar 21. pii: 1880. [Epub ahead of print]15(6):
      BACKGROUND: Long non-coding RNAs (lncRNAs) are a class of RNA molecules that are longer than 200 nucleotides and were initially believed to lack encoding capability. However, recent research has found open reading frames (ORFs) within lncRNAs, suggesting that they may have coding capacity. Despite this discovery, the mechanisms by which lncRNA-encoded products are involved in cancer are not well understood. The current study aims to investigate whether lncRNA HCP5-encoded products promote triple-negative breast cancer (TNBC) by regulating ferroptosis.
    METHODS: We used bioinformatics to predict the coding capacity of lncRNA HCP5 and conducted molecular biology experiments and a xenograft assay in nude mice to investigate the mechanism of its encoded products. We also evaluated the expression of the HCP5-encoded products in a breast cancer tissue microarray.
    RESULTS: Our analysis revealed that the ORF in lncRNA HCP5 can encode a 132-amino acid (aa) protein, which we named HCP5-132aa. Further experiments showed that HCP5-132aa promotes TNBC growth by regulating GPX4 expression and lipid ROS level through the ferroptosis pathway. Additionally, we found that breast cancer patients with high levels of HCP5-132aa have a poorer prognosis.
    CONCLUSIONS: Our study suggests that overexpression of lncRNA HCP5-encoded protein is a critical oncogenic event in TNBC, as it regulates ferroptosis. These findings could provide new therapeutic targets for the treatment of TNBC.
    Keywords:  ROS; encoded protein; ferroptosis; lncRNA HCP5; triple-negative breast cancer
  6. Membranes (Basel). 2023 Feb 25. pii: 274. [Epub ahead of print]13(3):
      Calcium is a major signalling bivalent cation within the cell. Compartmentalization is essential for regulation of calcium-mediated processes. A number of players contribute to intracellular handling of calcium, among them the sarco/endoplasmic reticulum calcium ATPases (SERCAs). These molecules function in the membrane of the ER/SR, pumping Ca2+ from the cytoplasm into the lumen of the internal store. Removal of calcium from the cytoplasm is essential for signalling and for relaxation of skeletal muscle and heart. There are three genes and over a dozen isoforms of SERCA in mammals. These can be potentially influenced by small membrane peptides, also called regulins. The discovery of micropeptides has increased in recent years, mostly because of the small ORFs found in long RNAs, annotated formerly as noncoding (lncRNAs). Several excellent works have analysed the mechanism of interaction of micropeptides with each other and with the best known SERCA1a (fast muscle) and SERCA2a (heart, slow muscle) isoforms. However, the array of tissue and developmental expressions of these potential regulators raises the question of interaction with other SERCAs. For example, the most abundant calcium pump in neonatal and regenerating skeletal muscle, SERCA1b, has never been looked at with scrutiny to determine whether it is influenced by micropeptides. Further details might be interesting on the interaction of these peptides with the less studied SERCA1b isoform.
    Keywords:  SERCA1b; regulins; sarcoplasmic reticulum; transmembrane micropeptides