bims-micpro 2021-08-15 papers

IEEE/ACM Trans Comput Biol Bioinform. 2021 Aug 12. PP

Identifying lncRNA-encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning.

Siyuan Zhao, Jun Meng, Qiang Kang, Yushi Luan.

Long non-coding RNA (lncRNA) contains short open reading frames (sORFs), and sORF-encoded short peptides (SEPs) have become a focus of scientific study because of their crucial role in life activities. Identifying SEPs is vital to further understanding their regulatory function. Bioinformatics methods can quickly identify SEPs and so provide credible candidate sequences for verification by biological experiments. However, methods for identifying SEPs directly are lacking. In this study, a machine learning method to identify SEPs of plant lncRNA (ISPL) is proposed. Hybrid features, including sequence features and physicochemical features, are extracted manually or adaptively to construct different modal features. To keep feature selection stable, the non-linear correction applied in Max-Relevance-Max-Distance (nocRD) feature selection method is proposed, which integrates multiple feature ranking results and uses the iterative random forest to reduce the dimensionality of the different modal features. Classification models with different modal features are constructed, and their outputs are combined for ensemble classification. The experimental results show that the accuracy of ISPL is 89.86% on the independent test set, which will have important implications for further studies of functional genomics.

DOI: https://doi.org/10.1109/TCBB.2021.3104288
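
The ensemble step described in the abstract above (one classifier per feature modality, with outputs combined) can be illustrated with a minimal sketch. The abstract does not say how ISPL combines the outputs, so this assumes plain soft voting, that is, averaging each model's predicted probability and thresholding the mean. The function name `soft_vote`, the modality names, the 0.5 threshold, and the probabilities are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def soft_vote(prob_by_modality, threshold=0.5):
    """Average per-modality positive-class probabilities, then threshold.

    prob_by_modality maps a modality name to a list of probabilities,
    one per candidate sequence. Soft voting is an assumption here; the
    abstract does not specify the combination rule used by ISPL.
    """
    lengths = {len(p) for p in prob_by_modality.values()}
    if len(lengths) != 1:
        raise ValueError("every modality must score the same sequences")
    # Group the scores of each sequence across modalities and average them.
    mean_probs = [fmean(col) for col in zip(*prob_by_modality.values())]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Hypothetical outputs of two models for three candidate sequences.
probs = {
    "sequence": [0.9, 0.2, 0.6],
    "physicochemical": [0.7, 0.4, 0.3],
}
mean_probs, labels = soft_vote(probs)
```

In this toy input the third sequence is rejected even though the sequence-feature model alone would accept it, which is the kind of disagreement an ensemble is meant to smooth out.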

Front Cell Dev Biol. 2021;9:687748

Mapping Microproteins and ncRNA-Encoded Polypeptides in Different Mouse Tissues.

Ni Pan, Zhiwei Wang, Bing Wang, Jian Wan, Cuihong Wan.

Small open reading frame-encoded peptides (SEPs), also called microproteins, play a vital role in biological processes. Many of their open reading frames are located within non-coding RNA (ncRNA) regions. Recent research has demonstrated that ncRNA-encoded polypeptides have essential functions and exist ubiquitously in various tissues. To better understand the role of microproteins, especially ncRNA-encoded proteins, expressed in different tissues, we characterized the proteomes of five mouse tissues by mass spectrometry, using bottom-up, top-down, and de novo sequencing strategies. Bottom-up and top-down with database-dependent searches identified 811 microproteins in the OpenProt database. De novo sequencing identified 290 microproteins, including 12 ncRNA-encoded microproteins that were not found in current databases. In this study, we discovered 1,074 microproteins in total, including 270 ncRNA-encoded microproteins. From the annotation of these microproteins, we found that the brain contains the largest number of neuropeptides, while the spleen contains the most immune-associated microproteins. This suggests that microproteins in different tissues have tissue-specific functions. These unannotated ncRNA-encoded microproteins have predicted domains, such as the macrophage migration inhibitory factor domain and the Prefoldin domain. These results expand the mouse proteome and provide insight into the molecular biology of mouse tissues.

Keywords: de novo sequencing; mouse tissue; non-coding RNAs; small open reading frame; top-down

DOI: https://doi.org/10.3389/fcell.2021.687748

Proteomics. 2021 Aug 12. e2100152

A simple organic solvent precipitation method to improve detection of low molecular weight proteins.

Parthiban Periasamy, Sureka Rajandran, Rebekah Ziegman, Mika Casey, Kyohei Nakamura, Hitesh Kore, Keshava Datta, Harsha Gowda.

Mass spectrometry-based proteomics has revolutionized global proteomic profiling. Although abundant high molecular weight proteins are readily sampled in global proteomics studies, less abundant low molecular weight proteins are often underrepresented. These include biologically important classes of low molecular weight proteins such as ligands, growth factors, peptide hormones and cytokines. Although extensive fractionation can improve coverage of the proteome, it requires additional infrastructure, mass spectrometry time and labor. There is a need for a simple method that can selectively deplete abundant high molecular weight proteins and enrich less abundant low molecular weight proteins to improve their coverage in proteomics studies. We present a simple organic solvent-based protein precipitation method that selectively depletes high molecular weight proteins and enriches low molecular weight proteins in the soluble fraction. Using this strategy, we demonstrate identification of low molecular weight proteins that are generally underrepresented in proteomics datasets. In addition, we show the utility of this approach in identifying functional cleavage products from precursor proteins and low molecular weight short open reading frame proteins encoded by non-coding regions such as lncRNAs and UTRs. As the method does not require additional infrastructure, it can complement existing proteomics workflows to increase detection and coverage of less abundant low molecular weight proteins.

Keywords: Low-molecular weight protein enrichment; proteolytically cleaved functional peptides; sORF

DOI: https://doi.org/10.1002/pmic.202100152