Comput Biol Chem. 2022 Feb 10. pii: S1476-9271(22)00020-2. [Epub ahead of print] 97: 107640
N6-methyladenosine (m6A) is one of the most abundant post-transcriptional modifications in cellular RNA. It regulates diverse biological processes, such as protein synthesis, X-chromosome inactivation, cell stability, cell reprogramming, and miRNA regulation. Recent studies have reported that mutations in m6A sites are linked with various diseases, such as brain tumors, heart attack, obesity, and cancer. Correct identification of m6A sites is therefore an essential step toward addressing these diseases. However, state-of-the-art predictors still struggle to detect m6A sites precisely. Even for model organisms such as Saccharomyces cerevisiae, detection is difficult because of the complex sequence patterns surrounding m6A sites. These patterns are not well understood and lead to non-discriminative features. To overcome this problem, we propose a novel predictor called m6A-Finder that creates features based on global and local sequence order. The global sequence order is captured by physical-property-based features, while the local sequence order is captured by statistical features. Fusing these features yields a high-dimensional vector, which can lead to over-fitting; to address this, we use the minimum redundancy maximum relevance (mRMR) algorithm to remove redundant features. The proposed technique is evaluated on the most widely used Saccharomyces cerevisiae dataset. Overall, m6A-Finder achieved an accuracy of 82.02%, a sensitivity of 82.10%, a specificity of 81.94%, and a Matthews correlation coefficient of +0.64.
Keywords: Feature selection; M6A modification sites; M6A-Finder; MRMR; RNA; SVM
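
The four figures reported in the abstract (accuracy, sensitivity, specificity, and Matthews correlation coefficient) are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the confusion-matrix counts in the usage example are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp / fn: true m6A sites predicted as m6A / as non-m6A
    tn / fp: non-m6A sites predicted as non-m6A / as m6A
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical balanced example: 100 positive and 100 negative sites,
# with 82 of each class predicted correctly.
metrics = binary_metrics(tp=82, tn=82, fp=18, fn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, sensitivity and specificity are both 0.82 and the MCC works out to 0.64. Under the assumption of a class-balanced dataset, this shows that an MCC near +0.64 is arithmetically consistent with sensitivity and specificity near 82%.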