Brief Bioinform. 2021 Dec 07. pii: bbab475. [Epub ahead of print]
Investigating differentially methylated regions (DMRs) presented in different tissues or cell types can help to reveal the mechanisms behind the tissue-specific gene expression. The identified tissue-/disease-specific DMRs also can be used as feature markers for spotting the tissues-of-origins of cell-free DNA (cfDNA) in noninvasive diagnosis. In recent years, many methods have been proposed to detect DMRs. However, due to the lack of benchmark DMRs, it is difficult for researchers to choose proper methods and select desirable DMR sets for downstream studies. The application of DMRs, used as feature markers, can be benefited by the longer length of DMRs containing more CpG sites when a threshold is given for the methylation differences of DMRs. According to this, two metrics ($Qn$ and $Ql$), in which the CpG numbers and lengths of DMRs with different methylation differences are weighted differently, are proposed in this paper to evaluate the DMR sets predicted by different methods on BS-seq data. DMR sets predicted by eight methods on both simulated datasets and real BS-seq datasets are evaluated by the proposed metrics, the benchmark-based metrics, and the enrichment analysis of biological data, including genomic features, transcription factors and histones. The rank correlation analysis shows that the $Qn$ and $Ql$ are highly correlated to the benchmark metrics for simulated datasets and the biological data enrichment analysis for real BS-seq data. Therefore, with no need for additional biological data, the proposed metrics can help researchers selecting a more suitable DMR set on a certain BS-seq dataset.
Keywords: BS-seq; differentially methylated regions; methylation difference; rank correlation analysis