Bioinform Adv. 2026;6(1):vbaf316
Motivation: Understanding the role of DNA methylation in oncogenesis, diagnosis, and treatment requires data of sufficient size and accuracy, but current epigenetic data are limited, especially for population groups underrepresented in research. We propose a framework for generating highly accurate DNA methylation predictions using classified mixed model prediction, incorporating a step to cluster patients into cross-cancer and cross-race groups.
Results: Simulations show that our framework predicts underlying mixed effects more accurately than regression prediction and naive estimates, extending previous work to the case where clusters are estimated from the data. We illustrate this framework using data from The Cancer Genome Atlas, uncovering clustering patterns and generating DNA methylation predictions for further analysis. Our work demonstrates how shared random effects can be leveraged to borrow strength across observations with similar methylation patterns.
Availability and implementation: The methods are implemented in R and available at: https://github.com/nidhipai/dnam_cmmp.
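
Illustrative sketch: the idea of borrowing strength through shared random effects can be shown with a toy random-intercept model. This is not the authors' R implementation. It is a minimal Python sketch under strong simplifying assumptions: cluster labels are taken as given (standing in for the clustering step), the only fixed effect is an intercept estimated by the grand mean, and the variance ratio `var_ratio` is assumed known. The function names `fit_random_intercepts` and `cmmp_predict` are hypothetical. New observations are matched to the cluster whose estimated mixed effect is closest to their mean, and that cluster's mixed effect is returned as the prediction.

```python
import numpy as np

def fit_random_intercepts(y, groups, var_ratio=1.0):
    """Estimate a grand mean and shrunken (BLUP-style) cluster intercepts.

    var_ratio is sigma_e^2 / sigma_alpha^2 and is ASSUMED known here;
    a real analysis would estimate the variance components.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    mu = y.mean()  # crude fixed-intercept estimate (assumption: no covariates)
    alpha = {}
    for g in np.unique(groups).tolist():
        resid = y[groups == g] - mu
        n_g = resid.size
        # Shrink the cluster's mean residual toward zero.
        alpha[g] = n_g / (n_g + var_ratio) * resid.mean()
    return mu, alpha

def cmmp_predict(y_new, mu, alpha):
    """Match new observations to the closest cluster and predict its mixed effect."""
    y_bar = float(np.mean(y_new))
    # Pick the cluster whose mixed effect mu + alpha_g is nearest to y_bar.
    best = min(alpha, key=lambda g: (y_bar - (mu + alpha[g])) ** 2)
    return best, mu + alpha[best]

# Toy data: three clusters with clearly separated levels (made-up values).
y_train = [0, 0, 0, 10, 10, 10, 20, 20, 20]
g_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]
mu_hat, alpha_hat = fit_random_intercepts(y_train, g_train, var_ratio=1.0)
matched, theta_hat = cmmp_predict([18.0, 19.0], mu_hat, alpha_hat)
```

In this sketch the prediction for the new observations is the matched cluster's shrunken mixed effect rather than their own raw mean, which is the sense in which information is shared across observations with similar patterns.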