bims-dinmec Biomed News
on DNA methylation in cancer
Issue of 2026–05–24
four papers selected by
Lorena Ancona, Humanitas Research



  1. Clin Epigenetics. 2026 May 22.
       BACKGROUND: Gastric cancer remains one of the most prevalent malignancies globally. As early-stage gastric cancer is typically asymptomatic or presents with non-specific symptoms, most patients are diagnosed at advanced stages, leading to poor survival outcomes. Effective early detection strategies are important for reducing gastric cancer-related mortality. In this study, we developed a non-invasive assay utilizing cell-free DNA to distinguish patients with early-stage gastric cancer from healthy individuals.
    RESULTS: We performed low-depth whole genome sequencing to profile cell-free DNA and extracted three distinct features: fragment size patterns, coverage at transcription factor binding sites, and methylation-based profiles. These features were integrated via machine learning to construct a stacked ensemble model. The study included a training cohort (108 gastric cancer patients and 108 healthy controls), a temporally independent validation cohort (79 patients and 79 healthy controls), and an external validation cohort recruited from two independent centers (136 patients and 136 healthy controls). The ensemble model demonstrated robust performance, achieving area under the curve values of 0.986, 0.978, and 0.967 in the training, validation, and external cohorts, respectively. Specificity and sensitivity were 98.1% and 89.8% in the training cohort, 97.5% and 87.6% in the validation cohort, and 96.3% and 87.5% in the external cohort. Notably, the sensitivity for detecting stage I gastric cancer exceeded 85% across all cohorts.
    CONCLUSIONS: By integrating multi-dimensional cell-free DNA fragmentomic features, this assay provides accurate, non-invasive detection of gastric cancer, particularly at early stages. While its performance was high, the specificity reported here may be overestimated due to the use of a strictly screened healthy control group. Nevertheless, this fragmentomic-based approach represents a promising tool to complement existing screening strategies, potentially improving early diagnosis rates.
    Keywords:  Cell-free DNA; Early detection; Fragmentomics; Gastric cancer
    DOI:  https://doi.org/10.1186/s13148-026-02150-9
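    The performance figures quoted in item 1 (area under the curve, sensitivity, specificity) follow standard definitions. The sketch below computes them in plain Python on made-up scores. The labels, scores, and 0.5 cutoff are illustrative assumptions, not data or settings from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity); scores >= threshold are called cancer."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the chance a random case outscores a random control (ties count half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up model outputs: 1 = cancer, 0 = healthy control.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]

toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, 0.5)
```

    The pairwise AUC is quadratic in sample size, which is fine for cohorts of a few hundred. A rank-based formulation would be preferable for larger sets.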
  2. Genome Med. 2026 May 19. 18(1): 66 [Epub ahead of print]
       BACKGROUND: Machine-learning (ML)-driven molecular diagnostics based on omics data have the potential to revolutionize personalized medicine. However, implementation of ML in diagnostic protocols is hindered by methodological challenges, which often lead to inflated performance estimates during model development followed by poor performance in the implementation phase. Here, we aimed to develop and validate a pan-cancer classification framework based on DNA methylation data that addresses the methodological challenges of omics-powered ML.
    METHODS: We curated a primary dataset of DNA methylation profiles for 10,756 samples covering 54 healthy and cancer tissue types, and a validation dataset comprising 2,306 samples from 28 independent studies. The classification framework was built using a custom biomarker selection strategy based on an effect size metric that accounts for variance and class imbalance. The ML models were trained, tuned, and evaluated using a nested cross-validation approach. A local outlier factor algorithm was built into the inference pipelines to identify and filter samples displaying technical or biological anomalies. Additionally, for methodological validation of our framework, we used methylation profiles for 3,905 central nervous system (CNS) tumors.
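    Nested cross-validation, as named in the methods above, keeps hyperparameter tuning inside each outer training split so that the outer test fold never influences model selection. The following is a minimal pure-Python sketch of that control flow only. The `fit_score` callback, the fold counts, and the contiguous unshuffled folds are assumptions made for illustration and do not reproduce the authors' pipeline.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv(n_samples, params, fit_score, outer_k=5, inner_k=3):
    """Return one held-out score per outer fold.

    fit_score(train_idx, test_idx, param) is a hypothetical callback that
    trains on train_idx with the given hyperparameter and returns a score
    measured on test_idx.
    """
    outer_scores = []
    for held_out in k_fold_indices(n_samples, outer_k):
        held = set(held_out)
        development = [i for i in range(n_samples) if i not in held]

        def inner_score(param):
            # Inner loop: mean validation score using development data only.
            total = 0.0
            for fold in k_fold_indices(len(development), inner_k):
                val = [development[j] for j in fold]
                val_set = set(val)
                train = [i for i in development if i not in val_set]
                total += fit_score(train, val, param)
            return total / inner_k

        best = max(params, key=inner_score)
        # Outer loop: refit on all development data, score the untouched fold.
        outer_scores.append(fit_score(development, held_out, best))
    return outer_scores
```

    In practice the folds would usually be shuffled and stratified by class, which matters when class sizes are imbalanced.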
    RESULTS: We found that relatively simple ML models outperformed complex algorithms such as deep neural networks. A logistic regression classifier achieved a balanced accuracy (BACC) of 0.90 in classifying 54 cancer and healthy tissue types using methylation levels at 1,208 CpG sites. Similarly, our CNS tumor classifier, also based on logistic regression, reached a BACC of 0.94 across 59 CNS tumor subtypes. Anomaly filtering improved performance across all categories of samples tested.
    CONCLUSIONS: Our study demonstrates that DNA methylation profiling, when combined with carefully controlled ML practices, allows for the development of robust solutions that might substantially increase the efficacy of oncological diagnosis. We have deployed our inference pipelines for public access via a secure web platform: https://opp.pum.edu.pl/
    Keywords:  Biomarkers; Cancer; Classification; DNA methylation; Machine-learning
    DOI:  https://doi.org/10.1186/s13073-026-01650-w
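    Item 2 reports balanced accuracy (BACC), which is the mean of per-class recall and therefore does not reward a classifier for favoring large classes. A minimal sketch on made-up labels follows. The class names and predictions are illustrative assumptions, not results from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced example: 8 samples of one class, 2 of another.
toy_truth = ["colon"] * 8 + ["lung"] * 2
toy_pred = ["colon"] * 8 + ["colon", "lung"]

plain_accuracy = sum(t == p for t, p in zip(toy_truth, toy_pred)) / len(toy_truth)
toy_bacc = balanced_accuracy(toy_truth, toy_pred)
```

    Here plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the minority class is misclassified.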
  3. Clin Exp Med. 2026 May 17.
      Gastric cardia cancer (GCC) is an aggressive malignancy with poor prognosis, underscoring the need for better characterization of molecular alterations during early gastric cardia carcinogenesis. This study aimed to identify and validate tissue-based DNA methylation markers associated with early precancerous and neoplastic lesions of the gastric cardia. We integrated genome-wide DNA methylation data (850 K array) from 69 gastric cardia samples, including normal mucosa (n = 22), intestinal metaplasia (IM, n = 32), intraepithelial neoplasia (IEN, n = 7), and GCC (n = 8). Differential methylation analysis revealed stage-specific methylation patterns. Machine learning algorithms, including Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest, were used to refine candidate diagnostic biomarkers, followed by immunohistochemical validation of candidate gene expression in an independent cohort (n = 212). GCC progression showed increasing epigenetic dysregulation, with hyper-differentially methylated probes (DMPs) predominating in precancerous lesions (79.3-86.3%) and hypo-DMPs in GCC (87.7%). Hyper-DMPs were enriched in promoter-associated cytosine-phosphate-guanosine (CpG) islands (P < 0.001). Two DMPs, EDNRB_cg04390523 and SALL1_cg09016242, were consistently identified by both algorithms and showed good diagnostic accuracy (AUC = 0.947, 95% CI: 0.897-0.997) for distinguishing precancerous gastric cardia lesions and GCC from normal tissue in the integrated dataset. Consistent with methylation findings, EDNRB protein expression progressively decreased from normal to IM/IEN tissues (P < 0.001). This study identifies EDNRB and SALL1 promoter hypermethylation as promising tissue-based candidate biomarkers associated with early neoplastic transformation and provides a basis for further longitudinal and translational studies in gastric cardia precancerous lesions and cancer.
    Keywords:  EDNRB; SALL1; DNA methylation; Gastric cardia cancer; Intestinal metaplasia; Intraepithelial neoplasia
    DOI:  https://doi.org/10.1007/s10238-026-02173-9
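    Two steps described in item 3 are simple to express in code: reporting what share of differentially methylated probes are hyper- versus hypomethylated, and keeping only probes selected by both feature-selection methods. The sketch below assumes probes have already been called as DMPs and uses placeholder probe names and values that are not taken from the study.

```python
def direction_fractions(delta_beta):
    """Fractions of hyper- and hypomethylated probes among called DMPs.

    delta_beta maps probe ID -> mean beta in lesion minus mean beta in normal.
    """
    hyper = sum(1 for d in delta_beta.values() if d > 0)
    hypo = sum(1 for d in delta_beta.values() if d < 0)
    called = hyper + hypo
    return hyper / called, hypo / called


def consensus(*selections):
    """Probes chosen by every feature-selection method, in sorted order."""
    shared = set(selections[0])
    for chosen in selections[1:]:
        shared &= set(chosen)
    return sorted(shared)


# Made-up placeholder probes and values, not data from the study.
toy_delta = {"probe_a": 0.25, "probe_b": 0.5, "probe_c": 0.125, "probe_d": -0.25}
hyper_frac, hypo_frac = direction_fractions(toy_delta)

lasso_pick = ["probe_a", "probe_b", "probe_d"]   # hypothetical LASSO output
forest_pick = ["probe_b", "probe_c", "probe_a"]  # hypothetical Random Forest output
shared_probes = consensus(lasso_pick, forest_pick)
```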
  4. Stud Health Technol Inform. 2026 May 21. 336: 398-402
      Cervical cancer (CC) causes significant mortality due to late diagnosis and limited understanding of its molecular drivers. The complex gene co-expression patterns associated with CC remain poorly characterized. Identifying key genes that distinguish tumors from normal tissue can significantly improve early diagnosis and treatment strategies. This study combines Weighted Gene Co-expression Network Analysis (WGCNA) and machine learning to detect potential biomarkers. Bulk RNA-seq data from 228 cervical tumors and 49 normal tissues from the TCGA-CESC and GTEx datasets are analyzed. After batch correction with ComBat-seq, Differentially Expressed Genes (DEGs) are identified using |log2FC| > 2.0 and adjusted p < 0.05. These DEGs are grouped into modules using WGCNA, which are then linked to cervical cancer traits. Key genes are selected based on strong module membership and gene-trait significance. Gene Ontology (GO) analysis is performed, and then a random forest model is trained to identify biomarker genes. WGCNA groups 5,165 DEGs into four modules: Green (1,320), Black (150), Blue (1,686), and Yellow (258), excluding unassigned genes. The Green and Blue modules show strong correlations with CC. Within these modules, 513 genes in the Green module and 12 genes in the Blue module meet the selection criteria as candidate genes. GO analysis reveals that these key genes are associated with muscle cell differentiation, tissue migration, renal system development, cell adhesion, and vascular processes. Finally, the random forest model identifies LRRN4CL, CNRIP1, and CDCA3 as the top genes for distinguishing tumors from normal samples. This study identifies key gene modules and biomarker genes strongly linked to cervical cancer. These genes reveal critical biological processes involved in tumor progression and have potential for early diagnosis and targeted therapies.
    Keywords:  Cervical cancer; cancer detection; gene expression; machine learning; weighted gene co-expression network analysis
    DOI:  https://doi.org/10.3233/SHTI260185
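    The DEG criterion stated in item 4, |log2FC| > 2.0 together with adjusted p < 0.05, amounts to a two-condition filter over per-gene test results. A minimal sketch is shown below. The gene names and values are placeholders, and the results are assumed to already carry multiple-testing-adjusted p-values, since the abstract does not name the adjustment method.

```python
def select_degs(results, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Return sorted gene names passing both the fold-change and adjusted-p cutoffs.

    results maps gene name -> (log2 fold change, adjusted p-value).
    Both comparisons are strict, matching |log2FC| > 2.0 and adjusted p < 0.05.
    """
    return sorted(
        gene
        for gene, (log2fc, padj) in results.items()
        if abs(log2fc) > lfc_cutoff and padj < padj_cutoff
    )


# Made-up placeholder results, not data from the study.
toy_results = {
    "gene_up": (3.1, 0.001),
    "gene_down": (-2.6, 0.01),
    "gene_small_change": (1.4, 0.0001),
    "gene_not_significant": (4.0, 0.2),
    "gene_at_cutoff": (2.0, 0.001),
}
toy_degs = select_degs(toy_results)
```

    Taking the absolute value keeps both up- and down-regulated genes, and a gene sitting exactly at the fold-change cutoff is excluded because the inequality is strict.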