Hum Genomics. 2025 May 17. 19(1): 56
BACKGROUND: Ovarian cancer has the highest mortality rate among gynecological cancers, making early detection crucial: the five-year survival rate drops from 92% with early-stage diagnosis to 31% with late-stage diagnosis. Current diagnostic methods, such as histopathological examination and detection of the cancer antigen 125 and human epididymis protein 4 biomarkers, are either invasive or lack specificity and sensitivity. However, the Papanicolaou (Pap) test, which is widely used for cervical cancer screening, shows potential for detecting ovarian cancer by identifying tumor DNA in cervical scrapings. Since aberrant DNA methylation patterns are linked to cancer progression, DNA methylation offers a promising avenue for early diagnosis. Therefore, this study aimed to develop a methylation-based machine-learning model to identify and stratify patients with ovarian cancer using cervical scraping samples collected via the Pap test.
RESULTS: Cervical scrapings were collected by gynecologists using conventional Pap smears. In total, 160 samples were collected: 95 normal, 37 benign, and 28 malignant. Methylation data were generated using the Illumina Infinium MethylationEPIC BeadChip array, which contains approximately 850,000 CpG loci. Methylation data were initially divided into training and testing sets in a 3:1 ratio comprising 120 and 40 samples, respectively. A two-step methylation-based model was trained using the training data for classification: a principal component analysis (PCA) model, consisting of 30 features, to classify samples as normal or tumor; then a gradient boosting model, containing 16 features, to further stratify tumor samples as benign or malignant. The two-step model achieved an accuracy of 0.88 and an F1-score of 0.86 on the testing data. Furthermore, an over-representation analysis was conducted to explore the functions associated with genes mapped from differentially methylated positions (DMPs) in comparisons between normal and tumor samples, as well as between benign and malignant samples. These results suggest that DMPs may be associated with olfactory transduction when comparing normal versus tumor samples, and immune regulation when comparing benign and malignant samples.
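The control flow of the two-step classification described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two stage classifiers are replaced by hypothetical threshold rules on made-up scores (`is_tumor`, `is_malignant`), the toy samples and labels are invented for illustration, and macro averaging is assumed for the F1-score because the abstract does not state the averaging scheme.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def two_step_predict(
    x: Features,
    is_tumor: Callable[[Features], bool],
    is_malignant: Callable[[Features], bool],
) -> str:
    """Step 1: normal vs. tumor. Step 2 (tumor only): benign vs. malignant."""
    if not is_tumor(x):
        return "normal"
    return "malignant" if is_malignant(x) else "benign"


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[str], y_pred: List[str], labels: List[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stand-ins for the two trained models: each sample is a pair
# (tumor_score, malignancy_score), thresholded at 0.5.
is_tumor = lambda x: x[0] > 0.5
is_malignant = lambda x: x[1] > 0.5

# Invented toy data, for illustration only.
samples = [(0.1, 0.2), (0.8, 0.3), (0.9, 0.9), (0.2, 0.9)]
y_true = ["normal", "benign", "malignant", "benign"]

preds = [two_step_predict(x, is_tumor, is_malignant) for x in samples]
print(preds)
print(accuracy(y_true, preds))
print(macro_f1(y_true, preds, ["normal", "benign", "malignant"]))
```

In this arrangement, a tumor sample missed by the first step can never be recovered by the second step, so errors in the normal-versus-tumor stage carry through to the final three-class prediction.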
CONCLUSIONS: Our two-step model shows promise for predicting ovarian cancer and suggests that cervical scrapings may be a viable alternative for sample collection during screening.
Keywords: Biomarker; Cancer screening; Epigenetics; Machine learning; Methylation; Ovarian cancer; Papanicolaou test (Pap test)