J Clin Epidemiol. 2018 Jul 05. pii: S0895-4356(18)30085-4. [Epub ahead of print]
AIMS: Despite their essential role in collecting and organizing published medical literature, indexed search engines are unable to cover all relevant knowledge. Hence, current literature recommends the inclusion of clinical trial registries in systematic reviews. This study aims to provide an automated approach to extend a search on PubMed to the ClinicalTrials.gov database, relying on text mining and machine learning techniques.
STUDY DESIGN AND SETTING: The procedure starts from a literature search on PubMed. A classifier is then trained to identify documents in the ClinicalTrials.gov clinical trial repository whose word characterization is comparable to that of the PubMed results. Fourteen systematic reviews, covering a broad range of health conditions, were used as case studies for external validation. A cross-validated support vector machine (SVM) model was used as the classifier.
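To make the classification step concrete, here is a minimal sketch of the general technique: text records turned into bag-of-words vectors and a linear SVM that scores registry records as on-topic or off-topic. It is not the authors' pipeline. The binary L2-normalized features, the full-batch Pegasos solver (subgradient descent on the primal SVM objective, without a bias term), the regularization value `lam`, and all function names and toy records are assumptions for illustration. The cross-validation step reported in the study is omitted here.

```python
import math
import re

# Illustrative sketch only (hypothetical names/data), NOT the study's implementation:
# binary bag-of-words features + linear SVM trained by full-batch Pegasos.
# No bias term and no cross-validation are included.

def featurize(text):
    """Binary bag-of-words over lowercase alphanumeric tokens, L2-normalized."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not words:
        return {}
    value = 1.0 / math.sqrt(len(words))
    return {word: value for word in words}

def dot(w, x):
    """Sparse dot product between weight and feature dictionaries."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(docs, labels, lam=0.1, iterations=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)), with y in {-1, +1}."""
    xs = [featurize(d) for d in docs]
    n = len(xs)
    w = {}
    for t in range(1, iterations + 1):
        eta = 1.0 / (lam * t)  # Pegasos step size
        # Examples inside the margin contribute to the hinge-loss subgradient.
        violators = [(x, y) for x, y in zip(xs, labels) if y * dot(w, x) < 1.0]
        # Shrinkage from the L2 regularizer, then the hinge-loss step.
        w = {k: (1.0 - eta * lam) * v for k, v in w.items()}
        for x, y in violators:
            for k, v in x.items():
                w[k] = w.get(k, 0.0) + eta * y * v / n
    return w

def decision(w, text):
    """Positive score = predicted on-topic."""
    return dot(w, featurize(text))

# Hypothetical toy data: records matching the review (+1) vs. off-topic (-1).
pubmed_docs = [
    "inhaled corticosteroids for asthma in children randomized trial",
    "asthma exacerbations children inhaled budesonide",
    "statin therapy after myocardial infarction",
    "colorectal cancer screening colonoscopy adults",
]
pubmed_labels = [1, 1, -1, -1]
w = train_linear_svm(pubmed_docs, pubmed_labels)

# Hypothetical registry records to be screened.
registry_records = [
    "budesonide versus placebo in children with asthma",
    "high dose statin after myocardial infarction",
]
for record in registry_records:
    print("on-topic" if decision(w, record) > 0 else "off-topic", "|", record)
```

A real application would use far larger training sets, richer text preprocessing, and a tuned regularization parameter (for example chosen by k-fold cross-validation). This toy version only shows how word overlap with the labelled PubMed results drives the registry scores.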
RESULTS: Sensitivity was 100% in all systematic reviews except one (87.5%), and specificity ranged from 97.2% to 99.9%. The instrument's ability to discriminate on-topic from off-topic articles, measured as the area under the ROC curve (AUC), ranged from 93.4% to 99.9%.
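The reported measures follow their standard definitions. The sketch below computes sensitivity (TP / (TP + FN)), specificity (TN / (TN + FP)), and AUC as the probability that a randomly chosen on-topic record scores higher than a randomly chosen off-topic one, with ties counted as one half (the Mann-Whitney formulation). The abstract does not state how the authors computed AUC. The function names and all counts and scores below are hypothetical illustrations, not data from the study.

```python
# Standard screening metrics; labels are 1 (on-topic) and 0 (off-topic).
# All example numbers are hypothetical, not taken from the study.

def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical screen: 8 relevant records (7 found) and 1000 irrelevant ones (10 flagged).
y_true = [1] * 8 + [0] * 1000
y_pred = [1] * 7 + [0] + [1] * 10 + [0] * 990
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"AUC={auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]):.2f}")
```

Missing a single one of eight relevant records, as in this hypothetical case, gives a sensitivity of 87.5%. This shows how one missed study in a small review shifts the figure noticeably.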
CONCLUSION: The proposed machine learning instrument has the potential to help researchers identify relevant studies during the systematic review process, reducing workload without loss of sensitivity and at a small cost in specificity.
Keywords: Clinical Trial Registry; Indexed Search Engine; Machine Learning; Meta-Analysis; Systematic Review; Text Mining