bims-arines Biomed News
on AI in evidence synthesis
Issue of 2024-12-15
four papers selected by
Farhad Shokraneh



  1. JMIR Form Res. 2024 Dec 09;8:e55827
       BACKGROUND: Systematic reviews and meta-analyses are central to evidence-based medicine, but their information retrieval and literature screening procedures are burdensome. Rapid Medical Evidence Synthesis (RMES; Deloitte Tohmatsu Risk Advisory LLC) is software designed to support information retrieval, literature screening, and data extraction for evidence-based medicine.
    OBJECTIVE: This study aimed to evaluate the accuracy of RMES for literature screening with reference to published systematic reviews.
    METHODS: We used RMES to automatically screen the titles and abstracts of PubMed-indexed articles included in 12 systematic reviews across 6 medical fields by applying 4 filters: (1) study type; (2) study type + disease; (3) study type + intervention; and (4) study type + disease + intervention. We determined the numbers of articles correctly included by each filter relative to those included by the authors of each systematic review. Only PubMed-indexed articles were assessed.
    RESULTS: Across the 12 reviews, the number of articles analyzed by RMES ranged from 46 to 5612. The number of PubMed-cited articles included in the reviews ranged from 4 to 47. The median (range) percentages of articles correctly labeled by RMES using filters 1-4 were 80.9% (57.1%-100%), 65.2% (34.1%-81.8%), 70.5% (0%-100%), and 58.6% (0%-81.8%), respectively.
    CONCLUSIONS: This study demonstrated good performance and accuracy of RMES for the initial screening of the titles and abstracts of articles for use in systematic reviews. RMES has the potential to reduce the workload involved in the initial screening of published studies.
    Keywords:  RMES; Rapid Medical Evidence Synthesis; artificial intelligence; automated literature screening; natural language processing; randomized controlled trials; systematic reviews; text mining
    DOI:  https://doi.org/10.2196/55827
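    The RESULTS above reduce to per-filter recall: the share of each review's author-included, PubMed-indexed articles that a filter correctly retains, summarized as a median and range over the 12 reviews. A minimal sketch of that calculation in Python (the PMIDs and filter outputs are hypothetical, not from the study):

      from statistics import median

      # Hypothetical gold standard: PMIDs the review authors included.
      gold = {"review_a": {101, 102, 103, 104}, "review_b": {201, 202, 203}}
      # Hypothetical output of one RMES-style filter (e.g. study type + disease).
      kept = {"review_a": {101, 102, 104, 999}, "review_b": {201, 203}}

      # Recall per review: correctly retained / author-included, as a percentage.
      recalls = [100 * len(gold[r] & kept[r]) / len(gold[r]) for r in gold]
      print(f"median {median(recalls):.1f}% (range {min(recalls):.1f}%-{max(recalls):.1f}%)")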
  2. J Med Internet Res. 2024 Dec 11;26:e56863
       BACKGROUND: Systematic reviews (SRs) are considered the highest level of evidence, but their rigorous literature screening process can be time-consuming and resource-intensive. This is particularly challenging given the rapid pace of medical advancements, which can quickly make SRs outdated. Few-shot learning (FSL), a machine learning approach that learns effectively from limited data, offers a potential solution to streamline this process. Sentence-bidirectional encoder representations from transformers (S-BERT) are particularly promising for identifying relevant studies with fewer examples.
    OBJECTIVE: This study aimed to develop a model framework using FSL to efficiently screen and select relevant studies for inclusion in SRs, with the goal of reducing workload while maintaining high recall.
    METHODS: We developed and validated the FSL model framework using 9 previously published SR projects (2016-2018). The framework used S-BERT with titles and abstracts as input data. Key evaluation metrics, including workload reduction, cosine similarity score, and the number needed to screen at 100% recall, were estimated to determine the optimal number of eligible studies for model training. A prospective evaluation phase involving 4 ongoing SRs was then conducted. Study selections by the FSL model and by a secondary reviewer were compared against the principal reviewer (considered the gold standard) to estimate false negative rates.
    RESULTS: Model development suggested an optimal range of 4-12 eligible studies for FSL training. Using 4-6 eligible studies during model development resulted in similarity thresholds for 100% recall ranging from 0.432 to 0.636, corresponding to a workload reduction of 51.11% (95% CI 46.36-55.86) to 97.67% (95% CI 96.76-98.58). The prospective evaluation of 4 SRs aimed for a 50% workload reduction, yielding numbers needed to screen of 497 to 1035 out of 995 to 2070 studies. The false negative rate ranged from 1.87% to 12.20% for the FSL model and from 5% to 56.48% for the second reviewer compared with the principal reviewer.
    CONCLUSIONS: Our FSL framework demonstrates the potential for reducing workload in SR screening by over 50%. However, the model did not achieve 100% recall at this threshold, highlighting the potential for omitting eligible studies. Future work should focus on developing a web application to implement the FSL framework, making it accessible to researchers.
    Keywords:  S-BERT; deep learning; few-shot learning; natural language processing; sentence-bidirectional encoder representations from transformers; study selection; systematic review
    DOI:  https://doi.org/10.2196/56863
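    A rough sketch of the few-shot S-BERT idea described above: embed a handful of known-eligible "seed" studies and all candidates, score candidates by cosine similarity to the seeds, and take the lowest similarity among known eligibles as the 100%-recall threshold. The model name, texts, and indices below are placeholders, and the paper's exact pipeline (e.g. centroid vs. per-seed similarity) may differ:

      import numpy as np
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder S-BERT model

      # 4-12 known-eligible studies (titles + abstracts) seed the model.
      seeds = ["title and abstract of eligible study 1",
               "title and abstract of eligible study 2"]
      candidates = ["title and abstract of candidate 1",
                    "title and abstract of candidate 2",
                    "title and abstract of candidate 3"]

      seed_vecs = model.encode(seeds, normalize_embeddings=True)
      cand_vecs = model.encode(candidates, normalize_embeddings=True)

      # Cosine similarity of each candidate to the renormalized seed centroid.
      centroid = seed_vecs.mean(axis=0)
      centroid /= np.linalg.norm(centroid)
      sims = cand_vecs @ centroid

      # Threshold for 100% recall: lowest similarity among candidates known to
      # be eligible (indices hypothetical); everything below it is skipped.
      known_eligible = np.array([0, 2])
      threshold = sims[known_eligible].min()
      print(f"threshold {threshold:.3f}, workload reduction "
            f"{100 * float((sims < threshold).mean()):.1f}%")

    This mirrors the paper's framing: the similarity threshold needed to keep every eligible study determines how much of the screening queue can be safely skipped.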
  3. J Cancer Educ. 2024 Dec 14.
      Artificial intelligence and natural language processing tools have shown promise in oncology by assisting with medical literature retrieval and providing patient support. The potential for these technologies to generate inaccurate yet seemingly correct information poses significant challenges. This study evaluates the effectiveness, benefits, and limitations of ChatGPT for clinical use in conducting literature reviews of radiation oncology treatments. This cross-sectional study used ChatGPT version 3.5 to generate literature searches on radiotherapy options for seven tumor sites, with prompts issued five times per site to generate up to 50 publications per tumor type. The publications were verified using the Scopus database and categorized as correct, irrelevant, or non-existent. Statistical analysis with one-way ANOVA compared the impact factors and citation counts across different tumor sites. Among the 350 publications generated, there were 44 correct, 298 non-existent, and 8 irrelevant papers. The average publication year of all generated papers was 2011, compared to 2009 for the correct papers. The average impact factor of all generated papers was 38.8, compared to 113.8 for the correct papers. There were significant differences in the publication year, impact factor, and citation counts between tumor sites for both correct and non-existent papers. Our study highlights both the potential utility and significant limitations of using AI, specifically ChatGPT 3.5, in radiation oncology literature reviews. The findings emphasize the need for verification of AI outputs, development of standardized quality assurance protocols, and continued research into AI biases to ensure reliable integration into clinical practice.
    Keywords:  Artificial intelligence; Cancer; ChatGPT; Natural language processing; Radiation oncology
    DOI:  https://doi.org/10.1007/s13187-024-02547-1
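    The verification-and-comparison workflow above is straightforward to reproduce in outline: tally each generated reference's Scopus status, then compare metrics across tumor sites with a one-way ANOVA. A sketch with made-up numbers (scipy's f_oneway is a standard call; none of the values below come from the study's data):

      from collections import Counter
      from scipy.stats import f_oneway

      # Each generated reference is checked against Scopus and labeled.
      labels = ["correct", "non-existent", "non-existent", "irrelevant", "correct"]
      print(Counter(labels))

      # One-way ANOVA: do citation counts of correct papers differ by site?
      breast = [120, 340, 95]
      prostate = [60, 210, 88]
      lung = [400, 150, 230]
      f_stat, p_value = f_oneway(breast, prostate, lung)
      print(f"F = {f_stat:.2f}, p = {p_value:.3f}")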
  4. Campbell Syst Rev. 2024 Dec;20(4):e70009
      Objectives: This is the protocol for a Campbell systematic review. The objectives are as follows: The first objective is to find and describe machine and statistical learning (ML) methods designed for moderator meta-analysis. The second objective is to find and describe applications of such ML methods in moderator meta-analyses of health, medical, and social science interventions. These two parts of the meta-review will primarily involve a systematic review and will be conducted according to guidelines specified by the Campbell Collaboration (MECCIR guidelines). The outcomes will be a list of ML methods that are designed for moderator meta-analysis (first objective), and a description of how (some of) these methods have been applied in the health, medical, and social sciences (second objective). The third objective is to examine how the ML methods identified in the meta-review can help researchers formulate new hypotheses or select among existing ones, and to compare the identified methods to one another and to regular meta-regression methods for moderator analysis. To compare the performance of different moderator meta-analysis methods, we will apply the methods to data on tutoring interventions from two systematic reviews of interventions to improve academic achievement for students with or at risk of academic difficulties, and to an independent test sample of tutoring studies published after the search period in the two reviews.
    Keywords:  machine learning; moderator analysis; tutoring; variable selection
    DOI:  https://doi.org/10.1002/cl2.70009
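    For the protocol's third objective, the "regular meta-regression" baseline is an inverse-variance-weighted regression of study effect sizes on a study-level moderator. A minimal fixed-effect sketch with hypothetical data (the review will likely also consider random-effects models and dedicated packages such as metafor in R):

      import numpy as np
      import statsmodels.api as sm

      effect = np.array([0.30, 0.45, 0.10, 0.60, 0.25])    # study effect sizes (d)
      variance = np.array([0.02, 0.05, 0.01, 0.04, 0.03])  # sampling variances
      hours = np.array([10, 25, 5, 40, 15])                # moderator: tutoring hours

      # Inverse-variance weights reproduce fixed-effect meta-regression point
      # estimates (WLS standard errors are only approximate here).
      X = sm.add_constant(hours)
      fit = sm.WLS(effect, X, weights=1 / variance).fit()
      print(fit.params)   # intercept and per-hour change in effect size
      print(fit.pvalues)  # does the moderator explain heterogeneity?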