bims-arines Biomed News
on AI in evidence synthesis
Issue of 2024-12-01
seven papers selected by
Farhad Shokraneh



  1. Syst Rev. 2024 Nov 27. 13(1): 290
      The eighth meeting of the International Collaboration for the Automation of Systematic Reviews (ICASR) was held on September 7 and 8, 2023, at University College London, London, England. ICASR is an interdisciplinary group whose goal is to maximize the use of technology for conducting rapid, accurate, and efficient evidence synthesis, e.g., systematic reviews, evidence maps, and scoping reviews of scientific evidence. In 2023, the major themes discussed were understanding the benefits and harms of automation tools that have become available in recent years, the advantages and disadvantages of large language models in evidence synthesis, and approaches to ensuring the validity of tools for the proposed task.
    Keywords:  Automation tools; ChatGPT; Evidence synthesis; Large language models; Systematic reviews
    DOI:  https://doi.org/10.1186/s13643-024-02666-2
  2. J Nurs Scholarsh. 2024 Nov 24.
       AIM: The aim of this study was to evaluate and compare artificial intelligence (AI)-based large language models (LLMs) (ChatGPT-3.5, Bing, and Bard) with human-based formulations in generating relevant clinical queries, using comprehensive methodological evaluations.
    METHODS: To interact with the major LLMs ChatGPT-3.5, Bing Chat, and Google Bard, scripts and prompts were designed to formulate PICOT (population, intervention, comparison, outcome, time) clinical questions and search strategies. The quality of the LLMs' responses was assessed using a descriptive approach and independent assessment by two researchers. To determine the number of hits, PubMed, Web of Science, Cochrane Library, and CINAHL Ultimate search results were imported separately, without search restrictions, using the search strings generated by the three LLMs and an additional one by the expert. Hits from one of the scenarios were also exported for relevance evaluation; a single scenario was chosen to provide a focused analysis. Cronbach's alpha and the intraclass correlation coefficient (ICC) were also calculated.
    RESULTS: In five different scenarios, ChatGPT-3.5 generated 11,859 hits, Bing 1,376,854, Bard 16,583, and an expert 5919 hits. We then used the first scenario to assess the relevance of the obtained results. The human expert search approach resulted in 65.22% (56/105) relevant articles. Bing was the most accurate AI-based LLM with 70.79% (63/89), followed by ChatGPT-3.5 with 21.05% (12/45), and Bard with 13.29% (42/316) relevant hits. Based on the assessment of two evaluators, ChatGPT-3.5 received the highest score (M = 48.50; SD = 0.71). Results showed a high level of agreement between the two evaluators. Although ChatGPT-3.5 showed a lower percentage of relevant hits compared to Bing, this reflects the nuanced evaluation criteria, where the subjective evaluation prioritized contextual accuracy and quality over mere relevance.
    CONCLUSION: This study provides valuable insights into the ability of LLMs to formulate PICOT clinical questions and search strategies. AI-based LLMs, such as ChatGPT-3.5, demonstrate significant potential for augmenting clinical workflows, improving clinical query development, and supporting search strategies. However, the findings also highlight limitations that necessitate further refinement and continued human oversight.
    CLINICAL RELEVANCE: AI could assist nurses in formulating PICOT clinical questions and search strategies. AI-based LLMs offer valuable support to healthcare professionals by improving the structure of clinical questions and enhancing search strategies, thereby significantly increasing the efficiency of information retrieval.
    Keywords:  AI language models; artificial intelligence; clinical questions; evidence-based practice; search strategies
    DOI:  https://doi.org/10.1111/jnu.13036
  3. J Clin Neurosci. 2024 Nov 28. pii: S0967-5868(24)00465-X. [Epub ahead of print]131 110926
       INTRODUCTION: Gliomas are the most common primary malignant intraparenchymal brain tumors with a dismal prognosis. With growing advances in artificial intelligence, machine learning and deep learning models are being utilized for preoperative, intraoperative and postoperative neurological decision-making. We aimed to compile published literature in one format and evaluate the quality of level 1a evidence currently available.
    METHODOLOGY: Using PRISMA guidelines, a comprehensive literature search was conducted within databases including Medline, Scopus, and Cochrane Library, and records with the application of artificial intelligence in glioma management were included. The AMSTAR 2 tool was used to assess the quality of systematic reviews and meta-analyses by two independent researchers.
    RESULTS: From 812 studies, 23 were included. AMSTAR 2 appraised most reviews as either low or critically low in quality. Most reviews failed to deliver in critical domains related to the exclusion of studies, appropriateness of meta-analytical methods, and assessment of publication bias. Similarly, compliance was lowest in non-critical areas related to study design selection and the disclosure of funding sources in individual records. Evidence is moderate to low in quality in reviews on multiple neuro-oncological applications, low quality in glioma diagnosis and individual molecular markers like MGMT promoter methylation status, IDH, and 1p19q identification, and critically low in tumor segmentation, glioma grading, and multiple molecular markers identification.
    CONCLUSION: AMSTAR 2 is a robust tool to identify high-quality systematic reviews. There is a paucity of high-quality systematic reviews on the utility of artificial intelligence in glioma management, with some demonstrating critically low quality. Therefore, caution must be exercised when drawing inferences from these results.
    Keywords:  AMSTAR 2; Artificial Intelligence; Brain Tumour; Glioma; Machine learning; Prognosis
    DOI:  https://doi.org/10.1016/j.jocn.2024.110926
  4. Eur J Intern Med. 2024 Nov 28. pii: S0953-6205(24)00436-9. [Epub ahead of print]
    AIMES – AI for Meta-Analysis and Evidence Synthesis
      
    Keywords:  All-cause mortality; Artificial intelligence; ChatGPT; Evidence synthesis; GLP-1 receptor agonists; Meta-analysis; Systematic review
    DOI:  https://doi.org/10.1016/j.ejim.2024.10.017
  5. JCO Clin Cancer Inform. 2024 Dec;8 e2400150
       PURPOSE: Extracting inclusion and exclusion criteria in a structured, automated fashion remains a challenge to developing better search functionalities or automating systematic reviews of randomized controlled trials in oncology. The question "Did this trial enroll patients with localized disease, metastatic disease, or both?" could be used to narrow down the number of potentially relevant trials when conducting a search.
    METHODS: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. Five hundred trials were used to develop and validate three different models, with 100 trials held out for testing. The test set was also used to evaluate the performance of GPT-4o on the same task.
    RESULTS: In the test set, a rule-based system using regular expressions achieved F1 scores of 0.72 for the prediction of whether the trial allowed for the inclusion of patients with localized disease and 0.77 for metastatic disease. A transformer-based machine learning (ML) model achieved F1 scores of 0.97 and 0.88, respectively. A combined approach, in which the rule-based system was allowed to overrule the ML model, achieved F1 scores of 0.97 and 0.89, respectively. GPT-4o achieved F1 scores of 0.87 and 0.92, respectively.
    CONCLUSION: Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible. Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine. Increasingly large language models can reduce or eliminate the need for previous training on the task at the expense of increased computational power and, in turn, cost.
    DOI:  https://doi.org/10.1200/CCI-24-00150
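The combined approach described in entry 5 — a regular-expression rule system that can overrule an ML classifier — can be sketched as follows. This is a minimal illustration only; the keyword patterns and the overrule condition below are hypothetical stand-ins, not the study's actual rules.

```python
import re

# Hypothetical keyword patterns; the study's actual regular expressions
# are not reproduced in the abstract.
LOCALIZED_PAT = re.compile(r"\b(localized|locally advanced|resectable|stage I{1,3}\b)", re.I)
METASTATIC_PAT = re.compile(r"\b(metastatic|stage IV|distant metastas\w+)", re.I)

def rule_based(text):
    """Return (allows_localized, allows_metastatic) flags from keyword matches."""
    return bool(LOCALIZED_PAT.search(text)), bool(METASTATIC_PAT.search(text))

def combined(text, ml_pred, rule_confident):
    """Combined approach: let the rule-based system overrule the ML model's
    prediction when its keyword match is deemed unambiguous (hypothetical
    condition); otherwise fall back to the ML prediction."""
    return rule_based(text) if rule_confident else ml_pred
```

Framing the extraction as two binary classification tasks (localized yes/no, metastatic yes/no) is what lets simple patterns like these be benchmarked directly against transformer models and GPT-4o with F1 scores.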
  6. PLoS One. 2024 ;19(11): e0311358
       BACKGROUND AND METHODS: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially with the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).
    RESULTS: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters such as animal models and species, as well as risk of bias items like randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9, respectively, for most items in the validation corpora. Time savings were above 99%.
    CONCLUSIONS: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool's deployment for probing a field in a research improvement context or replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.
    DOI:  https://doi.org/10.1371/journal.pone.0311358
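The metrics reported for STEED in entry 6 (sensitivity, specificity, F1) are standard binary-classification measures computed against a human reader's extractions. A minimal sketch of how such per-item metrics are derived from paired tool/human judgments (illustrative only, not STEED's code):

```python
def confusion_counts(predicted, actual):
    """Tally the confusion matrix for one extraction item, e.g. 'blinding
    reported: yes/no', where predicted = tool output, actual = human reader."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return tp, tn, fp, fn

def metrics(predicted, actual):
    """Sensitivity (recall), specificity, and F1 for one item."""
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1
```

Reporting both sensitivity and specificity per item, as the study does, matters because risk-of-bias items are often rare in a corpus, so accuracy alone can look high even when the tool misses most positives.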
  7. J Eur Acad Dermatol Venereol. 2024 Dec;38(12): 2213-2214
      
    DOI:  https://doi.org/10.1111/jdv.20354