bims-arines Biomed News
on AI in evidence synthesis
Issue of 2025-08-24
three papers selected by
Farhad Shokraneh, Systematic Review Consultants LTD



  1. Syst Rev. 2025 Aug 18. 14(1): 167
       BACKGROUND: Systematic reviews (SRs) are a cornerstone in providing high-quality evidence that guides policy and practice across various disciplines. Despite their critical role, SRs require substantial financial investment and are constrained by time-consuming manual processes. Existing solutions primarily focus on semi-automating the title and abstract screening stages, yet these approaches still face limitations in terms of efficiency and practicality. The SR process comprises several stages beyond abstract screening, each of which is resource-intensive. To overcome these challenges, this paper introduces ReviewGenie, a novel system that automates SR stages up to and including abstract screening, utilizing artificial intelligence.
    METHOD: The SR process involves eight key stages, beginning with the definition of search keywords and the selection of target databases, and culminating in full screening. While the initial and final stages require human expertise, the intermediate stages can be automated. ReviewGenie automates all intermediary stages, including database searching, data retrieval, cleaning, deduplication, filtering, and abstract screening. The system is domain-agnostic, as evidenced by a case study focused on databases related to speech and language disorders.
    RESULTS: ReviewGenie significantly reduces the workload across various stages of the SR process, delivering notable time and cost savings while enhancing efficiency and accuracy. In the case study, where the article-fetching stage involved tens of thousands of publications, ReviewGenie achieved a 2.62% improvement in duplicate detection in less than a second, compared to the 1 to 3 hours typically required for manual deduplication of 100 records. This process included cleaning abstracts before removing duplicates. Additionally, ReviewGenie reduced the number of articles from 28,674 to 3,520 using an automatic filtering approach executed in seconds. This substantial reduction underscores the effectiveness of our automated method in preparing datasets for the abstract screening stage. Moreover, the artificial intelligence-driven abstract screening method resulted in cost savings exceeding $6,230 compared to manual methods.
    CONCLUSIONS: ReviewGenie represents a significant advancement in reducing the burden on researchers conducting comprehensive systematic reviews. By automating intermediate stages, ReviewGenie enhances efficiency, accuracy, and cost-effectiveness, establishing itself as an indispensable tool for SRs across various disciplines.
    Keywords:  Automatic; LLM; ReviewGenie; Screening; Speech and language disorders; Systematic review
    DOI:  https://doi.org/10.1186/s13643-025-02895-z
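    The abstract above mentions cleaning records before removing duplicates but does not say how ReviewGenie does this. The following is only a minimal sketch of one common approach, assuming each record is a dict with hypothetical "title" and "doi" fields; it is not the authors' implementation.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title.

    The "doi" and "title" field names are assumptions made for this sketch.
    """
    seen = set()
    unique = []
    for rec in records:
        keys = set()
        doi = (rec.get("doi") or "").strip().lower()
        title = normalize(rec.get("title") or "")
        if doi:
            keys.add(("doi", doi))
        if title:
            keys.add(("title", title))
        if keys & seen:
            continue  # matches an earlier record on DOI or cleaned title
        seen |= keys
        unique.append(rec)
    return unique


if __name__ == "__main__":
    sample = [
        {"title": "Speech Disorders in Children.", "doi": "10.1/abc"},
        {"title": "speech  disorders in children", "doi": ""},
        {"title": "Distinct study", "doi": None},
    ]
    print(len(deduplicate(sample)))
```

    Matching on an exact cleaned title is deliberately strict: it will miss near-duplicates with differing wording and will merge distinct papers that share a title, so a real pipeline would likely need fuzzier matching and manual spot checks.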
  2. Obstet Gynecol Sci. 2025 Aug 18.
      This study aimed to explore the utility of Chat Generative Pre-trained Transformer (ChatGPT) in streamlining statistical analyses within medical research, evaluating its capabilities in data management, exploratory data analysis (EDA), statistical test selection, and result interpretation. It also addresses the critical need for appropriate disclosures and ethical considerations when integrating artificial intelligence (AI) tools into a scientific workflow. We review the current landscape of AI adoption in medical research, focusing on the role of ChatGPT in statistical analysis. Practical examples from lecture materials demonstrate its application in generating virtual datasets, performing data cleaning, conducting EDA, and assisting in the selection of appropriate statistical tests. Furthermore, guidelines for transparently disclosing AI tool usage in scientific manuscripts in accordance with the International Committee of Medical Journal Editors recommendations are discussed. ChatGPT demonstrates significant potential for accelerating various stages of statistical analysis, from initial data preparation to the interpretation of results. Its ability to rapidly generate virtual data for practice, assist in comprehensive data cleaning, and provide immediate insights through EDA can substantially enhance research efficiency. Although ChatGPT can suggest statistical methods and interpret outputs, human intervention remains crucial for verifying assumptions and ensuring calculation accuracy. ChatGPT can serve as a powerful assistant in medical statistical analyses, enabling researchers to conduct analyses more efficiently. However, its use requires careful data preprocessing, human verification of results, and transparent reporting to maintain scientific rigor and reproducibility. Adherence to ethical guidelines and journal policies regarding AI tool disclosure is paramount.
    Keywords:  AI; Artificial intelligence; ChatGPT; Medical statistics; Statistical analysis
    DOI:  https://doi.org/10.5468/ogs.25232
  3. Cureus. 2025 Jul;17(7): e88167
      BACKGROUND: Artificial intelligence (AI) is increasingly being used in healthcare, particularly for interpreting complex medical queries. However, conventional AI models often generate inaccurate or irrelevant responses, commonly termed hallucinations, which may compromise patient safety. To address this, our study introduces a modified retrieval-augmented generation (RAG) framework tailored for the urology domain to enhance contextual relevance and accuracy in AI-generated responses.
    METHODOLOGY: We developed a context-aware RAG system integrating PubMedBERT embeddings for encoding and retrieving urological literature stored in a Pinecone vector database. The system uses named entity recognition for domain-specific query filtering and incorporates dynamic memory to retain contextual flow during interactions. Response generation is powered by the LLaMA3-8B model via LangChain. A custom dataset of urology-related queries was used for evaluation, with large language model-based scoring by the DeepSeek-R1 model.
    RESULTS: The proposed framework demonstrated a significant reduction in hallucinations, with responses being more contextually relevant and evidence-based. Compared to baseline models, our system achieved an 89% performance improvement in generating medically appropriate answers. Integration of memory modules and named entity filtering further improved precision and reliability.
    CONCLUSIONS: Our RAG-enhanced system shows strong potential for clinical use by producing trustworthy, context-aware responses in urology. It addresses key challenges in medical AI, including hallucination mitigation and domain relevance. Future work will focus on reducing inference latency and improving automated validation without manual oversight.
    Keywords:  artificial intelligence; context awareness; hallucinations in ai; medical ai; retrieval-augmented generation; urology
    DOI:  https://doi.org/10.7759/cureus.88167
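    The retrieval step of a RAG pipeline like the one described above can be sketched roughly as follows. This is a toy illustration, not the authors' system: the three-dimensional vectors stand in for PubMedBERT embeddings, a plain Python list stands in for the Pinecone index, and no language model is called. All passages and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble a prompt that restricts the answer to the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Made-up passages and embeddings, for illustration only.
    index = [
        ("Passage about kidney stones", [1.0, 0.0, 0.0]),
        ("Passage about the prostate", [0.0, 1.0, 0.0]),
        ("Passage about the bladder", [0.7, 0.7, 0.0]),
    ]
    passages = retrieve([1.0, 0.1, 0.0], index, k=2)
    print(build_prompt("What causes kidney stones?", passages))
```

    Grounding the prompt in retrieved passages is the general mechanism by which RAG aims to reduce hallucinations; the named entity filtering and dynamic memory mentioned in the abstract are not modeled here.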