Gynecol Obstet Fertil Senol. 2026 May 25. pii: S2468-7189(26)00139-X. [Epub ahead of print]
OBJECTIVE: Systematic reviews are a cornerstone of evidence-based medicine but remain time-consuming and prone to human error. Large Language Models (LLMs), such as ChatGPT or Claude, offer new opportunities for partial automation of these tasks. This article aims to provide a critical synthesis of the current uses of LLMs across the key stages of systematic reviews and meta-analyses in healthcare.
METHODS: We conducted a narrative review based on a literature search in PubMed and Scopus (2019-2025), including empirical studies, scoping reviews, preprints, and technical reports discussing the use of LLMs in any stage of the systematic review process.
RESULTS: LLMs can support research question formulation, search strategy development, reference screening, data extraction, meta-analysis scripting, and result synthesis. Reported performance is often high, especially for screening and quantitative data extraction, with sensitivities of 95-98%. However, significant limitations persist: hallucinations, bias, misinterpretations, and variability across models. Independent validation remains scarce.
CONCLUSION: LLMs show promising potential to accelerate several stages of systematic reviews, provided their use is methodologically controlled. A semi-automated approach, combining AI capabilities with human expertise, currently appears to be the safest. Prompt structuring, result validation, and transparent reporting of AI involvement are essential to ensure the quality and reliability of the synthesized evidence.
Keywords: Artificial Intelligence; Evidence-Based Medicine; Large Language Models; Meta-Analysis as topic; Systematic Review