bims-arines 2025-05-25 papers

Stud Health Technol Inform. 2025 May 15. 327: 904-905

Large Language Model-Assisted Systematic Review: Validation Based on Cochrane Review Data.

Large Language Models (LLMs) offer potential for automating systematic reviews, a labor-intensive process in evidence-based medicine. We evaluated GPT-4o, GPT-4o-mini, and Llama 3.1:8B on abstract screening and risk of bias assessment using 12 Cochrane drug intervention reviews. GPT-4o achieved the best screening performance (recall 0.894, precision 0.492). We propose a one-shot inclusivity adjustment method enabling threshold modulation without repeated inferences. For risk of bias, accuracy varied by domain: it was highest for random sequence generation (0.873) and lowest for selective reporting (0.418). Our findings demonstrate LLMs' practical utility and current limitations in automating systematic reviews.
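The abstract does not describe how the one-shot inclusivity adjustment works. As a rough illustration of the general idea of changing a screening threshold without re-running the model, the sketch below assumes each abstract gets a single inclusion score from one model call, and that recall and precision are then recomputed at different cutoffs. The function name, the scores, and the labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) for screening decisions at a given cutoff.

    scores: hypothetical per-abstract inclusion scores from a single model call.
    labels: True if the reference review included the abstract.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up data for illustration only; not results from the paper.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, False, True, True, False, False]

# The same stored scores are reused for every cutoff, so no new inference is needed.
for cutoff in (0.5, 0.35):
    p, r = precision_recall(scores, labels, cutoff)
    print(f"cutoff={cutoff}: precision={p:.3f}, recall={r:.3f}")
```

In this toy data, lowering the cutoff raises recall, which is the direction a more inclusive screening setting would move. Whether the paper's method works this way is an assumption.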

Keywords: Abstract Screening; Large Language Models; Risk of Bias; Systematic Review

DOI: https://doi.org/10.3233/SHTI250501

J Am Med Inform Assoc. 2025 Jun 01. 32(6): 983-984

Harnessing the power of large language models for clinical tasks and synthesis of scientific literature.

Suzanne Bakken.

DOI: https://doi.org/10.1093/jamia/ocaf071

Nature. 2025 May;641(8064): 852

AI-generated literature reviews threaten scientific progress.

Xusen Cheng, Lulu Zhang.

Keywords: Machine learning

DOI: https://doi.org/10.1038/d41586-025-01603-0