BMC Med Res Methodol. 2025 Sep 29. 25(1): 219
BACKGROUND: Supervised learning can accelerate article screening in systematic reviews, but it still requires labor-intensive manual annotation. While large language models (LLMs) such as GPT-3.5 offer a rapid and convenient alternative, their reliability remains questionable. This study aims to design an efficient and reliable annotation method for article screening.
METHODS: Given that relevant articles typically constitute a small subset of the articles retrieved during screening, we propose a human-LLM collaborative annotation method that focuses on verifying the positive annotations made by the LLM. First, we optimized the prompt on a manually annotated standard dataset, refining it iteratively until the LLM achieved near-perfect recall. The LLM, guided by the optimized prompt, then annotated the articles, and humans verified only the LLM-identified positive samples. We applied this method to screening articles on precision oncology randomized controlled trials and evaluated its efficiency and reliability.
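The collaborative workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `llm_annotate` stands in for a GPT-3.5 call with the optimized high-recall prompt, and `human_verify` stands in for manual review; both are stubbed with keyword checks purely for demonstration.

```python
def llm_annotate(article: str) -> bool:
    """Stand-in for an LLM call with the optimized, high-recall prompt."""
    return "randomized" in article.lower()

def human_verify(article: str) -> bool:
    """Stand-in for manual review of an LLM-flagged positive."""
    return "oncology" in article.lower()

def collaborative_annotate(articles):
    labels = []
    for art in articles:
        if llm_annotate(art):
            # Humans verify only the LLM-identified positives.
            labels.append(human_verify(art))
        else:
            # LLM negatives are accepted without review, which is safe
            # only because the prompt was tuned for near-perfect recall.
            labels.append(False)
    return labels

articles = [
    "A randomized oncology trial of targeted therapy",
    "A narrative review of cancer epidemiology",
    "A randomized trial of a cardiology intervention",
]
print(collaborative_annotate(articles))  # [True, False, False]
```

Because humans inspect only the (small) positive fraction, the manual workload scales with the number of relevant articles rather than the full retrieval set.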
RESULTS: For prompt optimization, a standard dataset of 200 manually annotated articles was split 1:1 into a tuning set and a validation set. Through iterative prompt optimization, the LLM achieved near-perfect recall: 100% on the tuning set and 85.71% on the validation set. Using the optimized prompt, we performed collaborative annotation. To evaluate its performance, we manually reviewed a random sample of 300 collaboratively annotated articles. Collaborative annotation achieved an F1 score of 0.9583 while reducing the annotation workload by approximately 80% compared with manual annotation alone. In addition, a BioBERT-based supervised model trained on the collaborative annotation data outperformed a model trained on data annotated by the LLM alone, further supporting the reliability of the collaborative annotation method.
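For reference, the reported metrics follow the standard definitions of precision, recall, and F1 computed from true-positive (tp), false-positive (fp), and false-negative (fn) counts. The counts below are hypothetical, chosen only to show one combination consistent with the reported F1 of 0.9583; the paper does not report the underlying confusion matrix.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 46 true positives, 4 false positives, 0 false negatives.
p, r, f1 = precision_recall_f1(tp=46, fp=4, fn=0)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.92 1.0 0.9583
```

A high recall with a modest precision penalty is the expected profile here, since the prompt is tuned to miss as few relevant articles as possible and human verification then removes the false positives.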
CONCLUSIONS: The human-LLM collaborative annotation method demonstrates potential for enhancing the efficiency and reliability of article screening, offering valuable support for systematic reviews and meta-analyses.
Keywords: Article screening; Human-LLM collaboration; Precision oncology randomized controlled trials