Stud Health Technol Inform. 2025 Aug 07;329:239-243
Fabio Dennstädt,
Paul Windisch,
Irina Filchenko,
Johannes Zink,
Paul Martin Putora,
Ahmed Shaheen,
Roberto Gaio,
Nikola Cihoric,
Marie Wosny,
Stefanie Aeppli,
Max Schmerder,
Mohamed Shelan,
Janna Hastings.
BACKGROUND: Automated classification of medical literature is increasingly vital, especially in oncology. Previous work has shown that large language models (LLMs) can be used as part of a flexible framework to accurately classify biomedical literature and trials. In the present study, we explored to what extent a consensus-based approach could improve classification performance.
METHODS: Three LLMs (Mixtral-8x7B, Meta-Llama-3.1-70B, and Qwen2.5-72B) were used to classify oncological trials across four data sets with nine questions. Accuracy, precision, recall, and F1-score were assessed for the individual models and for the consensus results.
RESULTS: Consensus was achieved in 93.93% of cases. The consensus results improved on the individual models in accuracy (98.34%), precision (97.01%), recall (98.11%), and F1-score (97.55%).
CONCLUSIONS: The consensus-based LLM framework delivers high accuracy and adaptability for classifying oncological trials, with potential applications in biomedical research and trial management.
Keywords: knowledge synthesis; large language models; natural language processing; oncology; text classification
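
The consensus step and the four reported metrics can be illustrated with a minimal sketch. The abstract does not define how consensus is determined, so this example assumes three things that are not stated in the source: each question has a binary (yes/no) answer, consensus means that all models give the same answer, and metrics are computed only on the cases where consensus was reached. The function names (consensus, binary_metrics, evaluate_consensus) are hypothetical and are not taken from the authors' framework.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def consensus(votes: Sequence[bool]) -> Optional[bool]:
    """Return the shared answer if all models agree, else None.

    Assumption (not from the source): consensus means unanimous agreement.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    first = votes[0]
    return first if all(v == first for v in votes) else None


def binary_metrics(preds: Sequence[bool], truths: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    tp = sum(p and t for p, t in zip(preds, truths))
    fp = sum(p and not t for p, t in zip(preds, truths))
    fn = sum(not p and t for p, t in zip(preds, truths))
    tn = sum(not p and not t for p, t in zip(preds, truths))
    total = tp + fp + fn + tn
    # Undefined ratios (empty denominators) are reported as 0.0 in this sketch.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def evaluate_consensus(
    votes_per_case: Sequence[Sequence[bool]],
    truths: Sequence[bool],
) -> Tuple[float, Dict[str, float]]:
    """Return the consensus rate and the metrics on consensus cases only.

    Assumption (not from the source): cases without consensus are excluded
    from the metric calculation rather than counted as errors.
    """
    resolved: List[Tuple[Optional[bool], bool]] = [
        (consensus(votes), truth) for votes, truth in zip(votes_per_case, truths)
    ]
    agreed = [(pred, truth) for pred, truth in resolved if pred is not None]
    rate = len(agreed) / len(resolved) if resolved else 0.0
    metrics = binary_metrics([p for p, _ in agreed], [t for _, t in agreed])
    return rate, metrics
```

In this sketch, a case such as (True, False, True) yields no consensus and would be left for manual review, whereas a majority-vote rule would always produce an answer for three binary votes. Which of the two rules the study used is not stated in the abstract.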