bims-arines Biomed News
on AI in evidence synthesis
Issue of 2024-10-06
eleven papers selected by
Farhad Shokraneh



  1. Hu Li Za Zhi. 2024 Oct;71(5): 21-28. pii: JN.202410_71(5).04. [Epub ahead of print]
      The current uses, potential risks, and practical recommendations for using chat generative pre-trained transformers (ChatGPT) in systematic reviews (SRs) and meta-analyses (MAs) are reviewed in this article. The findings of prior research suggest that, for tasks such as literature screening and information extraction, ChatGPT can match or exceed the performance of human experts. However, for complex tasks such as risk of bias assessment, its performance remains significantly limited, underscoring the critical role of human expertise. The use of ChatGPT as an adjunct tool in SRs and MAs requires careful planning and the implementation of strict quality control and validation mechanisms to mitigate potential errors such as those arising from artificial intelligence (AI) 'hallucinations'. This paper also provides specific recommendations for optimizing human-AI collaboration in SRs and MAs. Assessing the specific context of each task and implementing the most appropriate strategies are critical when using ChatGPT in support of research goals. Furthermore, transparency regarding the use of ChatGPT in research reports is essential to maintaining research integrity. Close attention to ethical norms, including issues of privacy, bias, and fairness, is also imperative. Finally, from a human-centered perspective, this paper emphasizes the importance of researchers cultivating continuous self-iteration, prompt engineering skills, critical thinking, cross-disciplinary collaboration, and ethical awareness, with the goals of: continuously optimizing human-AI collaboration models within reasonable and compliant norms, enhancing the complex-task performance of AI tools such as ChatGPT, and, ultimately, achieving greater efficiency through technological innovation while upholding scientific rigor.
    Keywords:  ChatGPT; human-AI collaboration; meta-analysis; practical recommendations; systematic review
    DOI:  https://doi.org/10.6224/JN.202410_71(5).04
  2. JBJS Rev. 2024 Oct 01. 12(10):
      » Generative artificial intelligence (AI), a rapidly evolving field, has the potential to revolutionize orthopaedic care by enhancing diagnostic accuracy, treatment planning, and patient management through data-driven insights and personalized strategies.
      » Unlike traditional AI, generative AI can generate relevant information for orthopaedic surgeons when instructed through prompts, automating tasks such as literature reviews, streamlining workflows, predicting health outcomes, and improving patient interactions.
      » Prompt engineering is essential for crafting effective prompts for large language models (LLMs), ensuring accurate and reliable AI-generated outputs, and promoting ethical decision-making in clinical settings.
      » Orthopaedic surgeons can choose between various prompt types, including open-ended, focused, and choice-based prompts, to tailor AI responses for specific clinical tasks and enhance the precision and utility of generated information.
      » Understanding the limitations of LLMs, such as token limits, context windows, and hallucinations, is crucial for orthopaedic surgeons to use generative AI effectively while addressing ethical concerns related to bias, privacy, and accountability.
    DOI:  https://doi.org/e24.00122
  3. Hu Li Za Zhi. 2024 Oct;71(5): 29-35. pii: JN.202410_71(5).05. [Epub ahead of print]
      Network meta-analysis (NMA), an increasingly appealing statistical method, is superior to traditional analysis methods in that it can compare multiple medical treatments in a single analysis. In recent years, the prevalence of NMA in the medical literature has increased significantly, while advances in NMA-related statistical methods and software tools continue to improve the effectiveness of this approach. Various commercial and free statistical software packages, some of which employ generative artificial intelligence (GAI) to generate code, have been developed for NMA, leading to numerous innovative developments. In this paper, the use of generative AI to write R scripts and the netmeta package to perform NMA are introduced, along with the web-based tool ShinyNMA. ShinyNMA allows users to conduct NMA through an intuitive "clickable" interface accessible via a standard web browser, with visual charts used to present results. In the first section, we review the netmeta package documentation and use ChatGPT (chat generative pre-trained transformer) to write the R scripts necessary to perform NMA with the netmeta package. In the second section, a user interface is developed using the Shiny package to create the ShinyNMA tool, which provides a no-code option for users unfamiliar with programming to conduct NMA statistical analysis and plotting. With appropriate prompts, ChatGPT can produce R scripts capable of performing NMA. Using this approach, an NMA analysis tool that meets the research objectives is developed, and potential applications are demonstrated using sample data. Using generative AI with existing statistical packages or no-code tools is expected to make NMA analysis significantly easier for researchers. Moreover, greater access to NMA results will enable decision-makers to review analyses intuitively in real time, enhancing the role of statistical analysis in medical decision-making. Furthermore, enabling non-specialists to conduct clinically meaningful systematic reviews may sustainably improve analytical capabilities and produce higher-quality evidence.
    Keywords:  ChatGPT; generative artificial intelligence; network meta-analysis
    DOI:  https://doi.org/10.6224/JN.202410_71(5).05
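Editor's note: entry 3 centres on having ChatGPT write R scripts for the netmeta package. As a language-neutral illustration of the underlying model, here is a minimal fixed-effect network meta-analysis for three treatments, solved as weighted least squares in plain Python. All comparisons, effect estimates, and variances are invented for illustration; this is a sketch of the statistical idea, not the paper's tool.

```python
# A minimal fixed-effect network meta-analysis for three treatments
# (A, B, C), solved as weighted least squares -- the same model that
# netmeta fits in R. All numbers below are illustrative.

# Each study contributes (comparison, effect estimate, variance).
STUDIES = [
    ("AB", 1.0, 0.1),   # B vs A
    ("AC", 2.0, 0.1),   # C vs A
    ("BC", 0.8, 0.1),   # C vs B (closes the loop in the network)
]

# Design rows for the basic parameters (d_AB, d_AC); consistency in the
# network implies d_BC = d_AC - d_AB.
DESIGN = {"AB": (1.0, 0.0), "AC": (0.0, 1.0), "BC": (-1.0, 1.0)}

def fit_nma(studies):
    """Solve the 2x2 weighted normal equations X'WX theta = X'Wy."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for comparison, effect, variance in studies:
        x1, x2 = DESIGN[comparison]
        w = 1.0 / variance          # inverse-variance weight
        a11 += w * x1 * x1
        a12 += w * x1 * x2
        a22 += w * x2 * x2
        b1 += w * x1 * effect
        b2 += w * x2 * effect
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

d_ab, d_ac = fit_nma(STUDIES)       # pooled B-vs-A and C-vs-A effects
d_bc = d_ac - d_ab                  # mixed B-vs-C estimate via consistency
```

Note how the direct BC evidence pulls the pooled estimates away from the direct AB and AC values: the network borrows strength across comparisons, which is exactly what distinguishes NMA from a series of pairwise meta-analyses.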
  4. Korean J Radiol. 2024 Oct;25(10): 869-873
      
    Keywords:  Artificial intelligence; Chatbot; Generative; Large language model; Prompt; Prompt engineering; Query; Stochasticity
    DOI:  https://doi.org/10.3348/kjr.2024.0695
  5. F1000Res. 2024;13: 791
      Background: Large Language Models (LLMs), such as OpenAI's ChatGPT-4 Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalised through a fine-tuning process to meet student demands in any particular subject, such as statistics. Recently, OpenAI launched the option of fine-tuning its model through a natural-language web interface, enabling the creation of customised GPT versions deliberately conditioned to meet the demands of a specific task.
    Methods: This preliminary research aims to assess the potential of the customised GPTs. After developing a Business Statistics Virtual Professor (BSVP), tailored for students at the Universidad Pontificia Comillas, its behaviour was evaluated and compared with that of ChatGPT-4 Turbo. Firstly, each professor collected 15-30 genuine student questions from "Statistics and Probability" and "Business Statistics" courses across seven degrees, primarily from second-year courses. These questions, often ambiguous and imprecise, were posed to ChatGPT-4 Turbo and BSVP, with their initial responses recorded without follow-ups. In the third stage, professors blindly evaluated the responses on a 0-10 scale, considering quality, depth, and personalization. Finally, a statistical comparison of the systems' performance was conducted.
    Results: The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP responded in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP could provide a far superior response. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo.
    Conclusions: It appears that customised assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.
    Keywords:  Artificial Intelligence; ChatGPT; customisation; higher education; statistics; virtual instructor
    DOI:  https://doi.org/10.12688/f1000research.153129.2
  6. Eur J Pain. 2024 Oct 03.
      BACKGROUND: The public release of ChatGPT in November 2022 sparked a boom in public interest in generative artificial intelligence (AI), leading journals and journal families to hastily release generative AI policies, ranging from asking authors for acknowledgement or declaration to the outright banning of use.
    RESULTS: Here, we briefly discuss the basics of machine learning, generative AI, and how it will affect scientific publishing. We focus especially on potential risks and benefits to the scientific community as a whole and journals specifically.
    CONCLUSION: While the concerns of editors, for example about manufactured studies, are valid, some recently implemented or suggested policies will not be sustainable in the long run. The quality of generated text and code is quickly becoming so high that detecting the use of generative AI will be impossible, and banning it would mean taking away a powerful tool that can make researchers' lives easier every day.
    SIGNIFICANCE: We discuss the history and current state of AI and highlight its relevance for medical publishing and pain research. We provide guidance on how to act now to increase good scientific practice in the world of ChatGPT and call for a task force focusing on improving publishing pain research with use of generative AI.
    DOI:  https://doi.org/10.1002/ejp.4736
  7. Proc Natl Acad Sci U S A. 2024 Oct 08. 121(41): e2322420121
      The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
    Keywords:  artificial intelligence; cognitive science; large language models
    DOI:  https://doi.org/10.1073/pnas.2322420121
  8. JMIR Med Inform. 2024 Sep 30. 12: e64143
      Cardiovascular drug development requires synthesizing relevant literature about indications, mechanisms, biomarkers, and outcomes. This short study investigates the performance, cost, and prompt-engineering trade-offs of 3 large language models in accelerating the literature screening process for cardiovascular drug development applications.
    Keywords:  AI; GPT; LLM; artificial intelligence; biomarker; biomedical; biomedical informatics; cardio; cardiology; cardiovascular; cross-sectional study; drug; drug development; large language model; screening optimization
    DOI:  https://doi.org/10.2196/64143
  9. Artif Intell Med. 2024 Sep 26;157: 102989. pii: S0933-3657(24)00231-8. [Epub ahead of print]
      Systematic Reviews (SRs) are foundational to influencing policies and decision-making in healthcare and beyond. SRs thoroughly synthesise primary research on a specific topic while maintaining reproducibility and transparency. However, the rigorous nature of SRs introduces two main challenges: the significant time required and the continuously growing literature, which risks data omission and makes many SRs outmoded even before they are published. As a solution, AI techniques have been leveraged to simplify the SR process, especially the abstract screening phase. Active learning (AL) has emerged as a preferred method among these AI techniques, allowing interactive learning through human input. Several AL software tools have been proposed for abstract screening. Despite their promise, it remains unclear how the various parameters involved in AL influence a tool's efficacy. This research seeks to demystify this by exploring how different AL choices, such as the initial training set and query strategy, impact SR automation. Experimental evaluations were conducted on five complex medical SR datasets, and a generalized linear model (GLM) was used to interpret the findings statistically. Some AL variables, such as the feature extractor, initial training size, and classifier, showed notable effects, and practical conclusions were drawn within the context of SRs and beyond, wherever AL is deployed.
    Keywords:  Abstract screening; Active learning; Evidence-based medicine; Human-in-the-loop; Machine learning; Systematic reviews
    DOI:  https://doi.org/10.1016/j.artmed.2024.102989
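Editor's note: the screening loop entry 9 evaluates, pool-based active learning with an uncertainty-sampling query strategy, can be sketched in a few lines. The classifier below is a toy term-overlap scorer standing in for a real feature extractor and model; all titles and labels are invented.

```python
# Pool-based active learning for abstract screening, with uncertainty
# sampling as the query strategy. Toy data; the scorer stands in for a
# real feature extractor + classifier.

LABELED = {                          # seed training set: title -> relevant?
    "statins reduce ldl": True,
    "aspirin prevents stroke": True,
    "bridge design methods": False,
}

POOL = [                             # unlabeled records awaiting screening
    "statins prevent stroke",
    "novel bridge materials",
    "aspirin reduces ldl events",
    "stroke bridge monitoring",
]

def score(doc):
    """Toy relevance score in [0, 1]: average word overlap with labeled
    relevant titles minus overlap with labeled irrelevant ones."""
    words = set(doc.split())
    sims = []
    for title, relevant in LABELED.items():
        other = set(title.split())
        jaccard = len(words & other) / len(words | other)
        sims.append(jaccard if relevant else -jaccard)
    return 0.5 + sum(sims) / len(sims) / 2

def next_to_screen(pool):
    """Uncertainty sampling: query the record whose score is nearest 0.5,
    i.e. the one the current model is least sure about."""
    return min(pool, key=lambda doc: abs(score(doc) - 0.5))
```

In a real tool the human then labels the queried record, the model is retrained, and the loop repeats; the paper's experiments vary exactly these moving parts (feature extractor, initial training size, classifier, query strategy).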
  10. Korean J Radiol. 2024 Oct;25(10): 865-868
      
    Keywords:  Artificial intelligence; Chatbot; Checklist; Generative; Guideline; Healthcare; Large language model; Large multimodal model; Medicine; Radiology; Reporting
    DOI:  https://doi.org/10.3348/kjr.2024.0843
  11. 3D Print Addit Manuf. 2024 Aug;11(4): 1495-1509
      Bioprinting is a rapidly evolving field, as reflected in the exponential growth of articles and reviews published each year on the topic. As the number of publications increases, there is a need for an automatic tool that can help researchers conduct more comprehensive literature analyses, standardize the nomenclature, and so accelerate the development of novel manufacturing techniques and materials for the field. In this context, we propose an automatic keyword annotation model, based on Natural Language Processing (NLP) techniques, that can be used to find insights in the bioprinting scientific literature. The approach is based on two main data sources, the abstracts and related author keywords, which are used to train a composite model based on (i) an embeddings part (using the FastText algorithm), which generates word vectors for an input keyword, and (ii) a classifier part (using the Support Vector Machine algorithm), which labels the keyword, based on its word vector, as a manufacturing technique, employed material, or application of the bioprinted product. The composite model was trained and optimized through a two-stage optimization procedure to yield the best classification performance. The annotated author keywords were then reprojected onto the abstract collection to both generate a lexicon of the bioprinting field and extract relevant information, such as technology trends and the relationships among manufacturing techniques, materials, and applications. The proposed approach can serve as a basis for more complex NLP-related analysis toward the automated analysis of the bioprinting literature.
    Keywords:  automatic author keyword annotation; bioprinting; literature analysis; natural language processing
    DOI:  https://doi.org/10.1089/3dp.2022.0316
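Editor's note: entry 11's pipeline, subword embeddings fed to a classifier that labels author keywords, can be sketched with standard-library stand-ins: hashed character trigrams instead of FastText vectors, and a nearest-centroid rule instead of the paper's SVM. The training keywords and labels below are invented examples, not the paper's lexicon.

```python
# Keyword annotation sketch: hashed-trigram "embeddings" (a crude stand-in
# for FastText subword vectors) plus a nearest-centroid classifier (a
# stand-in for the paper's SVM). Example keywords are illustrative only.
import hashlib
import math

DIM = 256

def embed(word):
    """Hash character trigrams of the padded word into a count vector."""
    padded = f"<{word.lower()}>"
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        h = int(hashlib.md5(padded[i:i + 3].encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    return vec

# Labeled author keywords; the label scheme (technique / material /
# application) follows the paper, the words are invented examples.
TRAIN = {
    "extrusion": "technique",
    "inkjet": "technique",
    "stereolithography": "technique",
    "collagen": "material",
    "gelatin": "material",
    "alginate": "material",
}

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Build one centroid per label from the training keywords.
_groups = {}
for word, label in TRAIN.items():
    _groups.setdefault(label, []).append(embed(word))
CENTROIDS = {label: centroid(vecs) for label, vecs in _groups.items()}

def classify(word):
    """Label a keyword by its closest label centroid."""
    return max(CENTROIDS, key=lambda label: cosine(embed(word), CENTROIDS[label]))
```

The trigram hashing gives related surface forms similar vectors, which is the property the paper exploits when reprojecting annotated keywords onto the abstract collection to build a field lexicon.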