bims-arines Biomed News
on AI in evidence synthesis
Issue of 2024-11-17
four papers selected by
Farhad Shokraneh



  1. Value Health. 2024 Nov 11. pii: S1098-3015(24)06754-8. [Epub ahead of print]
    ISPOR Working Group on Generative AI
    OBJECTIVE: To provide an introduction to the uses of generative artificial intelligence (AI) and foundation models, including large language models (LLMs), in the field of health technology assessment (HTA).
    METHODS: We reviewed applications of generative AI in three areas: systematic literature reviews, real-world evidence (RWE), and health economic modeling.
    RESULTS: (1) Literature reviews: generative AI has the potential to assist in automating aspects of systematic literature reviews by proposing search terms, screening abstracts, extracting data, and generating code for meta-analyses (a minimal example follows this entry); (2) Real-world evidence (RWE): generative AI can help automate processes and analyze large collections of real-world data (RWD), including unstructured clinical notes and imaging; (3) Health economic modeling: generative AI can aid in the development of health economic models, from conceptualization to validation. Limitations of foundation models and LLMs include challenges surrounding their scientific rigor and reliability, the potential for bias, implications for equity, and nontrivial concerns about adherence to regulatory and ethical standards, particularly data privacy and security. Additionally, we survey the current policy landscape and offer suggestions for HTA agencies on responsibly integrating generative AI into their workflows, emphasizing the importance of human oversight and the fast-evolving nature of these tools.
    CONCLUSIONS: While generative AI holds promise for HTA applications, the technology is still developing rapidly, and its applications to HTA require continued careful evaluation. Both developers and users of research incorporating these tools should familiarize themselves with the tools' current capabilities and limitations.
    Keywords:  Artificial Intelligence; Economic Modeling Methods; Generative AI; Large Language Models; Real World Evidence; Systematic Reviews
    DOI:  https://doi.org/10.1016/j.jval.2024.10.3846
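
    To make the meta-analysis point concrete, here is a minimal sketch of the kind of analysis code the abstract envisions an LLM generating: a fixed-effect, inverse-variance meta-analysis in Python. The effect estimates are invented placeholders, not data from any paper in this issue.

    import math

    # (effect estimate, standard error) per study -- hypothetical values
    studies = [(0.42, 0.15), (0.31, 0.22), (0.55, 0.18), (0.27, 0.30)]

    # Inverse-variance weights: w_i = 1 / se_i^2
    weights = [1.0 / se**2 for _, se in studies]

    # Pooled effect: sum(w_i * y_i) / sum(w_i)
    pooled = sum(w * y for (y, _), w in zip(studies, weights)) / sum(weights)

    # Standard error of the pooled effect: sqrt(1 / sum(w_i))
    pooled_se = math.sqrt(1.0 / sum(weights))

    # 95% confidence interval under a normal approximation
    low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"pooled effect = {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
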
  2. J Hand Surg Eur Vol. 2024 Nov 14. 17531934241295493
      The aim of the present study was to train a natural language processing (NLP) model to recognize key text elements in research abstracts related to hand surgery, enhancing the efficiency of systematic review screening. A sample of 1600 abstracts from a systematic review of distal radial fracture treatment outcomes was annotated to train the model. To assess time-saving potential, 200 abstracts were processed by the trained models in two experiments in which reviewers had access to the NLP predictions when deciding to include or exclude articles. The model achieved an overall accuracy of 0.91 in recognizing key text elements and excelled at identifying study interventions. Using the NLP predictions reduced mean screening time by 31% without compromising accuracy. Precision varied, improving in the second experiment, indicating context-dependent performance. These findings suggest that NLP models can streamline abstract screening in systematic reviews by effectively identifying original research and extracting relevant text elements (a generic baseline is sketched after this entry). Level of evidence: IV.
    Keywords:  NLP; abstract; efficiency; natural language processing; screening; systematic review
    DOI:  https://doi.org/10.1177/17531934241295493
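
    The abstract does not detail the model architecture, so the following is only a generic baseline sketch of NLP-assisted screening, not the authors' system: TF-IDF features plus logistic regression over a handful of invented, hand-labelled abstracts (1 = include, 0 = exclude).

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data -- hypothetical abstracts and screening decisions
    abstracts = [
        "Randomized trial of volar plating versus casting for distal radial fracture.",
        "Cohort study of grip strength after distal radius fixation.",
        "Case report of a rare carpal tunnel variant.",
        "Narrative review of wrist anatomy.",
    ]
    labels = [1, 1, 0, 0]

    # TF-IDF features + logistic regression: a common screening baseline
    screener = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    screener.fit(abstracts, labels)

    # Predicted inclusion probability for a new abstract; in the workflow the
    # study describes, a reviewer sees the prediction alongside the text.
    new = ["Prospective study comparing plate and external fixation of the distal radius."]
    print(screener.predict_proba(new)[0][1])
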
  3. Arthroscopy. 2024 Nov 07. pii: S0749-8063(24)00883-1. [Epub ahead of print]
    PURPOSE: The purpose of this study was to demonstrate the value of custom methods, namely retrieval-augmented generation (RAG)-based large language models (LLMs) and agentic augmentation, over standard LLMs in delivering accurate information, using an anterior cruciate ligament (ACL) injury case.
    METHODS: A set of 100 questions and answers based on the 2022 AAOS ACL guidelines was curated. Closed-source models (OpenAI's GPT-4 and GPT-3.5, and Anthropic's Claude 3) and open-source models (Llama 3 8B/70B and Mistral's Mixtral 8x7B) were asked the questions in base form and again with the AAOS guidelines embedded in a RAG system (a retrieval sketch follows this entry). The top-performing models were further augmented with artificial intelligence (AI) agents and re-evaluated. Two fellowship-trained surgeons blindly evaluated the accuracy of the responses in each cohort. ROUGE and METEOR scores were calculated to assess the semantic similarity of the responses.
    RESULTS: All non-custom LLMs started below 60% accuracy. Applying RAG improved the accuracy of every model by an average of 39.7%. The highest-performing model with RAG alone was Meta's open-source Llama 3 70B (94%); the highest-performing model with RAG and AI agents was OpenAI's GPT-4 (95%).
    CONCLUSION: RAG improved accuracy by an average of 39.7%, with the highest accuracy, 94%, achieved by Meta's Llama 3 70B. Incorporating AI agents into a previously RAG-augmented LLM raised GPT-4's accuracy to 95%. Thus, agentic and RAG-augmented LLMs can be accurate liaisons of information, supporting our hypothesis.
    CLINICAL RELEVANCE: Despite the literature surrounding the use of LLMs in medicine, there has been considerable and appropriate skepticism given their variably accurate responses. This study establishes the groundwork for identifying whether custom modifications to LLMs, using RAG and agentic augmentation, can deliver more accurate information in orthopaedic care. With this knowledge, the medical information commonly sought from popular LLMs such as ChatGPT can be standardized to better support shared decision making between surgeon and patient.
    DOI:  https://doi.org/10.1016/j.arthro.2024.10.042
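
    The core RAG step the study describes is retrieving guideline passages and grounding the model's answer in them. Below is a minimal retrieval sketch using TF-IDF and cosine similarity; the guideline snippets and question are invented placeholders (the study embedded the actual 2022 AAOS ACL guidelines), and the final generation call is left as a stub because it depends on the chosen model provider.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical stand-ins for chunked guideline text
    guideline_chunks = [
        "Placeholder snippet about ACL reconstruction timing.",
        "Placeholder snippet about graft choice.",
        "Placeholder snippet about rehabilitation milestones.",
    ]
    question = "When should ACL reconstruction be performed after injury?"

    # Embed chunks and question in one TF-IDF space; rank chunks by cosine similarity
    vectorizer = TfidfVectorizer().fit(guideline_chunks + [question])
    chunk_vecs = vectorizer.transform(guideline_chunks)
    query_vec = vectorizer.transform([question])
    scores = cosine_similarity(query_vec, chunk_vecs)[0]
    top = scores.argsort()[::-1][:2]  # indices of the two best-matching chunks

    # Ground the model by prepending the retrieved guideline text to the prompt
    context = "\n".join(guideline_chunks[i] for i in top)
    prompt = f"Answer using only this guideline text:\n{context}\n\nQuestion: {question}"
    print(prompt)  # pass `prompt` to the chosen LLM API here
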
  4. Nature. 2024 Nov;635(8038):276-278
      
    Keywords:  Lab life; Machine learning; Research data; Research management
    DOI:  https://doi.org/10.1038/d41586-024-03676-9