bims-helfai Biomed News
on AI in health care
Issue of 2026-01-25
eighteen papers selected by
Sergei Polevikov



  1. Gerontologist. 2026 Jan 22. pii: gnaf308. [Epub ahead of print]
       BACKGROUND AND OBJECTIVES: Artificial Intelligence (AI) combined with smart sensors or conversational agents is becoming part of our lives and has the potential to improve ageing in place by supporting independent living. Trust and willingness to use AI seem essential for actual embedding. Explainable AI (XAI), originating from the recognition that AI infrastructures often operate in an opaque, "black-boxed" way, might assist in understanding the underlying logic of AI-made decisions. However, it is unknown what older adults think about XAI and what they consider necessary explainability.
    RESEARCH DESIGN AND METHODS: I conducted 28 semi-structured interviews to explore XAI in the worlds of older adults. Inductive analysis was applied to four questions: what do older adults know about AI, how do they imagine our society with AI, what is XAI to them, and how do they value explainability?
    FINDINGS: The analysis resulted in nine themes. Four concerned older adults' knowledge of AI: definitions, knowledge acquisition, attitudes, and expectations. Five captured views on XAI, in which XAI is not desirable, XAI is necessary, or collaboration is preferred.
    DISCUSSION AND IMPLICATIONS: Older adults' visions of XAI differ from current technological discourses. For older adults, XAI is not only technological but a constellation between humans and machines. Most argue that a form of joint decision-making is important. As a follow-up, it seems advisable to explore the enactment of XAI in real life and to investigate the form or degree of XAI needed, and for whom.
    Keywords:  age in place; black box; decision-making; gerontechnology
    DOI:  https://doi.org/10.1093/geront/gnaf308
  2. Hosp Pediatr. 2026 Jan 22. pii: e2025008569. [Epub ahead of print]
       OBJECTIVE: Pediatric hospitalists manage increasing volumes of complex patients. Large language models (LLMs) may offer opportunities to reduce clinician workload through clinical documentation summarization. The objective of this study was to assess the quality of unedited LLM-generated discharge summaries compared with the quality of physician-authored discharge summaries.
    METHODS: Our study provided an anonymized, comparative evaluation of 35 unedited LLM-generated and 35 physician-authored discharge summaries graded by pediatric hospitalists and primary care pediatricians. Hospitalists used the validated Physician Documentation Quality Instrument (PDQI)-9, and primary care pediatricians used a shortened version of the instrument. Clinical Risk Group (CRG), length of stay, and primary documentation author training level were collected for each summary. Total and subdomain scores were compared along with the association of scores and clinical factors.
    RESULTS: Baseline encounter and documentation characteristics were similar between groups. LLM-generated discharge summaries were significantly longer than physician-authored discharge summaries (mean word count 403 vs 329, P < .001). Pediatric hospitalists rated the physician-authored summaries higher in overall score (27.4 vs 23.7, P < .001) and in all 9 PDQI subdomains. Primary care pediatricians rated physician-authored summaries higher in overall score (18.1 vs 15.6, P < .0001) and in 5 of 6 PDQI subdomains, with no significant difference in internal consistency. Spearman correlation showed an associated decrease in physician-authored score with increased CRG (ρ = -0.24, P = .01).
    CONCLUSIONS: Physicians outperformed LLMs in creating discharge summaries. Future studies should focus on the quality of physician-modified LLM-generated documentation and the effects on documentation quality, physician workload, and overall physician well-being.
    DOI:  https://doi.org/10.1542/hpeds.2025-008569
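    A minimal Python sketch of this style of analysis, using simulated stand-ins for the PDQI-9 totals and CRG levels (not the study's data):

      # Compare overall quality scores between summary types and test the
      # association between clinical complexity (CRG) and score.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      physician_scores = rng.normal(27.4, 3.0, 35)  # hypothetical PDQI-9 totals
      llm_scores = rng.normal(23.7, 3.0, 35)

      t, p = stats.ttest_ind(physician_scores, llm_scores)
      print(f"t = {t:.2f}, p = {p:.4f}")

      crg = rng.integers(1, 7, 35)                  # hypothetical CRG levels
      rho, p_rho = stats.spearmanr(crg, physician_scores)
      print(f"rho = {rho:.2f}, p = {p_rho:.3f}")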
  3. J Educ Eval Health Prof. 2025;22:37
       PURPOSE: Artificial intelligence (AI)-driven simulation is an emerging approach in healthcare education that enhances learning effectiveness. This review examined its impact on the development of non-technical skills among medical learners.
    METHODS: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a systematic review was conducted using the following databases: Web of Science, ScienceDirect, Scopus, and PubMed. The quality of the included studies was assessed using the Mixed Methods Appraisal Tool. The protocol was previously registered in PROSPERO (CRD420251038024).
    RESULTS: Of the 1,442 studies identified in the initial search, 20 met the inclusion criteria, involving 2,535 participants. The simulators varied considerably, ranging from platforms built on symbolic AI methods to social robots powered by computational AI. Among the 15 AI-driven simulators, 10 used ChatGPT or its variants as virtual patients. Several studies evaluated multiple non-technical skills simultaneously. Communication and clinical reasoning were the most frequently assessed skills, appearing in 12 and 6 studies, respectively, which generally reported positive outcomes. Improvements were also noted in decision-making, empathy, self-confidence, critical thinking, and problem-solving. In contrast, emotional regulation, assessed in a single study, showed no significant difference. Notably, none of the studies examined reflection, reflective practice, teamwork, or leadership.
    CONCLUSION: AI-driven simulation shows substantial potential for enhancing non-technical skills in medical education, particularly communication and clinical reasoning. However, its effects on several other non-technical skills remain unclear. Given heterogeneity in study designs and outcome measures, these findings should be interpreted cautiously. These considerations highlight the need for further research to support integrating this innovative approach into medical curricula.
    Keywords:  Artificial intelligence; Medical students; Non-technical skills; Simulation training
    DOI:  https://doi.org/10.3352/jeehp.2025.22.37
  4. J Gen Intern Med. 2026 Jan 21.
       BACKGROUND: AI chatbots are proliferating in healthcare systems. It is essential to explore how physicians use these tools in order to understand their influence on clinical care and outcomes. Our goal was to understand how physicians conceive of and incorporate AI into clinical decision-making.
    METHODS: We conducted semistructured interviews with generalist physicians from inpatient and outpatient settings in the USA. Prior to the interview, participants were asked to use an AI chatbot, ChatGPT-4, to complete three mock clinical cases. Physicians were then interviewed regarding their perspectives on the AI chatbot. Interviews were conducted via video conference, recorded, and transcribed, and analyzed using reflexive thematic analysis.
    RESULTS: We interviewed 22 physicians with 2-32 years of experience (median = 3 years). We identified a central organizing concept of "physician as filter" defining how physicians used the AI chatbot. This idea was composed of four themes. Theme 1: Physicians perceive clinical decision-making as a problem-solving activity, applying internally held knowledge to externally gathered information. Theme 2: AI chatbot systems are part of a continuum of information resources. Theme 3: Trust in the AI chatbot's outputs depends on the user's own clinical knowledge. Theme 4: Clinical decision-making is understood as the personalization of clinical knowledge and context.
    CONCLUSIONS: AI chatbots may help physicians with formulating a clinical problem and generating a hypothesis by expanding their repertoire of possible cases. Despite the "wealth of information" provided by AI chatbots, physician trust in the outputs is limited, especially when AI chatbots do not provide references. Physician users described filtering chatbot outputs, using their own clinical knowledge and experience, to determine what information is relevant. In describing how providers perceive AI chatbots, we hope to guide further investigation of physician AI interaction and chatbot development that facilitates improved clinical reasoning.
    DOI:  https://doi.org/10.1007/s11606-025-10145-0
  5. Nurse Educ Pract. 2026 Jan 13. pii: S1471-5953(26)00026-0. [Epub ahead of print] 91:104724
       AIMS: This systematic review aimed to synthesize the current evidence on artificial intelligence (AI)-enhanced clinical reasoning among nurse practitioners (NPs).
    BACKGROUND: NPs require strong clinical reasoning skills, and AI-based tools may support the development of these competencies; however, empirical evidence regarding their effectiveness remains limited. The literature review identifies a critical gap, namely the absence of a prior systematic review specifically examining AI-enhanced clinical reasoning among NPs, which provides a strong rationale for the present review.
    DESIGN: Systematic review following PRISMA 2020 guidelines.
    METHODS: Searches were conducted in PubMed, Embase and CINAHL through July 2025. Of the 429 records retrieved, 13 met inclusion criteria. Eligible studies examined AI interventions targeting clinical reasoning among NPs. Risk of bias was assessed using the Critical Appraisal Skills Programme checklists and Joanna Briggs Institute tools. Data were extracted on study design, population, AI application domain, outcomes and quality appraisal.
    RESULTS: Thirteen studies were included: seven quantitative quasi-experimental, intervention validation, or retrospective cohort studies; three qualitative studies; and three systematic reviews. AI applications ranged from real-time monitoring and decision-support systems to simulation platforms and large language models, which supported clinical reasoning domains such as data gathering, hypothesis generation, diagnostic justification, and reflective judgment. Quantitative studies showed improvements in diagnostic accuracy, consistency, efficiency, and data collection, whereas qualitative studies found that NPs view AI as a supportive tool that enhances diagnostic reasoning and patient-centered care while emphasizing the need for transparency, interpretability, and workflow integration.
    CONCLUSIONS: AI tools may strengthen NPs' clinical reasoning by improving diagnostic accuracy, decision consistency and care efficiency, but their safe use requires rigorous validation, standardized evaluation, ethical safeguards and digital literacy training. Limitations include heterogeneous AI applications across professional groups and a predominance of simulation-based evidence over real-world clinical evaluations.
    Keywords:  Artificial Intelligence; CDSS; Clinical decision support systems; Clinical reasoning; Nurse practitioners; Systematic review
    DOI:  https://doi.org/10.1016/j.nepr.2026.104724
  6. Front Public Health. 2025;13:1709611
       Background: Many individuals seek health-related guidance through ChatGPT (OpenAI, San Francisco, CA, USA) due to its convenience and perceived reliability, often in place of, or as a supplement to, professional medical advice. This raises concerns about the accuracy of the information provided and the potential for misinterpretation. On the other hand, ChatGPT offers a promising avenue for complementing traditional health prevention processes.
    Aims: This study aimed to develop and validate a self-completion questionnaire for adults that evaluates the role of ChatGPT in primary and secondary health prevention, and to explore the extent to which users utilize ChatGPT for disease prevention and health maintenance.
    Method: Questionnaire items were derived from a systematic literature review and comprised demographics, internet-use metrics, and validated items from the Brief Health Literacy Screening Tool. ChatGPT usage was structured into three domains: knowledge, attitudes, and behaviors. Test-retest reliability was quantified by Kendall's tau, and internal consistency by Cronbach's Alpha.
    Results: During the validation phase, the questionnaire was administered to a sample of 22 participants (16 female, six male), each of whom completed it twice, yielding a total of 44 responses. Knowledge items demonstrated significant test-retest stability (Kendall's τ, p < 0.01). Among behavior items, seven achieved perfect reliability (τ = 1.00) and five exceeded τ = 0.70. Attitude items similarly showed high stability, with three at τ = 1.00 and three above τ = 0.70. Internal consistency was acceptable (raw Cronbach's α = 0.771).
    Discussion: Our reliability analysis demonstrated that the items of the instrument exhibit good internal consistency, with Cronbach's Alpha values exceeding the commonly accepted threshold for exploratory research. Moreover, the questionnaire's design is inherently model-independent, allowing for its straightforward adaptation to assess user interactions with a variety of conversational artificial intelligence systems beyond ChatGPT.
    Conclusion: This study presents an initially validated questionnaire that captures how individuals employ ChatGPT for both primary and secondary disease prevention. The tool addresses key dimensions of artificial intelligence use and enables meaningful comparisons across populations with different social and educational backgrounds.
    Keywords:  ChatGPT; a large language model; disease prevention; self-completion questionnaire development; validation
    DOI:  https://doi.org/10.3389/fpubh.2025.1709611
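    A short Python sketch of the reliability metrics described above, run on synthetic Likert responses (the item values and number of items are illustrative, not the study's data):

      import numpy as np
      from scipy import stats

      def cronbach_alpha(items):
          """items: (n_respondents, n_items) matrix of item scores."""
          k = items.shape[1]
          item_vars = items.var(axis=0, ddof=1)
          total_var = items.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1 - item_vars.sum() / total_var)

      rng = np.random.default_rng(1)
      test = rng.integers(1, 6, size=(22, 10))            # first administration
      retest = np.clip(test + rng.integers(-1, 2, size=test.shape), 1, 5)

      for i in range(test.shape[1]):                      # per-item test-retest tau
          tau, p = stats.kendalltau(test[:, i], retest[:, i])
          print(f"item {i + 1}: tau = {tau:.2f}, p = {p:.3f}")

      print(f"Cronbach's alpha = {cronbach_alpha(test):.3f}")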
  7. Comput Biol Med. 2026 Jan 19. pii: S0010-4825(26)00030-2. [Epub ahead of print] 203:111469
      Generative AI is significantly transforming the healthcare sector. Recent breakthroughs build on modern pre-trained Transformer models and related systems such as ChatGPT, Bard, LLaMA, DALL-E, and Bing. In medical applications, Large Language Models (LLMs) have become significant tools for predicting diseases, identifying risk factors, and enhancing diagnostic accuracy by analyzing a massive volume of unevenly distributed medical resources. This study provides a comprehensive review of the existing literature on the use of LLMs in healthcare. It elucidates the 'status quo' of language models for general readers, healthcare professionals, and researchers. Specifically, it investigates the capabilities of LLMs, including the transformation of healthcare consultation, enhancement of patient management and treatment, evolution of medical education, optimal resource utilization, and advancement of clinical research. The article organizes the literature by human organ, helping readers quickly find relevant LLM applications for specific medical fields. The outcome of this survey will help medical professionals, researchers, and the healthcare industry understand the benefits, challenges, observed limitations, and future applications of LLMs in healthcare.
    Keywords:  Generative AI (GenAI); Generative pretrained transformer (GPT); Large language models (LLM); Natural language processing (NLP); Reinforcement learning (RL)
    DOI:  https://doi.org/10.1016/j.compbiomed.2026.111469
  8. Pol Merkur Lekarski. 2025;53(6):831-837
       OBJECTIVE: To analyze the significance of modern computer technologies within the value systems of teachers and students at medical universities. Through expert assessments, the study explores the advantages and risks associated with artificial intelligence. Special attention is given to the impact of the ongoing war on the value orientations of future doctors and the organization of professional training under new conditions.
    PATIENTS AND METHODS: The research employed modern methods of analysis, synthesis, generalization, and comparison, alongside expert assessments and creative essays. It also drew upon the experience of Ukrainian and Polish colleagues in creatively integrating AI into the educational process, particularly in training future doctors for effective patient communication during anamnesis-taking and for understanding the risks of iatrogenesis. To achieve the research goals, a digital information search was conducted using the scientometric database Scopus.
    CONCLUSION: In the face of new challenges, there is an increasing need to examine the dynamics of professional self-development, motivation, and the value systems of future doctors. Given the innovative nature of AI, its potential applications in medicine, its significant benefits, and its possible risks, it is essential to study this issue in depth and systematically prepare medical students for the rational use of modern information technologies. The importance of international collaboration, comparative research, and the practical implementation of the Club of Rome's recommendations on integral thinking continues to grow.
    Keywords:  artificial intelligence; communication; creativity; iatrogenics; medical history; medicine; professional training; values
    DOI:  https://doi.org/10.36740/Merkur202506119
  9. JAMA Psychiatry. 2026 Jan 21.
       Importance: Despite increasingly widespread use of artificial intelligence (AI)-driven ambient scribes in medicine, the extent to which they are associated with clinician practice is not well studied.
    Objective: To characterize differences in documentation and treatment of psychiatric symptoms in primary care outpatient notes generated using ambient scribes compared with human or no scribes.
    Design, Setting, and Participants: This cohort study used a matched retrospective case-control design to evaluate primary care annual visit notes from the Massachusetts General and Brigham and Women's Hospital systems between February 2023 and February 2025. A random sample of notes from 4 types of visits, matched 1:1 using sociodemographic and clinical features, was used: those using an ambient scribe, those using a human scribe, those occurring during the same period without a scribe (contemporaneous), and those occurring prior to scribe deployment. Data analysis was performed from April 25 to May 1, 2025.
    Exposure: Use of an AI ambient scribe.
    Main Outcomes and Measures: Neuropsychiatric symptom documentation, in terms of estimated Research Domain Criteria (RDoC), using a Health Insurance Portability and Accountability Act-compliant large language model (GPT-4o version gpt-4o-11-20; OpenAI); antidepressant prescriptions and diagnostic codes; and referral for mental health follow-up.
    Results: Among 20,302 notes, the mean (SD) age of the patients was 48 (14) years and 11,960 (59%) were for visits by female patients; 1026 (5%) met criteria for moderate or greater depressive symptoms by Patient Health Questionnaire-9 score. Estimated levels of RDoC symptoms in all 6 domains were significantly greater in the AI-scribed notes compared with other groups. In a multiple logistic regression model, likelihood of a psychiatric intervention (referral, new diagnosis, or antidepressant prescription) was significantly lower among AI-scribed visits compared with contemporaneous unscribed visits (adjusted odds ratio, 0.83; 95% CI, 0.72-0.95), but not for human-scribed visits compared with contemporaneous unscribed visits (adjusted odds ratio, 0.97; 95% CI, 0.85-1.11).
    Conclusions and Relevance: In this retrospective cohort study using a matched case-control design examining outpatient primary care notes, incorporation of AI ambient scribes in primary care was associated with greater levels of neuropsychiatric symptom documentation but lesser likelihood of documented management of psychiatric symptoms. Further study will be required to determine whether these changes are associated with differential outcomes.
    DOI:  https://doi.org/10.1001/jamapsychiatry.2025.4303
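    A compact Python sketch of the adjusted-odds-ratio computation reported above, fit on synthetic data with illustrative covariates standing in for the study's sociodemographic and clinical adjusters:

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n = 2000
      df = pd.DataFrame({
          "intervention": rng.integers(0, 2, n),  # referral/diagnosis/prescription
          "ai_scribe": rng.integers(0, 2, n),     # AI-scribed vs unscribed visit
          "age": rng.normal(48, 14, n),
          "female": rng.integers(0, 2, n),
      })

      model = smf.logit("intervention ~ ai_scribe + age + female", data=df).fit(disp=0)
      or_est = np.exp(model.params["ai_scribe"])
      lo, hi = np.exp(model.conf_int().loc["ai_scribe"])
      print(f"adjusted OR = {or_est:.2f} (95% CI, {lo:.2f}-{hi:.2f})")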
  10. Acad Radiol. 2026 Jan 19. pii: S1076-6332(26)00002-4. [Epub ahead of print]
       INTRODUCTION: Artificial intelligence (AI) is increasingly being used to support diagnostic accuracy and efficiency in radiology. While its technical potential is well recognized, little is known about how patients perceive these tools or whether their expectations align with clinical adoption. We aimed to synthesize literature capturing patient perceptions of the use of AI in radiology.
    METHODS: This was a scoping review of empirical literature that has explored patient perceptions of AI in radiology. We conducted a comprehensive search across Medline, Embase and Google Scholar for studies published before December 2025. Eligible studies focused on patient views regarding AI in any radiologic context. Data were synthesized using descriptive and thematic analysis.
    RESULTS: Out of 5284 abstracts screened, 18 studies were included, representing 11 countries and 6574 patients. Six key themes emerged: (i) Trust and confidence in AI, (ii) Need for human oversight, (iii) Understanding and literacy, (iv) Emotional reactions to AI, (v) Accountability, and (vi) Expectations from AI. Patients expressed cautious interest in AI applications but emphasized the need for radiologist involvement. They also showed a preference for using AI as a supportive tool rather than as a replacement for clinicians.
    CONCLUSION: Patients are central to the integration of AI in radiology, yet literature examining patient perceptions of the use of AI in radiology is scarce. In the era of AI-driven technology, understanding and incorporating patient views is essential to the successful and ethical implementation of AI in radiology.
    Keywords:  Artificial intelligence; Patient perceptions; Patient-centered; Radiology; Trust
    DOI:  https://doi.org/10.1016/j.acra.2026.01.001
  11. Patient Educ Couns. 2026 Jan 17. pii: S0738-3991(26)00025-X. [Epub ahead of print] 145:109492
       OBJECTIVE: To examine the influence of the emerging use of generative artificial intelligence (GenAI) within electronic health records and among the public on the patient-centeredness of communication in healthcare.
    METHOD: In this scoping review, we conducted a systematic search for peer-reviewed studies in PubMed and PsycInfo that empirically examined GenAI involvement in clinical communication. We then mapped study findings onto a well-established framework for patient-centered communication.
    RESULTS: Our search yielded 67 studies for analysis. Results suggest that integration of GenAI into healthcare communication has the potential to increase clinician efficiency in interacting with patients, to expand channels for patients to obtain information about their healthcare, and to enhance empathy in clinical communication. However, findings also indicate variability in the quality of information produced by GenAI, the potential for GenAI to recast the clinician as a technical supervisor rather than a humanistic care provider, and several issues of equity and privacy raised by engagement with GenAI.
    CONCLUSION: As GenAI becomes more prevalent in healthcare, rigorous examination of GenAI is needed to ensure that its development and implementation aids rather than hinders patient-centered communication. We conclude with an agenda for further research on GenAI grounded in the PCC framework underlying our review.
    PRACTICE IMPLICATIONS: Findings from this review highlight the current potential benefits and limitations of GenAI as a third party to clinical communication. Continued efforts toward developing and applying GenAI for effective healthcare communication should focus on protecting patients from potential drawbacks and maximizing nascent benefits for patient-centered communication.
    Keywords:  Digital health technology; Electronic health records; Generative artificial intelligence; Patient-centered communication
    DOI:  https://doi.org/10.1016/j.pec.2026.109492
  12. Nurs Philos. 2026 Jan;27(1):e70064
      Artificial intelligence (AI) plays an increasingly significant role in nursing care. Many scholars have not only discussed the nature of AI and the effects of its application in nursing practice, but also raised the question of whether AI can replace the nurse in healthcare. In this paper, I aim to demonstrate that the nature of AI is fundamentally distinct from the nature of the nurse, and therefore, AI cannot replace the nurse in the care of the patient. I present my argument in two main parts. In the first, I refer to the scholastic principle agere sequitur esse and to the classical conception of the human being as developed by (neo)Thomists. This means that the argument is grounded in the thought of Thomas Aquinas and certain representatives of neo-Thomism. According to the strategy adopted in this paper, the human being is understood as a person who elicits from within themselves specific personal acts. In the second part, I emphasize that, in contrast to the nurse, AI lacks several essential components that appear to be crucial for meeting the personal needs of patients. I conclude that, according to (neo)Thomistic assumptions, AI can neither replace the nurse nor be considered an equal partner, because it is not endowed with the personal capacities and components that are essential for providing proper care to the patient. The (neo)Thomistic perspective allows one to regard AI solely as a therapeutic tool.
    Keywords:  agere sequitur esse; artificial intelligence; human being; moral agent; nursing philosophy
    DOI:  https://doi.org/10.1111/nup.70064
  13. West J Emerg Med. 2025 Dec 20;27(1):194-204
       INTRODUCTION: ChatGPT and other large language models (LLMs) have increased in popularity. Despite the rapid rise in the implementation of such technologies, frameworks for appropriate prompting techniques in medical applications are limited. In this paper we establish the nomenclature of "variable" and "clause" in the prompting of an LLM, while providing example interviews that outline the utility of such an approach in medical applications.
    METHODS: In this study assessing the LLM ChatGPT-4, we define terms used in prompting procedures including "input prompt," "variable," "demographic variable and clause," "independent variable and clause," "dependent variable and clause," "generative clause," and "output." This methodology was implemented with three sample patient cases from both a patient and physician perspective.
    RESULTS: As demonstrated in our three cases, precise combinations of variables and clauses that consider the patient's age, gender, weight, height, and education level can yield unique outputs. The software can do so quickly and in a personalized, patient-specific manner. Our findings demonstrate that LLMs can be used to generate comprehensive sets of educational material to address current limitations, with the potential of improving healthcare outcomes as the use of LLMs is further explored.
    CONCLUSION: The framework we describe represents a unique attempt to standardize a methodology for medical inputs into a large language model. Doing so expands the potential for outlining patient-specific information that can be implemented in a query by either a patient or a physician. Most notably, future projects should consider the specialty- and presentation-specific input changes that may yield the best outputs for the desired goals.
    DOI:  https://doi.org/10.5811/westjem.46577
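    A minimal Python sketch of the variable-and-clause framework described above; the field names and template wording are illustrative, not the authors' exact templates:

      from dataclasses import dataclass

      @dataclass
      class PatientVariables:
          age: int                 # demographic variables
          gender: str
          education_level: str
          condition: str           # independent variable (presentation)
          goal: str                # dependent variable (desired output)

      def build_prompt(v: PatientVariables) -> str:
          demographic_clause = (f"The patient is a {v.age}-year-old {v.gender} "
                                f"with a {v.education_level} education.")
          independent_clause = f"They present with {v.condition}."
          dependent_clause = f"Explain {v.goal} at a level they can understand."
          generative_clause = "Write the output as patient education material."
          return " ".join([demographic_clause, independent_clause,
                           dependent_clause, generative_clause])

      print(build_prompt(PatientVariables(
          age=54, gender="woman", education_level="high-school",
          condition="new-onset atrial fibrillation",
          goal="treatment options and follow-up steps")))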
  14. Cureus. 2025 Dec;17(12):e99472
       Introduction: Magnetic resonance imaging (MRI) of the knee is the gold standard for evaluating meniscal injuries. While specialized artificial intelligence (AI) models have demonstrated high diagnostic capability in detecting meniscal tears, the performance of general-purpose large language models (LLMs) with multimodal vision capabilities remains underexplored. Previous iterations, such as generative pre-trained transformer 4 (GPT-4) (OpenAI, San Francisco, CA, USA) with vision, have shown limited success in direct musculoskeletal image interpretation. This study evaluates the diagnostic performance of the latest-generation LLM, generative pre-trained transformer 5 (GPT-5), in detecting meniscal tears on knee MRI.
    Objectives: This study aimed to evaluate the diagnostic performance of GPT-5 (a general-purpose multimodal LLM) in detecting meniscal tears on knee MRI in a zero-shot setting, using a publicly available dataset.
    Materials and methods: One hundred knee MRI examinations (50 with meniscal tears, 50 without) were randomly selected from the MRNet validation dataset, with ground-truth labels extracted from the dataset. Sagittal T2-weighted and coronal T1-weighted series were reviewed for completeness and image quality and then converted to Portable Network Graphics (PNG) slices. GPT-5 (gpt-5-2025-08-07) analyzed each case in zero-shot fashion using a fixed prompt requesting a binary ("yes/no") determination of meniscal tear presence without any clinical context. Model predictions were compared with ground truth, and accuracy, precision, recall, specificity, and F1-scores were calculated with 95% confidence intervals.
    Results: GPT-5 achieved an overall accuracy of 76% (95% CI: 0.668-0.833). The model demonstrated a sensitivity (recall) of 84% (95% CI: 0.715-0.917) and a specificity of 68% (95% CI: 0.542-0.792). The precision for detecting tears was 72.4%, and the F1-score was 0.778.
    Conclusion: In this pilot study, GPT-5 demonstrates potential in the zero-shot interpretation of knee MRIs for meniscal tear detection, outperforming previous multimodal LLMs. However, the results should be interpreted with caution due to study limitations, and clinical utility is currently limited by a high false-positive rate and a lack of visual explainability. Nevertheless, this pilot evaluation provides an initial proof of concept, and with larger datasets, rigorous validation, improved calibration, and enhanced explainability, future multimodal LLMs may evolve into supportive, human-in-the-loop tools in musculoskeletal radiology.
    Keywords:  ai and machine learning; gpt-5; knee mri; large language model; meniscal tear
    DOI:  https://doi.org/10.7759/cureus.99472
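    The reported rates imply confusion-matrix counts of 42 true positives, 8 false negatives, 34 true negatives, and 16 false positives; a short Python sketch reproduces the metrics with Wilson 95% confidence intervals (the counts are inferred from the abstract, and the paper's interval method may differ):

      from statsmodels.stats.proportion import proportion_confint

      tp, fn, tn, fp = 42, 8, 34, 16   # inferred from reported rates

      def metric(successes, trials, name):
          lo, hi = proportion_confint(successes, trials, method="wilson")
          print(f"{name}: {successes / trials:.3f} (95% CI {lo:.3f}-{hi:.3f})")

      metric(tp + tn, tp + tn + fp + fn, "accuracy")    # 0.76
      metric(tp, tp + fn, "sensitivity/recall")         # 0.84
      metric(tn, tn + fp, "specificity")                # 0.68
      metric(tp, tp + fp, "precision")                  # 0.724
      print(f"F1 = {2 * tp / (2 * tp + fp + fn):.3f}")  # 0.778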
  15. J Hand Surg Am. 2026 Jan 23. pii: S0363-5023(25)00669-0. [Epub ahead of print]
       PURPOSE: As the Google Artificial Intelligence (AI) Overview function becomes ingrained in the search engine, it is important to understand the accuracy of these search results as they pertain to Current Procedural Terminology (CPT) codes. We hypothesized that the identified codes would not be 100% accurate and that some subgroups of procedures would have relatively lower accuracy than others.
    METHODS: One hundred common CPT codes used in hand and upper-extremity surgery were selected and searched in Google using 150 different simplified phrases describing the procedure. The additional 50 phrases were intended to provide more detail on the importance of phrasing variation, for example, "arthrodesis" versus "fusion". The accuracy of the codes was assessed using American Academy of Orthopaedic Surgeons CodeX and the American Society for Surgery of the Hand Coding App. The accuracy was calculated as a percentage.
    RESULTS: Google AI Overview's response provided an accurate CPT code 90% of the time. Tendon procedures (78%) and nerve procedures (71%) had lower accuracy than the remainder of the groups. Bony procedures had an 88% accuracy rate, and the remainder of the categories had over 90% accuracy. Ten percent of queries resulted in multiple codes; 14 of these 15 queries contained the correct code. Fifty-nine acronyms were used, and searches with acronyms resulted in accurate codes at a similar rate to those without acronyms.
    CONCLUSIONS: This study demonstrates that providers and billers can use Google AI Overview as a tool but should not rely on it as being 100% accurate.
    CLINICAL RELEVANCE: As AI becomes more involved in everyday clinical practice, hand surgeons should be aware of its potential strengths and weaknesses as they pertain to coding.
    Keywords:  Artificial intelligence; Current Procedural Terminology; Google; coding; hand surgery
    DOI:  https://doi.org/10.1016/j.jhsa.2025.12.002
  16. Curr Opin Pediatr. 2026 Jan 20.
       PURPOSE OF REVIEW: Recent advances in artificial intelligence (AI) coincide with decreases in confidence in vaccines. This review examines studies that apply AI strategies in various ways to analyze and mitigate vaccine skepticism.
    RECENT FINDINGS: Studies have explored public attitudes towards vaccines using AI to analyze language in social media postings and interactions, scrutinize AI responses to vaccine-related queries, and attempt to use AI to directly influence vaccine hesitancy. Findings show that AI can be effective in addressing vaccine hesitancy in various ways, including, but not limited to, directly interacting with vaccine-hesitant groups, identifying reasons for vaccine hesitancy, and predicting vaccine hesitancy among specific populations.
    SUMMARY: AI will undoubtedly continue to evolve and improve over the coming years. Continued advances and new applications can help mitigate unwarranted vaccine hesitancy in a variety of ways, such as educating people with messaging tailored to end users or using AI to identify the specific concerns of vaccine-hesitant individuals and groups. This will require an integrative approach to a complex issue: vaccine hesitancy is not a monolith; there is a range of degrees of vaccine hesitancy, and various factors shape a person's vaccine knowledge and beliefs.
    Keywords:  ChatGPT; artificial intelligence; artificial intelligence chatbots; misinformation; natural language processing; social media mining; survey analysis; vaccine confidence; vaccine uptake
    DOI:  https://doi.org/10.1097/MOP.0000000000001546
  17. Patient Educ Couns. 2026 Jan 16. pii: S0738-3991(26)00016-9. [Epub ahead of print] 145:109483
       BACKGROUND/PURPOSE: The prevalence of medical and technical jargon makes the dissemination of healthcare information to patients challenging. Plain language (PL) writing, a style of written communication that strives to employ clear and concise language, is a critical tool for overcoming these hurdles, but is hindered by resource constraints. The emergence of generative Artificial Intelligence (AI) has led to research aimed at assessing its PL writing capacity. The aim of this scoping review was to explore what is known about the use of generative AI platforms for writing in PL.
    METHODS: A scoping review was conducted via searches across nine databases in Summer 2024. Studies were included if they evaluated a generative AI platform for the use of writing patient education materials (PEMs) in PL and measured best practices (e.g. readability).
    RESULTS/FINDINGS: In total, 47 articles were included. Most studies were conducted in the United States (n = 29, 61.7%). Prompt engineering strategies included specifying a reading grade level, audience, health condition, and resource type. AI-generated PEMs improved readability in 28.3% of the 46 studies that measured reading grade level and in 46.2% of the 26 studies that measured reading ease.
    DISCUSSION AND CONCLUSION: This review highlights the potential of generative AI for writing PEMs and assisting in the promotion of health literacy in patient care. AI models varied in their ability to generate or edit PEMs into plain language. Further research is needed to determine whether this can be done to industry standards and outside English-language contexts.
    Keywords:  Artificial intelligence; Comprehension; Delivery of health care; Health literacy; Health promotion; Language; Patient care; Patient education; Plain language
    DOI:  https://doi.org/10.1016/j.pec.2026.109483
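    A small Python sketch of the readability checks these studies report, using the textstat package as one possible calculator (an assumption; the included studies used various tools), applied to a toy patient education passage:

      import textstat

      # Hypothetical plain-language patient education material (PEM)
      pem = ("Your heart has an irregular rhythm called atrial fibrillation. "
             "This can make your heart beat too fast. Your doctor may give you "
             "medicine to control the rhythm and to prevent blood clots.")

      print("Reading grade level:", textstat.flesch_kincaid_grade(pem))
      print("Reading ease:", textstat.flesch_reading_ease(pem))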
  18. J Neurosurg. 2025 Dec 19:1-9
       OBJECTIVE: The rapid development of artificial intelligence (AI) presents an opportunity to streamline the peer-review process and provide key information to guide academic journals, editorial staff, and reviewers, as well as authors. This study aimed to fine-tune several standard large language and transformer models (LLMs) on the basis of the text of peer-reviewer comments and editorial outcome decisions to find text-based associations with journal decisions for acceptance versus rejection.
    METHODS: This study, with participation from the Journal of Neurosurgery Publishing Group (JNSPG), included anonymized final decisions and reviewer comments for all article submissions made to the Journal of Neurosurgery (JNS) and subsidiary journals from 2021 to 2023. All final decisions were grouped as binary (acceptance/revision vs rejection/transfer). Leading words (i.e., "acceptance" or "rejection") were removed from textual reviewer comments, which were then analyzed using various machine learning models and LLMs, including BERT, GPT-2, GPT-3, GPT-4o, and GRU variants, to predict the final manuscript decision outcome. Performance was measured using receiver operating characteristic (ROC) curves. Shapley Additive Explanations (SHAP) analysis was conducted to evaluate the impact of individual words on model predictions.
    RESULTS: In the ROC analysis, the fine-tuned GPT-4mini and GPT-3 models achieved the highest area under the curve (AUC) values of 0.91, followed by BERT and GPT-2 with AUC values of 0.84. These were followed by bidirectional GRU and GPT-3 (untrained) with AUC values of 0.75 and 0.70, respectively. Unidirectional GRU and GPT-4o (untrained) demonstrated the lowest AUC values of 0.68 and 0.67, respectively. In the SHAP analysis, the logistic regression model identified words like "future," "interesting," and "written" as significant positive predictors of acceptance, whereas "clear," "unclear," and "does" were associated with rejections. The GRU model identified "study," "useful," and "journal" as significant positive predictors, and "unclear," "reading," and "incidence" as negative predictors.
    CONCLUSIONS: This proof-of-concept study demonstrates that fine-tuned AI models, particularly GPT-3, can predict manuscript acceptance with reasonable accuracy using only textual reviewer comments. Emerging themes that lend weight to article outcome include article clarity, utility, suitability, cohort size, and diligence in addressing reviewer queries. These findings suggest that, when fine-tuned, AI modeling holds significant potential in assisting and facilitating the peer-review process.
    Keywords:  artificial intelligence; journal; large language model; peer review
    DOI:  https://doi.org/10.3171/2025.8.JNS242667
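    As a lightweight analogue of the pipeline described above, a bag-of-words logistic regression over reviewer comments can yield an AUC and per-word coefficients (a simpler stand-in for the paper's fine-tuned LLMs and SHAP analysis; the comments and labels below are invented for illustration):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      comments = ["This interesting, well written study is useful.",
                  "The methods are unclear and the cohort is small."] * 50
      labels = [1, 0] * 50      # 1 = acceptance/revision, 0 = rejection/transfer

      X_train, X_test, y_train, y_test = train_test_split(
          comments, labels, test_size=0.25, random_state=0)

      vec = TfidfVectorizer(stop_words="english")
      clf = LogisticRegression().fit(vec.fit_transform(X_train), y_train)

      probs = clf.predict_proba(vec.transform(X_test))[:, 1]
      print("AUC:", roc_auc_score(y_test, probs))

      # Words with the largest positive coefficients (predictive of acceptance)
      words = vec.get_feature_names_out()
      top = clf.coef_[0].argsort()[-5:][::-1]
      print([words[i] for i in top])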