bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-10-05
35 papers selected by
Thomas Krichel, Open Library Society



  1. IEEE Trans Vis Comput Graph. 2025 Oct 03. PP
      Exploring and comprehending relevant academic literature is a vital yet challenging task for researchers, especially given the rapid expansion in research publications. This task fundamentally involves sensemaking: interpreting complex, scattered information sources to build understanding. While emerging immersive analytics tools have shown cognitive benefits like enhanced spatial memory and reduced mental load, they predominantly focus on information synthesis (e.g., organizing known documents). In contrast, the equally important information foraging phase, discovering and gathering relevant literature, remains underexplored within immersive environments, hindering a complete sensemaking workflow. To bridge this gap, we introduce LITFORAGER, an interactive literature exploration tool designed to facilitate information foraging of research literature within an immersive sensemaking workflow using network-based visualizations and multimodal interactions. Developed with WebXR and informed by a formative study with researchers, LITFORAGER supports exploration guidance, spatial organization, and seamless transition through a 3D literature network. An observational user study with 15 researchers demonstrated LITFORAGER's effectiveness in supporting fluid foraging strategies and spatial sensemaking through its multimodal interface.
    DOI:  https://doi.org/10.1109/TVCG.2025.3616732
  2. BMC Med Res Methodol. 2025 Sep 29. 25(1): 216
       OBJECTIVE: To develop and compare the accuracy of different PubMed search strategies (filters) for systematic and non-systematic reviews in dental journals.
    METHODS: This validation study included articles published in 2019 in 15 dental journals. Two search filters were developed: (1) a high-sensitivity filter to retrieve all possible review articles, and (2) a high-specificity filter for systematic reviews. Two previously published filters were used as benchmarks. The gold standard method for identifying the study methodology was manual reading of the full text. Accuracy, sensitivity and specificity were calculated.
    RESULTS: Among the 2246 articles published, 6.7% (n = 150) were systematic reviews and 5.9% (n = 132) were other types of reviews. The high-sensitivity filter retrieved 147 of 150 systematic reviews and showed a sensitivity of 98.0% (95%CI: 94.3-99.6) and specificity of 88.9% (95%CI: 87.5-90.2). The high-specificity filter had 96.7% (95%CI: 92.4-98.9) sensitivity and 99.1% (95%CI: 98.6-99.5) specificity for retrieving systematic reviews. The accuracy of this filter for systematic reviews was 97.9% (95%CI: 96.4-99.4), which was higher than the PubMed benchmark filter (p < 0.05) and similar to another longer filter.
    CONCLUSION: This study provides two new highly accurate search filters for PubMed that can be used by clinicians, researchers and policymakers.
    Keywords:  Databases; Review literature as topic; Search filters; Systematic reviews as topic; Validation study
    DOI:  https://doi.org/10.1186/s12874-025-02666-3
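    The sensitivity, specificity and accuracy reported above follow from a standard 2x2 screening table. A minimal sketch of that arithmetic in Python: the 147 of 150 systematic reviews retrieved comes from the abstract, while the false-positive count is a back-calculated, assumed figure used only to illustrate the formulas.

        def screening_metrics(tp, fp, fn, tn):
            """Retrieval metrics from a 2x2 screening table.

            tp: relevant records retrieved by the filter
            fp: irrelevant records retrieved
            fn: relevant records missed
            tn: irrelevant records correctly excluded
            """
            sensitivity = tp / (tp + fn)                    # recall of true reviews
            specificity = tn / (tn + fp)                    # non-reviews correctly excluded
            accuracy = (tp + tn) / (tp + fp + fn + tn)
            return sensitivity, specificity, accuracy

        # 147 of 150 systematic reviews retrieved (from the abstract); the 233
        # false positives among the 2096 non-systematic-review articles are an
        # assumed figure consistent with the reported 88.9% specificity.
        sens, spec, acc = screening_metrics(tp=147, fp=233, fn=3, tn=1863)
        print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")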
  3. Med Pr. 2025 Sep 24. pii: 208319. [Epub ahead of print]
     BACKGROUND: The constantly growing number of elderly and disabled people in Polish society forces decision-makers, investors, and architects to take action to eliminate obstacles in access to public buildings. Educational and cultural facilities, including libraries, are particularly important for ensuring the well-being of this particular group. These facilities should be characterized by universal accessibility, including appropriate location that facilitates integration with the environment not only geographically but also socially, the reduction of all barriers, and the implementation of solutions that enable the use of all their areas by people with special needs. The spatial solutions of these facilities should counteract social exclusion. The aim of the article is to diagnose the current state of adaptation of public libraries to the needs of elderly and disabled people in one of the representative cities in Poland.
    MATERIAL AND METHODS: The article describes the conducted survey research. The main research approach combined correlated quantitative-statistical and qualitative methods. The undertaken activities were both theoretical (analysis of the relevant literature and sample solutions) and practical (field research, field trips). The research concludes with a comparative analysis of the correlations between specific variable parameters of the outdoor and indoor spaces within the analyzed facilities.
    RESULTS: This article presents the results of research related to the accessibility of public libraries in Białystok for seniors. It was found that most libraries in the city are located in facilities that were not originally planned, designed and implemented to perform their current function. The research results indicate that in a significant number of facilities there are architectural barriers that make it difficult and often impossible to overcome them not only for people in wheelchairs, but also for people with limited mobility and other psychomotor dysfunctions.
    CONCLUSIONS: The analysis of the results of the conducted research confirms the accepted thesis about the insufficient adaptation of the buildings in which libraries are located in the city of Białystok to the needs of senior citizens and people with disabilities. The research results can be used in planning remedial actions and improving the quality of the architectural space of existing and planned public libraries. Med Pr Work Health Saf. 2025;76(4).
    Keywords:  accessibility; architectural design; library; public building architecture; senior-friendly space; universal design
    DOI:  https://doi.org/10.13075/mp.5893.01629
  4. Health Info Libr J. 2025 Sep 30.
      Technological advancements and the emergence of 5G technology have significantly improved health library services. Although Wi-Fi offers many benefits in establishing smart libraries, the enhanced connectivity among a large number of devices, the reduced latency between input and output, and the robust security of 5G demonstrate its enhanced potential for health libraries. In this paper, we highlight five dimensions to support health libraries in the development and evaluation of 5G technologies for facilitating remote health library and information services. The five dimensions are: technological infrastructure, technology integration into health libraries, remote health information services, user readiness, and external support.
    Keywords:  health libraries; mobile health; technological innovations
    DOI:  https://doi.org/10.1111/hir.70002
  5. Med Ref Serv Q. 2025 Oct 02. 1-16
      The health sciences librarian and data management librarian partnered with a health professions faculty member to develop integrated library instruction for a distance PhD dissertation seminar. The librarians employed a variety of best-practice techniques for maximizing adult learning, adapted the course for each cohort, and maintained the librarians' presence in the course, even after the COVID-19 pandemic forced permanent changes in hybrid instruction. Feedback from faculty and cohorts is continuously sought, and library instructors are looking to future implications for library instruction with the advent of new technologies, such as affordable course content and generative artificial intelligence.
    Keywords:  Doctoral education; PhD programs; embedded librarianship; online instruction; transactional distance
    DOI:  https://doi.org/10.1080/02763869.2025.2568128
  6. ArXiv. 2023 Oct 04. pii: arXiv:2307.00589v2. [Epub ahead of print]
      Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a closely integrated retriever and re-ranker pair. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks.
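    The abstract above does not give MedCPT's training objective, but contrastive training of a retriever on query-article click pairs is commonly implemented with an in-batch-negative (InfoNCE-style) loss. The NumPy sketch below illustrates that general idea only; it is not MedCPT's actual code, and the batch size, embedding dimension and temperature are arbitrary.

        import numpy as np

        def info_nce_loss(query_emb, article_emb, temperature=0.05):
            """In-batch-negative contrastive loss for paired query/article embeddings.

            Row i of each (batch, dim) matrix forms a positive (clicked) pair;
            every other article in the batch serves as a negative for that query.
            """
            q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
            a = article_emb / np.linalg.norm(article_emb, axis=1, keepdims=True)
            logits = q @ a.T / temperature                  # (batch, batch) similarities
            logits -= logits.max(axis=1, keepdims=True)     # numerical stability
            log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            return -np.mean(np.diag(log_probs))             # -log p(clicked article | query)

        rng = np.random.default_rng(0)
        print(round(info_nce_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))), 3))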
  7. Med Ref Serv Q. 2025 Sep 30. 1-19
      This article describes a needs assessment performed by an academic/health sciences library to study the feasibility of a library makerspace for campus users and to determine if there was any interest in such a makerspace. A literature review, an environmental scan, a campus-wide survey, and focus groups were conducted. Results showed that there was significant campus interest. Findings emphasized that staffing and programming should precede equipment acquisition to ensure success.
    Keywords:  Community; Makerspace; focus group; library makerspace; programming; survey
    DOI:  https://doi.org/10.1080/02763869.2025.2563527
  8. Reg Anesth Pain Med. 2025 Oct 01. pii: rapm-2025-107204. [Epub ahead of print]
      
    Keywords:  Analgesics, Opioid; EDUCATION; Meta-Analysis; Outcome Assessment, Health Care
    DOI:  https://doi.org/10.1136/rapm-2025-107204
  9. Rev Med Chil. 2025 Sep. 153(9): 641-645. pii: S0034-98872025000900641.
      Recently, there has been a surge in technological tools designed to automate tasks across various areas of health sciences, including the identification of evidence used in the development of evidence syntheses that inform clinical practice guideline (CPG) recommendations. Simultaneously, there has been a significant increase in the production of systematic reviews, meaning that much of the relevant evidence is already included in existing reviews.
    AIM: To compare the performance of the semi-automated Epistemonikos Evidence Matrix tool with that of a traditional manual literature search in identifying studies for the development of clinical practice guidelines.
    MATERIALS AND METHODS: During the development of three CPGs (focused on HIV/AIDS, pediatric asthma management, and stroke management), we compared studies identified through a traditional search strategy in MEDLINE, Embase, and the Cochrane Library with those found using a strategy based on existing systematic reviews, via the Epistemonikos database. The traditional search employed keyword-based strategies and a specific filter for randomized controlled trials. In contrast, the Epistemonikos-based strategy relied on the semi-automated Evidence Matrix tool, which identifies studies shared across two or more systematic reviews.
    RESULTS: Across the three guidelines, 8,466 potentially relevant articles were identified using the traditional method, compared to 6,771 using the Epistemonikos-based method. Of these, 155 studies (1.8%) were deemed truly relevant in the traditional search, versus 103 (1.5%) in the Epistemonikos-based approach (p= 0.14). The approach based on existing reviews demonstrated significantly higher precision (94% vs. 78%, p<0.01) but lower sensitivity (58% vs. 88%, p<0.01) compared to the traditional search.
    CONCLUSIONS: The evidence search strategy based on existing systematic reviews is an efficient and reliable alternative for identifying relevant studies to support evidence-based decision-making.
    DOI:  https://doi.org/10.4067/s0034-98872025000900641
  10. Cochrane Evid Synth Methods. 2025 Nov;3(6): e70050
       Background: Elicit AI aims to simplify and accelerate the systematic review process without compromising accuracy. However, research on Elicit's performance is limited.
    Objectives: To determine whether Elicit AI is a viable tool for systematic literature searches and title/abstract screening stages.
    Methods: We compared the included studies in four evidence syntheses to those identified using the subscription-based version of Elicit Pro in Review mode. We calculated sensitivity, precision and observed patterns in the performance of Elicit.
    Results: The sensitivity of Elicit was poor, averaging 39.5% (25.5-69.2%) compared to 94.5% (91.1-98.0%) in the original reviews. However, Elicit identified some included studies not identified by the original searches and had an average of 41.8% precision (35.6-46.2%) which was higher than the 7.55% average of the original reviews (0.65-14.7%).
    Discussion: At the time of this evaluation, Elicit did not search with high enough sensitivity to replace traditional literature searching. However, the high precision of searching in Elicit could prove useful for preliminary searches, and the unique studies identified mean that Elicit can be used by researchers as a useful adjunct.
    Conclusion: Whilst Elicit searches are currently not sensitive enough to replace traditional searching, Elicit is continually improving, and further evaluations should be undertaken as new developments take place.
    Keywords:  artificial Intelligence (AI); evidence synthesis; literature searching; research methodology; systematic review
    DOI:  https://doi.org/10.1002/cesm.70050
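    Sensitivity and precision in the comparison above are computed against different denominators, which is why Elicit can score well on one and poorly on the other. In the usual notation (not taken from the paper itself):

        \text{sensitivity} = \frac{\lvert \text{included studies retrieved} \rvert}{\lvert \text{included studies} \rvert},
        \qquad
        \text{precision} = \frac{\lvert \text{included studies retrieved} \rvert}{\lvert \text{records retrieved} \rvert}

    A search that returns a small, well-targeted result set can therefore show high precision while still missing most of the studies a review needs, which is the pattern reported for Elicit here.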
  11. Nature. 2025 Sep 29.
      
    Keywords:  Authorship; Careers; Lab life; Scientific community; Technology
    DOI:  https://doi.org/10.1038/d41586-025-02867-2
  12. JMIR Form Res. 2025 Oct 01. 9 e75335
       BACKGROUND: Menopause is a significant time in a woman's life, but only recently has there been an open discussion about it in the media, workplaces, and general society. With increasing frequency, women are using the internet to research menopause, making it essential that online sources provide safe, high-quality, and relevant information.
    OBJECTIVE: This study aimed to investigate the current state of the online information landscape for menopause from the perspective of information seekers, exploring (1) information-seeking behavior and (2) perceptions of online resources for menopause.
    METHODS: A 10- to 15-minute online survey was conducted asking about the respondents' use of and opinions about online resources specifically for menopause. We distributed the survey via social media, email, and word of mouth. Quantitative data were explored using means and frequencies. Group differences between menopausal groups were analyzed using chi-square, Fisher exact, or Kruskal-Wallis tests as appropriate. Qualitative data were analyzed using data-driven thematic analysis.
    RESULTS: Data from 627 participants were analyzed (early perimenopause: n=171, 27.3%, late perimenopause: n=125, 19.9%, postmenopause: n=262, 41.8%, and surgical menopause: n=69, 11%). The majority of respondents had used the internet as a source of information (581/627, 92.7%), with the internet being the first choice of information source (489/581, 84.2%). The most searched-for information online was about menopause symptoms (479/581, 82.4%), menopause treatment options (442/581, 76.1%), and self-help tips or strategies (318/581, 54.7%). The majority of participants trusted online information to some extent (615/627, 98.1%), with many also considering online information accurate to some extent (555/627, 88.5%). Many participants reported finding some but not all of the information they were looking for online (379/581, 65.2%). Thematic analysis revealed 10 themes related to information quality and accessibility and sought-after information (eg, symptom specifics, treatment, and nonformal management strategies). Analysis also indicated that information is lacking for several groups, including those in medically induced or surgical menopause.
    CONCLUSIONS: The study showed that online informational resources are widely accessed and widely perceived as useful and trustworthy. However, it is crucial that the quality of online information is evaluated, especially considering the large number of users who rely on it as their first or only informational source. Online searches were usually performed to find information related to symptoms, treatment, and self-help recommendations, with differences in search behaviors observed across menopausal stages and groups, highlighting the need for tailored informational resources. Thematic analysis revealed gaps in the provision of online information both in terms of content and quality. Participants noted a lack of comprehensive symptom information, inadequate information for groups such as those experiencing medical or surgical menopause, and concerns about outdated content and a lack of source transparency. Future research with more diverse samples is needed to better understand variations in online health information-seeking behaviors across groups.
    Keywords:  information; internet; menopause; online; perimenopause
    DOI:  https://doi.org/10.2196/75335
  13. Health Inf Manag. 2025 Sep 29. 18333583251378960
       BACKGROUND: Continuing professional development (CPD) involves ongoing learning to maintain and enhance professional competence. CPD for health information managers (HIMs) is embodied in the Health Information Management Association of Australia's (HIMAA) Professional Competency Standards. The CPD engagement of Australia's HIMs has not been explored.
    OBJECTIVE: To examine HIMs' engagement with professional body-initiated CPD, and the associated enablers and barriers.
    METHOD: A cross-sectional survey was administered to 72 Victorian graduate HIMs from four cohorts: 1985, 1995, 2005 and 2015. It elicited their perceptions of CPD and engagement with HIMAA events and activities.
    RESULTS: When asked if engaging in CPD was important: 70.8% agreed; 9.7% disagreed; 19.4% were unsure. Motivations for engagement were networking, continuous learning and skill development and gaining insights from the work of others. Barriers to participation included movement outside of HIM roles, lack of time or interest and perceived irrelevance. The most common activities were special interest/working group(s) (27.8%) and other (sub)committees (18.1%). Seventy-two percent had attended an (inter)national conference or seminar/webinar. Notably, 57% had not participated in any HIMAA-related activities.
    DISCUSSION: HIMs must continually build their knowledge base and skills to align with the evolving health information ecosystem. A large proportion of participants acknowledged the importance of CPD; however, of concern are the 29.1% who disagreed or were unsure, and the 57% who did not participate in any professional body-initiated CPD activities.
    CONCLUSION: There is room to strengthen HIMs' engagement with CPD to ensure their commitment to career-centred lifelong learning, maintain professional competence by meeting a core competency outlined in the Professional Competency Standards and contribute to the continuing development of the profession. Implications for health information management practice: HIMAA should continue to diversify CPD content and formats to sustain and further enhance engagement, to ensure HIMs remain competent and responsive to the changing health information environment.
    Keywords:  continuing professional development; health information management; health information management profession; health information management workforce; health information manager; lifelong learning; professional competency; professional development
    DOI:  https://doi.org/10.1177/18333583251378960
  14. Br J Oral Maxillofac Surg. 2025 Sep 03. pii: S0266-4356(25)00214-1. [Epub ahead of print]
      Temporomandibular disorders (TMD) are complex conditions that burden patients and healthcare systems. Disparities in health literacy may hinder patient comprehension of online educational materials, potentially influencing outcomes. Artificial intelligence (AI)-driven chatbots offer a promising solution to improve the readability of patient information materials. We assessed the readability of available online materials on TMD in the United Kingdom (UK) and evaluated the ability of three AI chatbots to improve readability. A search was done of all UK public hospital websites with Oral and Maxillofacial Surgery (OMFS) or Ear, Nose, and Throat (ENT) units for TMD-related patient information. Readability was assessed using five standard scoring systems. Three AI chatbots (ChatGPT, Claude, and Google Gemini) were used to revise the content to an 11-year-old (sixth-grade) reading level. A total of 31 of the 122 UK hospital Trusts provided online TMD materials. Of these, 12/31 and 1/31 met the target readability according to the Flesch-Kincaid Grade Level (FKGL), and the Gunning Fog Index/Coleman-Liau Index (GFI/CLI), respectively, with mean (SD) readability at 64.68 (6.79) for Flesch Reading Ease Score (FRES). After AI modification by Gemini, 96.8% met the target readability per FKGL, 54.8% per GFI, and 29.0% per CLI. Gemini improved the mean (SD) score significantly to 82.59 (5.73) (p < 0.001) for FRES, meeting the target readability level. Online patient information on TMD exceeds the recommended Year Six (sixth grade in the US) reading level. AI chatbots, particularly Gemini, can significantly enhance the readability of these materials, enabling them to meet health literacy standards according to certain readability tools.
    Keywords:  Artificial intelligence; ChatGPT; Claude; Education; Google Gemini; Health literacy; Online information; Readability; Temporomandibular joint; Temporomandibular joint disorders
    DOI:  https://doi.org/10.1016/j.bjoms.2025.08.008
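    The readability scores used in this and several of the following entries are closed-form functions of word, sentence and syllable counts. A minimal sketch of the two Flesch formulas in Python (syllable counting is left to the caller rather than implemented):

        def flesch_reading_ease(words, sentences, syllables):
            """FRES: higher is easier; roughly 60-70 corresponds to plain English."""
            return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

        def flesch_kincaid_grade(words, sentences, syllables):
            """FKGL: approximate US school grade needed to read the text."""
            return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

        # Example: a 300-word leaflet written in 20 sentences with 450 syllables.
        print(round(flesch_reading_ease(300, 20, 450), 1))   # 64.7 -> plain English
        print(round(flesch_kincaid_grade(300, 20, 450), 1))  # 8.0 -> about eighth grade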
  15. J Appl Oral Sci. 2025; 33: e20250321. pii: S1678-77572025000100447.
      Artificial intelligence (AI) is transforming access to dental information via large language models (LLMs) such as ChatGPT and Google Gemini. Both models are increasingly being used in endodontics as a source of information for patients. Therefore, as developers release new versions, the validity of their responses must be continuously compared to professional consultations.
    OBJECTIVE: This study aimed to evaluate the validity of the responses provided by the most advanced LLMs [Google Gemini Advanced (GGA) and ChatGPT-4o] to frequently asked questions (FAQs) in endodontics.
    METHODOLOGY: A cross-sectional analytical study was conducted in five phases. The top 20 endodontic FAQs submitted by users to chatbots and collected from Google Trends were compiled. In total, nine academically certified endodontic specialists with educational roles scored GGA and ChatGPT-4o responses to the FAQs using a five-point Likert scale. Validity was determined using high (4.5-5) and low (≥4) thresholds. The Fisher's exact test was used for comparative analysis.
    RESULTS: At the low threshold, both models obtained 95% validity (95% CI: 75.1%-99.9%; p=.05). At the high threshold, ChatGPT-4o achieved 35% (95% CI: 15.4%-59.2%) and GGA, 40% (95% CI: 19.1%-63.9%) validity (p=1).
    CONCLUSIONS: ChatGPT-4o and GGA responses showed high validity under lenient criteria that significantly decreased under stricter thresholds, limiting their reliability as a stand-alone source of information in endodontics. While AI chatbots show promise to improve patient education in endodontics, their validity limitations under rigorous evaluation highlight the need for careful professional monitoring.
    DOI:  https://doi.org/10.1590/1678-7757-2025-0321
  16. J Craniofac Surg. 2025 Oct 02.
     INTRODUCTION: Ear microtia is a congenital deformity that can range from mild underdevelopment to complete absence of the external ear. Often unilateral, it causes visible facial asymmetry leading to psychosocial distress for patients and families. Caregivers report feeling guilty and anxious, while patients experience increased rates of depression and social challenges. This is a difficult time for patients and their families, who often turn to AI chatbots for guidance before and after receiving definitive surgical care. This study evaluates the quality and readability of leading AI-based chatbots when responding to patient-centered questions about the condition.
    METHODS: Popular AI chatbots (ChatGPT 4o, Google Gemini, DeepSeek, and OpenEvidence) were asked 25 queries about microtia developed from the FAQ section on hospital websites. Responses were evaluated using modified DISCERN criteria for quality and SMOG scoring for readability. ANOVA and post hoc analyses were performed to identify significant differences.
    RESULTS: Google Gemini achieved the highest DISCERN score (M=37.16, SD=2.58), followed by OpenEvidence (M=32.19, SD=3.54). DeepSeek (M=30.76, SD=4.29) and ChatGPT (M=30.32, SD=2.97) had the lowest DISCERN scores. OpenEvidence had the worst readability (M=18.06, SD=1.12), followed by ChatGPT (M=16.32, SD=1.41). DeepSeek was the most readable (M=14.63, SD=1.60), closely followed by Google Gemini (M=14.73, SD=1.27). Overall, the average DISCERN and SMOG scores across all platforms were 32.19 (SD=4.43) and 15.93 (SD=1.94), respectively, indicating good quality and an undergraduate reading level.
    CONCLUSIONS: None of the platforms consistently met both quality and readability standards, though Google Gemini performed relatively well. As reliance on AI for early health information grows, ensuring the accessibility of chatbot responses will be crucial for supporting informed decision-making and enhancing the patient experience.
    Keywords:  Artificial intelligence; chatbots; health communication; information quality; microtia; patient education; readability
    DOI:  https://doi.org/10.1097/SCS.0000000000011988
  17. Medicine (Baltimore). 2025 Sep 26. 104(39): e44728
      Owing to shame and stigmatization, hidradenitis suppurativa (HS) patients may seek information about their disease on artificial intelligence (AI) chatbots. We aimed to evaluate the readability, quality, and accuracy of HS-related information provided by 3 AI chatbots: ChatGPT-4o, Copilot, and Perplexity. The 24 most frequently queried keywords regarding HS were identified using Google Trends. In this observational and cross-sectional study, we queried the ChatGPT-4o, Copilot, and Perplexity chatbots with these keywords. The readability was evaluated using Flesch Reading Ease and Flesch-Kincaid Grade Level scores. SpaCy software v3.8.2, TERA (The Text Ease and Readability Assessor), TAALED (Tool for the Automatic Analysis of Lexical Diversity, βv1.4.1), and TAALES (Tool for the Automatic Analysis of Lexical Sophistication, v2.2) were used for further linguistic analysis. The Ensuring Quality Information for Patients (EQIP) and DISCERN tools were used to assess the quality. The accuracy was assessed using a 6-point Likert scale. Perplexity exhibited the highest text length (P < .001). Copilot exhibited better readability scores (P < .001). Perplexity had higher FKGL (P = .001) and lower FRES (P = .001) than the others regarding the outputs in the "test, operation, investigation, or procedure & drug, medication, or product" category. Nevertheless, none of the chatbots achieved the necessary level of readability. Lexical diversity was lower in Perplexity than in the other chatbots (P < .001). Referential cohesion was highest in Perplexity, whereas deep cohesion was highest in Copilot (P < .001 and P = .009, respectively). Age of acquisition was lowest in Copilot responses (P < .001). Copilot achieved the highest EQIP score with "good quality with minor problems" (P < .001). ChatGPT had the lowest DISCERN scores (P = .001). Although all chatbot models displayed favorable accuracy results, Perplexity had higher accuracy scores compared to Copilot (P = .020). All the AI models produced text at difficult (college or postgraduate) reading levels. Copilot seemed to generate lexically simpler outputs, while Perplexity produced longer responses with more referential cohesion but lower lexical diversity. ChatGPT, Copilot, and Perplexity seem insufficient for providing extensive, easily understandable, and exactly accurate medical information about HS.
    Keywords:  artificial intelligence; chatbot; hidradenitis suppurativa; quality; readability
    DOI:  https://doi.org/10.1097/MD.0000000000044728
  18. J Prosthet Dent. 2025 Sep 26. pii: S0022-3913(25)00737-1. [Epub ahead of print]
       STATEMENT OF PROBLEM: Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.
    PURPOSE: The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.
    MATERIAL AND METHODS: A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains (relevance, clarity, depth, focus, and coherence) using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).
    RESULTS: FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).
    CONCLUSIONS: Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.
    DOI:  https://doi.org/10.1016/j.prosdent.2025.09.009
  19. Orbit. 2025 Sep 30. 1-8
       PURPOSE: To evaluate the performance of ChatGPT-4 and Gemini, two large language models (LLMs), in addressing frequently asked questions (FAQs) about eye removal surgeries.
    METHODS: A set of 24 FAQs related to enucleation and evisceration was identified through a Google search and categorized into preoperative, procedural, and postoperative topics. Each question was submitted three times to ChatGPT-4o and Gemini, and responses were evaluated for consistency, accuracy, appropriateness, and potential harm. Readability was assessed using Flesch Reading Ease and Flesch-Kincaid Grade Level scores.
    RESULTS: Gemini exhibited higher response consistency compared to ChatGPT (p = 0.043), while ChatGPT produced longer responses (mean length: 169.3 vs. 109.9 words; p < 0.001). Gemini's responses were more readable, with a higher Flesch Reading Ease score (39.0 vs. 31.3, p = 0.001) and lower Flesch-Kincaid Grade Level (11.6 vs. 14.0, p < 0.001). Both LLMs demonstrated comparable accuracy and low potential for harm, with 79.2% of Gemini responses and 77.1% of ChatGPT responses rated as completely correct. The sources cited by Gemini included academic institutions (91.7%) and medical practices (8.3%), while ChatGPT exclusively referenced academic sources.
    CONCLUSIONS: ChatGPT and Gemini showed comparable accuracy and low harm potential when addressing patient questions about eye removal surgeries. Gemini provided more consistent and readable responses, but both LLMs exceeded the recommended readability levels for patient education. These findings highlight the potential of LLMs to assist in patient communication and clinical education while underscoring the need for careful oversight in their implementation.
    Keywords:  Large language models; eye enucleation; eye evisceration; eye removal; generative artificial intelligence
    DOI:  https://doi.org/10.1080/01676830.2025.2559735
  20. Sportverletz Sportschaden. 2025 Sep 29.
      The quality of websites containing medical content is becoming increasingly important as the Internet is a major source of information on health issues and can influence the course of a disease. This study analysed 250 websites on chronic ankle instability, which affects 40% of patients after acute ankle sprain. Based on the results, a guide for patients was developed. The EQIP36 score for medical information materials, along with a 25-item evaluation tool, was used to evaluate the quality of the websites. The reading level was determined by the Flesch-Kincaid index and the calculated readability. A survey of medical laypersons and specialists was conducted to further analyse the top 3 websites. Out of 250 websites surveyed, 42 were included in the analysis, with significant differences in quality observed. None of the websites adequately fulfilled the quality requirements. Websites affiliated with health system sources predominated and demonstrated higher quality, whereas commercially operated sites, as well as those with advertising or links to social media, showed below-average content completeness. None of the websites reached the recommended reading level. The survey showed a mixed level of satisfaction. Participants without prior medical knowledge criticised the use of medical jargon and the inclusion of distressing surgical images. The available online resources for ankle instability are insufficient in quality and lack consistency. Deficits in content, readability and structure impair effective use by patients. Stakeholders involved in publishing health information should prioritise improving its clarity and quality to better support patients in self-managing their health conditions.
    DOI:  https://doi.org/10.1055/a-2657-7455
  21. Niger J Clin Pract. 2025 Sep 01. 28(9): 1049-1055
       BACKGROUND: Video-based learning is used in surgical education due to its flexibility and cost-effectiveness. Endoscopic submucosal dissection (ESD) is a technically challenging procedure, and YouTube is an important source of educational videos on this topic. The laparoscopic video educational guide and scoring (LAP-VEGaS) system can objectively evaluate the educational quality of these videos.
    AIM: To evaluate the educational quality of the most-viewed ESD videos on YouTube using the LAP-VEGaS score and to examine the relationship between quality and participation metrics.
    METHODS: On August 01, 2024, the 20 most popular videos were selected based on view count by searching for "endoscopic submucosal dissection" on YouTube. Two independent evaluators scored the videos using LAP-VEGaS (0-18). Videos scoring 9 or higher were classified as high quality (HQ; n = 12), while those scoring below 9 were classified as low quality (LQ; n = 8).
    RESULTS: The average number of views for the videos was 20,567 ± 38,269, the time elapsed since upload was 2751 ± 1264 days, the duration was 537 ± 301 s, and the average LAP-VEGaS score was 9.1 ± 5.2. HQ videos were longer (579 ± 295 s vs. 434 ± 190 s; P ≈ 0.08) and had a higher like rate (0.984 ± 0.02 vs. 0.94 ± 0.07; P < 0.05). Strong positive correlations were observed between likes and view rate (r = 0.979; P < 0.001) and video power index (r = 0.984; P < 0.001). Moderate correlations were found between the LAP-VEGaS score and duration (r = 0.515; P = 0.021) and similarity ratio (r = 0.492; P = 0.035).
    CONCLUSION: The educational quality of popular ESD videos is heterogeneous, and unlike the number of views, interaction metrics, such as the like/view ratio and video duration, more reliably reflect educational value. Therefore, objective evaluation tools such as LAP-VEGaS are recommended to facilitate the selection of HQ content by educators and learners.
    Keywords:  Endoscopic submucosal dissection; LAP-VEGaS Score; YouTube; surgical education; video-based learning
    DOI:  https://doi.org/10.4103/njcp.njcp_84_25
  22. J Indian Soc Pedod Prev Dent. 2025 Jul 01. 43(3): 347-353
       BACKGROUND: In today's era of social media, several YouTube videos are available on the use of silver diamine fluoride (SDF) in children. However, the content needs to be evaluated critically so that parents/caregivers and general dentists know the accurate and reliable information about this caries preventive agent.
    AIM: The aim of this study was to evaluate the quality and content of YouTube videos about the use of SDF for parents/caregivers and general dentists.
    METHODS: A systematic YouTube™ search was conducted using the keywords "SDF, pediatric dentists, children, dental caries" with the filter set to "sort by relevance." The selected videos were evaluated in terms of content and quality using a customized five-point scale and Modified Global Quality Score. The accuracy and reliability of the videos were assessed using the Journal of the American Medical Association (JAMA) criteria. Metrics recorded included number of views, duration, days since upload, comments, likes, dislikes, interaction index, and viewing rate. Independent sample t-test, Chi-square test, and correlation coefficient were employed for the analysis of the quantitative variables.
    RESULTS: Out of 200 videos initially retrieved, 66 videos were selected for final analysis. Most of the videos (53, 83.3%) were uploaded by pediatric dentists/dentists. The median total content score was 3 (interquartile range = 3), with 23 (35%) videos scoring high and 43 (65%) scoring low on content. Overall, only 13 (20%) videos were completely reliable as per the JAMA criteria, and 14 (21.2%) videos were graded as high quality.
    CONCLUSIONS: The analysis revealed a lack of high-quality, reliable information on SDF for educating the parents and caregivers. Improved content quality is needed to inform parents/caregivers and general dentists about SDF's benefits and limitations.
    Keywords:  Children; YouTube™; dental caries; pediatric dentists; silver diamine fluoride
    DOI:  https://doi.org/10.4103/jisppd.jisppd_136_25
  23. Sci Diabetes Self Manag Care. 2025 Sep 28. 26350106251371082
      Purpose: The purpose of the study was to evaluate the quality, reliability, and informational adequacy of YouTube videos related to the installation and replacement of continuous glucose monitor (CGM) systems. Methods: This descriptive and correlational study evaluated 460 videos retrieved using the keywords "CGM installation" and "CGM replacement" and analyzed 35 videos that met the inclusion criteria. Videos were assessed using 3 tools: the DISCERN instrument, the Global Quality Scale (GQS), and the 24-item CGM Informational Survey (CIS) developed by the researchers. Results: The majority of videos (80%) were user-generated, and only 2.9% were uploaded by health care professionals. The average GQS score was 2.80, DISCERN 34.57, and CIS 11.86, indicating moderate to low quality and informativeness. Video duration showed strong positive correlations with CIS (r = .80), DISCERN (r = .64), and GQS (r = .71) scores (P < .001). Videos with high information scores were significantly longer and more comprehensive than low-scoring ones. No significant correlation was found between follower count and content quality. The most frequently shared YouTube videos were related to the Dexcom CGM System (34.3%) and the FreeStyle Libre CGM System (25.7%). Conclusions: YouTube videos related to CGM installation and replacement are largely insufficient in terms of medical accuracy and completeness. Given the growing reliance on digital health information, it is essential for health care professionals to produce accurate, standardized, and accessible video content to support safe diabetes self-management and improve public health literacy.
    DOI:  https://doi.org/10.1177/26350106251371082
  24. Front Public Health. 2025; 13: 1657233
       Background: Cardiopulmonary resuscitation (CPR) is an emergency medical procedure designed to restore circulation and respiratory function in patients who have suffered cardiac arrest. This study aimed to comprehensively analyze the upload sources, content characteristics, and video quality of CPR-related videos on YouTube, Bilibili, and TikTok, with a view to providing a reference for improving public first aid awareness and skills.
    Methods: In December 2024, we searched each platform using "Cardiopulmonary resuscitation" and "CPR" (including "" for Bilibili and TikTok), retrieving the top 100 videos per platform. After screening, 239 videos (YouTube: 80; Bilibili: 72; TikTok: 87) met inclusion criteria. Meanwhile, we quantitatively assessed the video quality using the Patient Education Material Assessment Tool (PEMAT), Video Information and Quality Index (VIQI), and Global Quality Score (GQS) assessment tools. We assessed the correlation between video quality scores and viewer interaction data (likes, comments, favorites, and retweets).
    Results: A total of 239 videos were included for analysis (YouTube: 80; Bilibili: 72; TikTok: 87). Short-form CPR videos have increased yearly. Uploaders differed by platform: YouTube videos came mainly from professional institutions, Bilibili videos from non-professional individuals, and TikTok videos from non-professional institutions. TikTok had the highest uploader certification rate (72.97%), and videos by professional individuals gained the most interactions. Content varied: YouTube focused on CPR knowledge (85.00%), TikTok on news and reports (48.28%), and Bilibili was mixed. The automated external defibrillator (AED)-related videos on TikTok received the most likes. YouTube videos had the highest quality scores, especially those from professionals. However, quality scores showed no strong positive correlation with interaction data.
    Conclusion: Social media plays a growing role in CPR education, yet overall video quality-especially in accuracy and completeness-needs improvement. Involving more professionals in content creation and enhancing platform recommendation algorithms could help disseminate reliable first aid information more effectively.
    Keywords:  GQS; PEMAT; VIQI; cardiopulmonary resuscitation; information quality; public education; public health; social media
    DOI:  https://doi.org/10.3389/fpubh.2025.1657233
  25. Breastfeed Med. 2025 Sep 29.
      Background: Social media platforms, particularly Instagram, are increasingly used by new and expecting parents to seek health-related information, including guidance on breastfeeding. While this offers opportunities for accessible support, concerns persist regarding the accuracy and quality of content shared online. Objective: This study aimed to evaluate the accuracy and general quality of breastfeeding-related information shared on Instagram and to examine how these attributes vary by post characteristics, including format, content topic, and account type. Methods: A cross-sectional observational study was conducted using 80 top-performing Instagram posts identified through four popular breastfeeding-related hashtags. Posts were manually screened and assessed for eligibility. Accuracy was evaluated against official guidelines from the Centers for Disease Control and Prevention, World Health Organization, and American Academy of Pediatrics using a 4-point scale. General quality was assessed using the Global Quality Scale (GQS), a validated 5-point tool. Ordered logistic regression was used to assess associations between post characteristics and outcomes. Results: Overall, 38.8% of posts were completely accurate, while 36.3% were either mostly or completely inaccurate. The mean accuracy score was 3.7 (SD = 1.41), and the mean GQS was 4.0 (SD = 1.03). Image-based posts were significantly more accurate than videos (odds ratio [OR] = 2.52; 95% CI: 1.11-5.74), and posts by health care professionals had significantly higher quality scores (OR = 5.23; 95% CI: 1.66-16.54). Accuracy and quality scores were strongly correlated (ρ = 0.68, p < 0.001). Conclusion: While Instagram can serve as a valuable platform for breastfeeding education, content quality and accuracy vary widely. Posts by health care professionals tend to be more reliable. Public health efforts should focus on amplifying evidence-based content and mitigating misinformation to better support maternal and child health online.
    Keywords:  Instagram; breastfeeding; information accuracy; social media
    DOI:  https://doi.org/10.1177/15568253251383914
  26. JMIR Ment Health. 2025 Sep 30. 12 e77383
       Background: TikTok [ByteDance] is a significant source of mental health-related content, including discussions on selective serotonin reuptake inhibitors (SSRIs). While the app fosters community building, its algorithm also amplifies misinformation as influencers without relevant expertise often dominate conversations about SSRIs. These videos frequently highlight personal experiences, potentially overshadowing evidence-based information from health care professionals. Despite these concerns, TikTok holds potential as a tool for improving mental health literacy when used by professionals to provide credible information.
    Objective: This study aimed to examine TikTok videos on SSRIs, hypothesizing that content will predominantly emphasize negative experiences and that videos by nonmedical professionals will attract higher engagement. By analyzing creators, engagement metrics, content tone, and video tone, this study aimed to shed light on social media's role in shaping perceptions of SSRIs and mental health literacy.
    Methods: A sample of 99 TikTok videos was collected on December 8, 2024. Apify, a web scraper, compiled pertinent engagement metrics (URLs, likes, comments, and shares). Views were manually recorded. In total, 3 researchers evaluated video and content tones and documented findings in Qualtrics. User profiles were analyzed to classify creators as a "medical professional" or "nonmedical professional" based on verification of their credentials. Statistical analyses evaluated the hypotheses.
    Results: The number of videos created by both nonmedical and medical professionals was roughly even. Approximately one-third (35/99, 35%) mentioned a specific SSRI (ie, fluoxetine, fluvoxamine, vilazodone, sertraline, paroxetine, citalopram, or escitalopram). Compared to medical professionals, nonmedical creators produced significantly more videos with a positive video tone (P<.001). TikToks made by both groups of creators, however, had negative content tones (P=.78). Nonmedical professionals received significantly greater overall views (P=.01), likes (P=.01), and comments (P=.03), but overall shares were not significantly different (P=.18). Daily interaction metrics revealed that nonmedical professionals received more daily interaction, but these differences were not significant in terms of views (P=.09), likes (P=.06), comments (P=.15), or shares (P=.28).
    Conclusions: Results showed that while both creator groups focused on negative SSRI side effects and experiences (content tone), the way they presented this information (video tone) differed. Medical professionals generally maintained a neutral video tone, whereas nonmedical professionals were more likely to adopt a positive video tone. This may explain why nonmedical professionals' videos had significantly more cumulative views, likes, and comments than medical professionals' videos. These findings are consistent with other research suggesting that the TikTok algorithm and users are more likely to favor and engage with videos that evoke a strong emotional response and are perceived as relatable to viewers. This study highlights the need for medical professionals to improve their approach to content creation on TikTok by using a more positive video tone to increase engagement.
    Keywords:  SSRI; TikTok; antidepressant; engagement; medical professional; selective serotonin reuptake inhibitor; social media; video tone
    DOI:  https://doi.org/10.2196/77383
  27. Sci Rep. 2025 Sep 29. 15(1): 33532
      Health Information Seeking Behavior [HISB] has developed into one of the crucial components of a person's awareness and responsibility for their health. However, populations at risk of statelessness are often excluded from opportunities and services, particularly those related to health. Using Longo's Model and the Health Belief Model [HBM], this study investigated the HISB of people at risk of statelessness and its associated determinants, with emphasis on individuals' socio-demographic, psychosocial and health belief factors. The study's data came from a cross-sectional household survey undertaken in the Awutu Senya East Municipality and Gomoa East District of Ghana's Central Region between March 9 and June 26, 2021. Descriptive statistics and binary logistic regression models helped establish the prevalence and predictors of HISB from a sample of 384 at-risk individuals. Prevalence of health information seeking [HIS] was nearly 44% and was associated with sex, age, level of education, and internet literacy. Additionally, various constructs of psychosocial resources [self-esteem and trust in health information] and health beliefs [perceived severity, benefits, and perceived barriers] were associated with HIS within our sample. To improve positive HISB, healthcare providers and health promoters must tailor health information to different socio-demographic groups, focus on building trust and rapport with patients, offer social support, and address psychosocial barriers to HIS. Finally, providing accurate and relevant health information must be prioritised.
    Keywords:  Health information-seeking behavior; People at risk of statelessness; Social determinants of health; Social inclusion; Vulnerable populations
    DOI:  https://doi.org/10.1038/s41598-025-18110-x
  28. BMJ Open. 2025 Oct 02. 15(10): e096812
      INTRODUCTION: During the perinatal and postpartum periods, appropriate health information is crucial for women and their partners. Although previous studies and reviews have identified various sources of health information, these studies have neither sufficiently clarified the relationships between the sources and topics of health information and the acquisition of health information nor included women's partners as participants. Thus, this scoping review protocol aims to map and synthesise evidence on the acquisition of health information by women and their partners during the perinatal and postpartum periods and clarify the relationships between the sources and topics of health information and the acquisition of health information. We aim to generate thorough knowledge of relationships and patterns from the answers to our research questions as follows: (1) What are the relationships between the sources and topics of health information that women and their partners acquire during the perinatal and postpartum periods? (2) What are the patterns of acquisition of health information by women and their partners? (3) What are the patterns of acquisition routes and timing of health information by women and their partners?
    METHODS AND ANALYSIS: This scoping review will be conducted in accordance with the Joanna Briggs Institute Manual for Evidence Synthesis for scoping reviews. We will search PubMed, CINAHL, Embase and Ichushi (Japanese electronic database) for relevant articles. Google will be searched for grey literature. We will include sources of evidence that were investigated after 2020 and written in English or Japanese. Article screening will be conducted by two independent reviewers and reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for scoping reviews flow diagram. Results will be presented in tables and described narratively.
    ETHICS AND DISSEMINATION: This scoping review covers only secondary data that are publicly available and therefore does not require ethical review approval. The results will be disseminated in a peer-reviewed journal and presented in a conference.
    TRIAL REGISTRATION NUMBER: UMIN000056170.
    Keywords:  Health Education; OBSTETRICS; PUBLIC HEALTH; Postpartum Period; Pregnancy; Review
    DOI:  https://doi.org/10.1136/bmjopen-2024-096812
  29. Hand (N Y). 2025 Sep 28. 15589447251371246
       BACKGROUND: Using Google's "People Also Ask" feature, this study aims to characterize and compare frequently asked patient questions related to open (OCTR), endoscopic (ECTR), and ultrasound-guided (CTR-US) carpal tunnel release.
    METHODS: Search terms related to each surgical approach were entered into a new, incognito Google Chrome browser. For each term, the top 200 questions and corresponding websites were extracted. Questions were grouped using Rothwell's classification system, while websites were scored using the JAMA Benchmark Criteria. Fisher's exact tests were used to compare classifications, while t-tests were used to compare scores between approaches.
    RESULTS: Of the 1200 question-website combinations initially extracted, a total of 477 combinations were included for analysis. There were 197 OCTR, 225 ECTR, 30 CTR-US, and 25 comparative combinations. The most common question subclassification was risk/complications for OCTR (19.8%), evaluation of surgery for ECTR (15.7%), and technical details for CTR-US (40%). More than 25% of answers to OCTR and ECTR questions came from social media or commercial sources, while no answers did for CTR-US. There were no significant differences across approaches in total JAMA scores. Despite only being associated with medical practice websites, comparative questions exhibited significantly lower JAMA scores (P < .01).
    CONCLUSIONS: While patient questions and their associated websites on OCTR and ECTR exhibited similar classifications and quality respectively, surgeons should still emphasize the relative risks associated with ECTR. Furthermore, there is a need for high-quality online information on CTR-US and on comparisons between approaches.
    Keywords:  carpal tunnel syndrome; nerve compression; research and health outcomes; search analytics; surgery
    DOI:  https://doi.org/10.1177/15589447251371246
  30. J Med Internet Res. 2025 Sep 29. 27 e67640
       BACKGROUND: Nutrition misinformation is pervasive on frequently accessed online sources such as social media platforms and websites. Young adults are at a high risk of viewing or engaging with this content due to their high internet and social media usage.
    OBJECTIVE: This study aimed to understand young adults' preferences, perceptions, and use of online nutrition content.
    METHODS: Young Australian adults (aged 18-25 years) were recruited and interviewed individually via video calling (Zoom; Zoom Video Communications) between December 2023 and February 2024. Participants were recruited via convenience sampling using Facebook advertising. The interviewer followed a semistructured format, and questions were guided using a piloted template. Reflexive thematic analysis was conducted using NVivo (Lumivero) to explore the preferences, perceptions, and use of online nutrition content among the sample.
    RESULTS: The sample (N=20; mean age 22.9 y, SD 2.3 y) was predominantly female (n=13, 65%) and had, or was studying toward, a tertiary qualification (16/17, 94%). Most participants used social media (19/20, 95%) and internet websites (16/20, 80%) to access nutrition content. Other platforms used included generative artificial intelligence (n=1), apps (n=1), eBooks (n=1), newsletters (n=1), and podcasts (n=1). When exploring perceptions, most participants agreed that online nutrition content was quick and easy to find and informative. Furthermore, perceived reliability and engagement depended on several factors such as the creator's credentials, length and format of content, consensus on topics, and sponsorships. Short-form content was not considered reliable, despite its engaging nature. Content containing sponsorships or product endorsements was met with skepticism. However, participants were more likely to trust content reportedly created by health professionals, but it was unknown whether they were accessing verified professionals. The oversaturation of content demotivated participants from evaluating the reliability of content. When asked about preferences, participants valued both short- and long-form content as well as evidence-based elements such as statistics and references, and preferred casual and entertaining content that incorporated high-quality, dynamic editing techniques such as voiceovers.
    CONCLUSIONS: The study identified the online nutrition content sources and topics young Australian adults access and the key factors that influence their perceptions and preferences. Young Australian adults acknowledge that misinformation is not exclusive to certain platforms. The accessibility and engagement of content and the ambiguity of professional "credentials" may lead them to trust information that is potentially of low quality and accuracy. Findings also show that there needs to be a balance between engaging formats and presenting evidence-based information when designing online nutrition content to engage these audiences while combatting nutrition misinformation. Future research should explore how these factors impact usage of online nutrition content and dietary behaviors among young Australian adults. Further consultation with this cohort can inform tailored interventions that aim to enhance their food and nutrition literacy and diet quality.
    Keywords:  Australia; internet; misinformation; nutrition; online; online nutrition content; perceptions; preferences; qualitative; social media; young adults
    DOI:  https://doi.org/10.2196/67640
  31. Cognition. 2025 Sep 26. pii: S0010-0277(25)00270-7. [Epub ahead of print]266 106329
      Research has shown that searching for information online can increase a person's likelihood to search for other information online, a phenomenon known as the Internet Fixation Effect. In the current study, we conducted four experiments examining the boundary conditions of the Internet Fixation Effect and whether it can be attributed, at least in part, to how online searching affects metacognitive judgments. We replicated the Internet Fixation Effect but failed to find any evidence that it can be attributed to participants becoming less confident in what they know and can access internally. Instead, we interpret our results as suggesting that the Internet Fixation Effect may be better explained by participants becoming more habitually reliant on the Internet as a transactive memory partner within the context of an increasingly integrated and extended memory system.
    Keywords:  Internet fixation; Metacognition; Online searching; Transactive memory
    DOI:  https://doi.org/10.1016/j.cognition.2025.106329
  32. Sci Rep. 2025 Sep 29. 15(1): 33466
      This study evaluated ChatGPT's ability to simplify scientific abstracts for both public and clinician use. Ten questions were developed to assess ChatGPT's ability to simplify scientific abstracts and improve their readability for both the public and clinicians. These questions were applied to 43 abstracts. The abstracts were selected through a convenience sample from Google Scholar by four interdisciplinary reviewers from physiotherapy, occupational therapy, and nursing backgrounds. Each abstract was summarized by ChatGPT on two separate occasions. These summaries were then reviewed independently by two different reviewers. Flesch Reading Ease scores were calculated for each summary and original abstract. A subgroup analysis explored differences in accuracy, clarity, and consistency across various study designs. ChatGPT's summaries scored higher on the Flesch Reading Ease test than the original abstracts in 31 out of 43 papers, showing a significant improvement in readability (p = 0.005). Systematic reviews and meta-analyses consistently received higher scores for accuracy, clarity, and consistency, while clinical trials scored lower across these parameters. Despite its strengths, ChatGPT showed limitations in "Hallucination presence" and "Technical terms usage," scoring below 7 out of 10. Hallucination rates varied by study type, with case reports having the lowest scores. Reviewer agreement across parameters demonstrated consistency in evaluations. ChatGPT shows promise for translating knowledge in clinical settings, helping to make scientific research more accessible to non-experts. However, its tendency toward hallucinations and technical jargon requires careful review by clinicians, patients, and caregivers. Further research is needed to assess its reliability and safety for broader use in healthcare communication.
    Keywords:  ChatGPT; Flesch reading ease score; Hallucination presence; Healthcare dissemination; Technical terms
    DOI:  https://doi.org/10.1038/s41598-025-11086-8
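    The abstract reports a paired readability comparison (31 of 43 summaries scored higher, p = 0.005) without naming the statistical test. One plausible way to run such a paired, non-parametric comparison is a Wilcoxon signed-rank test on per-abstract Flesch Reading Ease pairs; the sketch below is illustrative only, with made-up scores rather than the study's data.

        from scipy.stats import wilcoxon

        # Made-up per-abstract FRES values: original abstract vs. ChatGPT summary.
        original_fres = [38.2, 41.5, 29.7, 45.0, 33.8, 40.1, 36.4, 31.9]
        summary_fres  = [52.6, 49.3, 44.0, 47.2, 50.5, 39.8, 48.7, 46.1]

        stat, p_value = wilcoxon(summary_fres, original_fres)
        print(f"W={stat:.1f}, p={p_value:.3f}")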
  33. Sci Rep. 2025 Oct 03. 15(1): 34525
      Modern healthcare interoperability demands objective methods for quantitatively evaluating the coverage and granularity of biomedical terminology systems to support evidence-based selection and integration decisions. We introduce novel metrics, namely structural size (an integrated measure of width and depth), mapping burden ratio (a measure of relative granularity between systems), and content overlap, to quantitatively evaluate the semantic integration potentials of five major terminology systems: SNOMED CT; Logical Observation Identifiers Names and Codes (LOINC); International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM); Gene Ontology (GO); and Current Procedural Terminology. The Unified Medical Language System Metathesaurus was employed to establish semantic equivalency between concepts from different systems. SNOMED CT exhibited superior granularity across most clinical domains, with some exceptions (ICD-10-CM in "Qualifier value," GO in "Observable entity," and LOINC in "Staging and scales"). These findings address the challenge of semantic degradation in health information exchange by quantifying the degree to which meaning might be lost when translating between terminology systems. The proposed metrics empower healthcare organizations to develop targeted extensions or integration strategies that maintain semantic consistency across systems, providing objective tools for terminology system selection, integration planning, and semantic interoperability assessment.
    Keywords:  Ontology; Quantitative assessment; SNOMED CT; Terminology system; Unified medical language system
    DOI:  https://doi.org/10.1038/s41598-025-17737-0
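    The paper's metric definitions are not given in the abstract, so any formula here is an assumption. As one hypothetical reading, content overlap between two terminology systems can be expressed as set overlap over the UMLS concept unique identifiers (CUIs) their codes map to, sketched below with illustrative CUI strings.

        def content_overlap(cuis_a: set[str], cuis_b: set[str]) -> float:
            """Share of system A's UMLS concepts also covered by system B (assumed definition)."""
            if not cuis_a:
                return 0.0
            return len(cuis_a & cuis_b) / len(cuis_a)

        # Illustrative CUI sets, not actual extracts from the Metathesaurus.
        snomed_sample = {"C0011849", "C0020538", "C0027051", "C0038454"}
        icd10cm_sample = {"C0011849", "C0020538", "C0004096"}
        print(round(content_overlap(icd10cm_sample, snomed_sample), 2))   # 0.67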