bims-librar 2025-02-09 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2025–02–09
29 papers selected by
Thomas Krichel, Open Library Society

Tea trolley teaching in critical care: Integrating evidence-based practice with library services.
The ethics of artificial intelligence use in university libraries in Zimbabwe.
Supplementary databases increased literature search coverage beyond PubMed and Embase.
Textual evidence systematic reviews series paper 2: challenges and strategies in developing a search strategy for systematic reviews of textual evidence.
Characteristics of search methods in dental meta-research studies: a methodological study.
LitSumm: large language models for literature summarization of noncoding RNAs.
Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types.
A Study on Consumer-Centric Health Information Provision Strategy Using SWOT-AHP -Focusing on the National Health Information Portal.
Assessing the language availability, readability, suitability and comprehensibility of heat-health messaging content on health authority webpages and online resources in Canada.
Developing and Validating a User-Friendly Quality Benchmark: Enhancing the Integrity of Online Health Information for Patients and Clinicians.
Evaluating the fidelity of AI-generated information on long-acting reversible contraceptive methods.
A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods.
Quality of information on Wilms tumour from artificial intelligence chatbots: what are your patients and their families reading?
ChatGPT 35 Better Improves Comprehensibility of English, Than Spanish, Generated Responses to Osteosarcoma Questions.
A Comparative Analysis of Three Large Language Models on Bruxism Knowledge.
Assessing ChatGPT Responses to Frequently Asked Questions Regarding Pediatric Supracondylar Humerus Fractures.
ChatGPT is a comprehensive education tool for patients with patellar tendinopathy, but it currently lacks accuracy and readability.
Use of ChatGPT to obtain health information in Australia, 2024: insights from a nationally representative survey.
Readability of foot and ankle patient-reported outcomes: Alignment with National institutes of health and american medical association standards.
The readability of online patient education materials for cutaneous autoimmune connective tissue diseases.
Analysis of YouTube Videos on Herbal Approaches Used in Coping with Cancer.
Health Information on Breast Cancer Surgery on YouTube®.
YouTube as an information source in deep margin elevation: Reliability, accuracy and quality analysis.
Quality and Reliability Analysis of YouTube as a Source of Patient Information on de Quervain's Tenosynovitis.
Evaluating Quality and Reliability of Most-Viewed TikTok Videos About Spironolactone.
Quality of information in gestational diabetes mellitus videos on TikTok: Cross-sectional study.
Social media and educational resources in masculinizing top surgery: The effect of age on patient preferences, subjective understanding and readability.
The impact of health information seeking and social influence on functional food purchase intention.
The process of obtaining information about COVID-19 among students of physiotherapy and rehabilitation department.

Nurs Crit Care. 2025 Feb 02.

Tea trolley teaching in critical care: Integrating evidence-based practice with library services.

H McGivern, S Bridge, S Sutherland, E Reynolds, J Ede.

  Tea trolley teaching is a tried and tested method of providing bedside education to hospital staff. This project aimed to integrate the tea trolley teaching model, already established in our local critical care unit, with library services. The goal was to equip clinical staff with the necessary training to retrieve literature and support evidence-based practice. Our evaluation highlights the value of this combined intervention of teaching research skills to upskill staff working in our intensive care units. This paper describes a scalable model of critical care bedside education that integrates library-focused teaching to upskill nurses in some of the prerequisite skills needed for evidence-based practice (EBP).

Keywords:  critical care; health services research; leadership; research implementation; research in practice

DOI:  https://doi.org/10.1111/nicc.13264
Front Res Metr Anal. 2024 ;9 1522423

The ethics of artificial intelligence use in university libraries in Zimbabwe.

Stephen Tsekea, Edward Mandoga.

   Introduction: The emergence of artificial intelligence (AI) has revolutionised higher education teaching and learning. AI has the power to analyse large amounts of data and make intelligent predictions thus changing the whole teaching and learning processes. However, such a rise has led to institutions questioning the morality of these applications. The changes have left librarians and educators worried about the major ethical questions surrounding privacy, equality of information, protection of intellectual property, cheating, misinformation and job security. Libraries have always been concerned about ethics and many go out of their way to make sure communities are educated about the ethical question. However, the emergence of artificial intelligence has caught them unaware.
Methods: This research investigates the preparedness of higher education librarians to support the ethical use of information within the higher and tertiary education fraternity. A qualitative approach was used for this study. Interviews were done with thirty purposively selected librarians and academics from universities in Zimbabwe.
Results: Findings indicated that many university libraries in Zimbabwe are still at the adoption stage of artificial intelligence. It was also found that institutions and libraries are not yet prepared for AI use and are still crafting policies on the use of AI.
Discussion: Libraries seem prepared to adopt AI. They are also prepared to offer training on how to protect intellectual property but have serious challenges in issues of transparency, data security, plagiarism detection and concerns about job losses. However, with no major ethical policies having been crafted on AI use, it becomes challenging for libraries to full adopt its usage.

Keywords:  artificial intelligence; ethics; higher education; higher education integrity; university libraries

DOI:  https://doi.org/10.3389/frma.2024.1522423
J Clin Epidemiol. 2025 Jan 29. pii: S0895-4356(25)00037-X. [Epub ahead of print] 111704

Supplementary databases increased literature search coverage beyond PubMed and Embase.

Tove Faber Frandsen, Caroline Moos, Cecilia Isabella Linnemann Herrera Marino, Mette Brandt Eriksen.

INTRODUCTION: In health sciences, comprehensive literature searches are crucial for ensuring the accuracy and completeness of systematic reviews. Relying on only a few databases can lead to the omission of relevant studies. The variability in database coverage for different specialties means that important literature might be missed if searches are not broadened. Supplementary databases can enhance the thoroughness of literature reviews, but the efficiency and necessity of these additional searches remain subject to debate. This study aims to explore methods for retrieving publications not indexed in PubMed and Embase, examining coverage of various specialties to determine the most effective search strategies for systematic reviews.
METHODS: We selected reviews from the following Cochrane review groups: Public Health, Incontinence, Hepato-Biliary, and Stroke groups. All reviews published in these groups between 2017 and 2022 were analyzed. Publications included in these reviews were manually searched for in PubMed and Embase. If the publication was not found, additional databases such as Cochrane Library, PsycInfo, CINAHL, and ClinicalTrials.gov were searched. Descriptive statistics were used to analyze the data.
RESULTS: The mean coverage of publications in PubMed and Embase across all four speciality groups was 71.5%, with individual group coverage ranging from 64.5% to 75.9%. An average of 5.8% of publications could not be retrieved in any of the databases studied. Additional databases varied in their coverage.
CONCLUSION: While PubMed and Embase provide substantial coverage, supplementary databases can increase retrieval of more relevant studies and are essential for a comprehensive literature search.

DOI: https://doi.org/10.1016/j.jclinepi.2025.111704
JBI Evid Synth. 2025 Feb 05.

Textual evidence systematic reviews series paper 2: challenges and strategies in developing a search strategy for systematic reviews of textual evidence.

Deborah Edwards, Adam Cooper, Alexa McArthur, Brittany V Barber, Emily Gregg, Lori E Weeks, Zoe Jordan.

OBJECTIVE: The objective of this paper is to highlight and address challenges as well as provide strategies for developing searches for systematic reviews of textual evidence.
INTRODUCTION: When conducting a JBI review of textual evidence, it is important to consider different sources of published and unpublished material. While systematic search methodologies have been well-established for searching traditional peer-reviewed literature, applying those same rigorous methods to literature outside of academic journals can be more challenging. This paper highlights and addresses the challenges of developing searches for systematic reviews of textual evidence and provides strategies on how to conduct these. It takes into consideration the unique complexities of locating published material outside of academic journals and presents guidance for developing more robust searches incorporating textual evidence.
DISCUSSION: Researchers should acknowledge the value of textual evidence, including opinions, narratives, and policies, as crucial for informing health care practices. It is also essential to clearly define the types of textual evidence needed and establish comprehensive search parameters to ensure thorough coverage. To enhance the search process, researchers should follow a structured 3-phase approach: first, identify relevant keywords; second, conduct tailored searches in bibliographic databases; and third, perform supplementary searches. Furthermore, it is recommended they collaborate with information specialists and experts to refine and strengthen their search techniques. Researchers should also explore a variety of sources, including dedicated databases, conference proceedings, theses, dissertations, and media reports, to gather valuable textual evidence. Finally, it is important to systematically document all search processes to support transparency and reproducibility in the review.
CONCLUSION: Searching broadly across bibliographic databases and including textual evidence from non-academic journals may provide the best available and most appropriate evidence to address specific questions.

DOI: https://doi.org/10.11124/JBIES-24-00292
J Clin Epidemiol. 2025 Jan 30. pii: S0895-4356(25)00026-5. [Epub ahead of print] 111693

Characteristics of search methods in dental meta-research studies: a methodological study.

Leonie Weeber, Naichuan Su, Clovis Mariano Faggion.

   BACKGROUND: Meta-research studies, defined as. "research on research", should transparently report search methods used to identify the assessed research. Currently, there is no published evaluation of search methods reporting in meta-research studies. The aim of this study was to assess the characteristics of search methods in dental meta-research studies and to identify factors associated with the completeness of the reported search strategies.
METHODS: With a focus on the assessment of reporting quality and methodological quality, we searched in the Web of Science Core Collection database for dental meta-research studies published from the database's inception to February 13, 2024. The extracted data included the examined meta-research studies, characteristics of their authors and journals and search methods reporting of the examined studies. Logistic regression models were applied to examine the associations between relevant variables and search strategy reporting completeness.
RESULTS: The search generated 3,774 documents, and 224 meta-research studies were included in the final analysis. Nearly all studies (99.6%) disclosed their general search methods, but only 130 studies (58%) provided both keywords and Boolean operators. Regression analyses indicated that meta-research studies published more recently, with prospective registration, with a shorter time between the searches and publication, a lack of language restrictions and librarian involvement were more likely to report a more complete search strategy.
CONCLUSION: The results highlight the importance of unrestricted language searches, structured methodologies and librarian support in improving the quality and transparency of reporting search strategies in dental meta-research.

Keywords:  Search methods; methodological study; methodology; search strategies; systematic reviews

DOI:  https://doi.org/10.1016/j.jclinepi.2025.111693
Database (Oxford). 2025 Feb 05. pii: baaf006. [Epub ahead of print]2025

LitSumm: large language models for literature summarization of noncoding RNAs.

Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney.

Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.

DOI: https://doi.org/10.1093/database/baaf006
Trials. 2025 Jan 31. 26(1): 34

Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types.

Arthur M Holt, Ang Michael Troy, Neil R Smalheiser.

   BACKGROUND: Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. To date, however, no system has attempted to detect mentions of registry numbers within the full-text of articles.
METHODS: Articles from the PubMed Central full-text Open Access dataset were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.
RESULTS: Registry numbers mentioned in article metadata (e.g., the abstract) or in the Methods section of full-text are highly predictive of clinical trial articles. When a clinical trial article mentioned ClinicalTrials.gov identifier numbers (NCT) only in the Methods section, in every case examined, it was reporting clinical outcomes from that registered trial, and thus can reliably be used to link that trial to that publication. Conversely, registry numbers mentioned in Tables arise almost entirely from reviews (including systematic reviews and meta-analyses). Registry numbers mentioned in other full-text sections have relatively little predictive value for linking trials to their publications. Clinical trial articles that mention CONSORT or SPIRIT guidelines have a higher rate of mentioning registry numbers in article metadata, and hence are more easily linked to their underlying trials, than articles overall.
CONCLUSIONS: The appearance and location of trial registry numbers within the full-text of biomedical articles provide valuable features for connecting clinical trials to their publications. They also potentially provide information to assist automated tools in assigning publication types to articles.

Keywords:  Bibliographic database; Clinical trial registry; Clinical trial reporting; ClinicalTrials.gov; PubMed; PubMed Central; Publication types; Systematic review; Trial registration

DOI:  https://doi.org/10.1186/s13063-025-08741-w
Health Care Anal. 2025 Feb 04.

A Study on Consumer-Centric Health Information Provision Strategy Using SWOT-AHP -Focusing on the National Health Information Portal.

Jaeeun Baek.

  Approximately 70% of Koreans access health and medical information online. Health information providers play a crucial role in enhancing public health by ensuring that individuals can effectively consume and utilize this information according to their information-seeking behaviors. However, existing tools for evaluating health information websites have significant limitations. These tools are often one-size-fits-all and lack strategic recommendations for delivering consumer-centered health information. There is a clear need for alternative approaches beyond merely identifying the quality factors that satisfy consumers. A Strengths, Weaknesses, Opportunities, and Threats-Analytic Hierarchy Process (SWOT-AHP) evaluates both internal and external environmental factors of a health information website, which provides strategies based on the prioritization and weighting of quality factors. Specifically, the 'National Health Information Portal,' a platform provided by the Korea Disease Control and Prevention Agency, was assessed through a comprehensive review of prior research and a SWOT analysis, followed by an AHP survey involving 15 experts specializing in health information websites. The findings of the analysis indicate that the most effective development strategy is the SO (Strengths-Opportunities) strategy. This study highlights the need to move beyond uniform evaluation tools and consider the dynamic and complex nature of the Internet, emphasizing the importance of developing prioritized strategies based on evaluations from both consumers and providers.

Keywords:  Consumer health information; Consumer-centered strategies; Health information website assessment; SWOT-AHP

DOI:  https://doi.org/10.1007/s10728-024-00505-y
PEC Innov. 2025 Jun;6 100368

Assessing the language availability, readability, suitability and comprehensibility of heat-health messaging content on health authority webpages and online resources in Canada.

Emily J Tetzlaff, Kristina-Marie T Janetos, Katie E Wagar, Farah Mourad, Melissa Gorman, Victor Gallant, Glen P Kenny.

   Objectives: Heat-health communication initiatives are a key public health protection strategy. Therefore, understanding the potential challenges that all Canadians and specific groups, such as those facing literacy barriers and non-native language speakers, may experience in accessing or interpreting information, is critical.
Methods: This study reviewed and evaluated the language availability, readability, suitability, and comprehensibility of heat-related webpages and online resources (n = 417) published on public health authority websites in Canada (n = 73). Six validated readability scales and a comprehensibility instrument were used.
Results: Most content was presented in English (90 %); however, only 7 % of the online resources were available in more than one language. The average reading grade level of the content (grade 8) exceeded the recommended level (grade 6), and only 22 % of the content was deemed superior for suitability and comprehensibility.
Conclusions: Our study evaluating web-based materials about extreme heat published by Canadian health authorities provides evidence that the current language availability, readability, suitability, and comprehensibility may be limiting the capacity for members of the public to discern key messaging.
Innovation: To ensure all Canadians can access and interpret information related to heat-health protection, public health authorities may consider translating their materials into additional languages and incorporating a readability evaluation to improve public understanding.

Keywords:  Accessibility; Extreme heat; Heat wave; Heat-health protection; Public health

DOI:  https://doi.org/10.1016/j.pecinn.2024.100368
J Patient Exp. 2025 ;12 23743735241309468

Developing and Validating a User-Friendly Quality Benchmark: Enhancing the Integrity of Online Health Information for Patients and Clinicians.

Lubna Daraz, Cicek Dogu.

  The quality of online health information remains one of the leading causes in combating misinformation for patients and the public. However, assessing online health content is challenging for those without medical expertise. This article briefly outlines the development and validation of an evidence-based online health information evaluation tool. A systematic approach with five phases was adopted: (1) synthesizing the current state of the reliability of online health information, (2) conducting content analysis of existing quality assessment tools, (3) drafting a comprehensive list of quality criteria, (4) developing and validating a quality benchmark, and (5) disseminating the results. Collaborative input from healthcare providers, patients, caregivers, and the public developed and validated a quality benchmark. The quality benchmark consists of 5 quality criteria and 8 accompanying descriptions that define each quality criterion. A printable version of the benchmark is provided in the article to facilitate easy implementation by both patients and healthcare providers. The benchmark is recommended for use and intended to empower patients with a skill set to navigate through online misinformation, facilitating access to credible health information and promoting improved health outcomes.

Keywords:  benchmark; health information; healthcare providers; internet; patients; quality

DOI:  https://doi.org/10.1177/23743735241309468
Eur J Contracept Reprod Health Care. 2025 Feb 06. 1-4

Evaluating the fidelity of AI-generated information on long-acting reversible contraceptive methods.

Grace Riley, Elizabeth Wang, Camille Flynn, Ashley Lopez, Aparna Sridhar.

   INTRODUCTION: Artificial intelligence (AI) has many applications in health care. Popular AI chatbots, such as ChatGPT, have the potential to make complex health topics more accessible to the general public. The study aims to assess the accuracy of current long-acting reversible contraception information provided by ChatGPT.
METHODS: We presented a set of 8 frequently-asked questions about long-acting reversible contraception (LARC) to ChatGPT, repeated over three distinct days. Each question was repeated with the LARC name changed (e.g., 'hormonal implant' vs 'Nexplanon') to account for variable terminology. Two coders independently assessed the AI-generated answers for accuracy, language inclusivity, and readability. Scores from the three duplicated sets were averaged.
RESULTS: A total of 264 responses were generated. 69.3% of responses were accurate. 16.3% of responses contained inaccurate information. The most common inaccuracy was outdated information regarding the duration of use of LARCs. 14.4% of responses included misleading statements based on conflicting evidence, such as claiming intrauterine devices increase one's risk for pelvic inflammatory disease. 45.1% of responses used gender-exclusive language and referred only to women. The average Flesch readability ease score was 42.8 (SD 7.1), correlating to a college reading level.
CONCLUSION: ChatGPT offers important information about LARCs, though a minority of responses are found to be inaccurate or misleading. A significant limitation is AI's reliance on data from before October 2021. While AI tools can be a valuable resource for simple medical queries, users should be cautious of the potential for inaccurate information.
SHORT CONDENSATION: ChatGPT generally provides accurate and adequate information about long-acting contraception. However, it occasionally makes false or misleading claims.

Keywords:  Long-acting reversible contraception; artificial intelligence

DOI:  https://doi.org/10.1080/13625187.2025.2450011
Cureus. 2025 Jan;17(1): e76745

A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods.

Erhan Muluk.

  Background Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of ChatGPT-4o and Google Gemini in addressing commonly asked questions about BCMs. Methods Thirty questions, derived from the American College of Obstetrics and Gynecologists (ACOG) website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a five-point rubric assessing Relevance, Completeness, and Lack of False Information (RCL). Overall scores were calculated by averaging the rubric scores. Statistical analysis, including the Wilcoxon Signed-Rank test, Friedman test, and Kruskal-Wallis test, was performed to compare metrics. Results ChatGPT-4o and Google Gemini provided high-quality responses to birth control-related queries, with overall scores averaging 4.38 ± 0.58 and 4.37 ± 0.52, respectively, both categorized as "very good" to "excellent." ChatGPT-4o demonstrated higher scores in the lack of false information, based on descriptive statistics (4.70 ± 0.60 vs. 4.47 ± 0.73), while Google Gemini outperformed in relevance, with a statistically significant difference (4.53 ± 0.57 vs. 4.30 ± 0.70, p = 0.035, large effect size). Completeness scores were comparable (p = 0.655). Statistical analyses revealed no significant differences in overall performance (p = 0.548), though Google Gemini demonstrated a potential trend of stronger performance in the "Other Topics" category. Within-model variability showed ChatGPT-4o had more pronounced differences among metrics (moderate effect size, Kendall's W = 0.357), while Google Gemini exhibited smaller variability (Kendall's W = 0.165). These findings suggest that both platforms offer reliable and complementary tools for addressing knowledge gaps in contraception, with nuanced strengths that warrant further exploration. Conclusions ChatGPT-4o and Google Gemini provided reliable and accurate responses to BCM-related queries, with slight differences in strengths. These findings underscore the potential of AI tools, in addressing public health information needs, particularly for young individuals seeking guidance on contraception. Further studies with larger datasets may elucidate nuanced differences between AI platforms.

Keywords:  artificial intelligence; birth control methods; chatgpt-4o; contraception; google gemini; health information

DOI:  https://doi.org/10.7759/cureus.76745
Urology. 2025 Feb 04. pii: S0090-4295(25)00103-7. [Epub ahead of print]

Quality of information on Wilms tumour from artificial intelligence chatbots: what are your patients and their families reading?

Peter Stapleton, Jordan Santucci, Thomas P Cundy, Niranjan Sathianathen.

OBJECTIVE: To assess the ability of AI chatbots to deliver quality and understandable information on Wilms tumours to patients and their families.
METHOD: Google trends were used to evaluate the most asked questions related to Wilms tumour. 4 AI chatbots (ChatGPT version 3.5, Perplexity, Chat Sonic, and Bing AI) were then used to assess these questions and their responses reviewed. Validated instruments were used to assess the quality (DISCERN instrument from 1 low to 5 high), understandability and actionability (PEMAT, from 0 to 100%), the reading level of the information and whether there was misinformation compared to guidelines (5-point Likert scale).
RESULTS: All AI chat bots provided a high level of patient health information with a median DISCERN score of 4 (IQR 3-5). Additionally, there was little to no misinformation in outputs with a median of 1 (IQR 1-1). The median word count per output from the AIs was 275 (IQR 156 - 322), with an advanced ease of reading level comparable to a high school or college student, median Flesch-Kincaid Readability level of 46.7 (IQR 41.1 - 52.2). The overall PEMAT actionability was poor with a median of 40% (40-65), while the PEMAT understandability of the AI chatbot outputs was high, 83% (IQR 75 - 91.2).
CONCLUSION: AI chatbots provide generalised, understandable and accurate information regarding Wilms tumour. They can be reliably used as a source for patients and families when seeking further information. However, much of the information is reliant of medical professionals and not easily actionable by consumers but may act as a guide to help with discussions and understanding treatments.

DOI: https://doi.org/10.1016/j.urology.2025.01.054
J Surg Oncol. 2025 Feb 03.

ChatGPT 35 Better Improves Comprehensibility of English, Than Spanish, Generated Responses to Osteosarcoma Questions.

Rosamaria Dias, Ashley Castan, Katie Gotoff, Yazan Kadkoy, Joseph Ippolito, Kathleen Beebe, Joseph Benevenia.

BACKGROUND: Despite adequate discussion and counseling in the office, inadequate health literacy or language barriers may make it difficult to follow instructions from a physician and access necessary resources. This may negatively impact survival outcomes. Most healthcare materials are written at a 10th grade level, while many patients read at an 8th grade level. Hispanic Americans comprise about 25% of the US patient population, while only 6% of physicians identify as bilingual.
QUESTIONS/PURPOSE: (1) Does ChatGPT 3.5 provide appropriate responses to frequently asked patient questions that are sufficient for clinical practice and accurate in English and Spanish? (2) What is the comprehensibility of the responses provided by ChatGPT 3.5 and are these modifiable?
METHODS: Twenty frequently asked osteosarcoma patient questions, evaluated by two fellowship-trained musculoskeletal oncologists were input into ChatGPT 3.5. Responses were evaluated by two independent reviewers to assess appropriateness for clinical practice, and accuracy. Responses were graded using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level test (FKGL). The responses were then input into ChatGPT 3.5 for a second time with the following command "Make text easier to understand". The same method was done in Spanish.
RESULTS: All responses generated were appropriate for a patient-facing informational platform. There was no difference in the Flesch Reading Ease Score between English and Spanish responses before the modification (p = 0.307) and with the Flesch-Kincaid grade level (p = 0.294). After modification, there was a statistically significant difference in comprehensibility between English and Spanish responses (p = 0.003 and p = 0.011).
CONCLUSION: In both English and Spanish, none of the ChatGPT generated responses were found to be factually inaccurate. ChatGPT was able to modify responses upon follow-up with a simplified command. However, it was shown to be better at improving English responses than equivalent Spanish responses.

DOI: https://doi.org/10.1002/jso.28109
J Oral Rehabil. 2025 Feb 06.

A Comparative Analysis of Three Large Language Models on Bruxism Knowledge.

Elisa Souza Camargo, Isabella Christina Costa Quadras, Roberto Ramos Garanhani, Cristiano Miranda de Araujo, Juliana Stuginski-Barbosa.

   BACKGROUND: Artificial Intelligence (AI) has been widely used in health research, but the effectiveness of large language models (LLMs) in providing accurate information on bruxism has not yet been evaluated.
OBJECTIVES: To assess the readability, accuracy and consistency of three LLMs in responding to frequently asked questions about bruxism.
METHODS: This cross-sectional observational study utilised the Google Trends tool to identify the 10 most frequent topics about bruxism. Thirty frequently asked questions were selected, which were submitted to ChatGPT-3.5, ChatGPT-4 and Gemini at two different times (T1 and T2). The readability was measured using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKG) metrics. The responses were evaluated for accuracy using a three-point scale, and consistency was verified by comparing responses between T1 and T2. Statistical analysis included ANOVA, chi-squared tests and Cohen's kappa coefficient considering a p value of 0.5.
RESULTS: In terms of readability, there was no difference in FRE. The Gemini model showed lower FKG scores than the Generative Pretrained Transformer (GPT)-3.5 and GPT-4 models. The average accuracy of the responses was 68.33% for GPT-3.5, 65% for GPT-4 and 55% for Gemini, with no significant differences between the models (p = 0.290). Consistency was substantial for all models, with the highest being in GPT-3.5 (95%). The three LLMs demonstrated substantial agreement between T1 and T2.
CONCLUSION: Gemini's responses were potentially more accessible to a broader patient population. LLMs demonstrated substantial consistency and moderate accuracy, indicating that these tools should not replace professional dental guidance.

Keywords:  artificial intelligence; bruxism; dental research; knowledge acquisition; machine learning; natural language processing

DOI:  https://doi.org/10.1111/joor.13948
J Pediatr Orthop. 2025 Feb 07.

Assessing ChatGPT Responses to Frequently Asked Questions Regarding Pediatric Supracondylar Humerus Fractures.

Austin W Li, Jeremy M Adelstein, Lambert T Li, Margaret A Sinkler, R Justin Mistovich.

BACKGROUND: The internet and standard search engines are commonly used resources for patients seeking medical information online. With the advancement and increasing usage of artificial intelligence (AI) in health information, online AI chatbots such as ChatGPT may surpass traditional web search engines as the next go-to online resource for medical information. This study aims to assess the ability of ChatGPT to answer frequently asked questions regarding pediatric supracondylar humerus (SCH) fractures.
METHODS: Seven (7) frequently asked questions (FAQs) regarding SCH fractures were presented to ChatGPT. Initial responses were recorded and rated as either "excellent requiring no clarification (0 items need clarification)," "satisfactory requiring minimal clarification (1 to 2 items need clarification)," "satisfactory requiring moderate clarification (3 to 4 items need clarification)," or "unsatisfactory requiring substantial clarification (>4 items need clarification or response contains false information)."
RESULTS: While 4 responses met satisfactory ratings with either moderate (2 responses) or minimal (2 responses) clarification, 3 of the 7 FAQs yielded a response from ChatGPT that were unsatisfactory. There were no responses that required no further clarification.
CONCLUSIONS: ChatGPT provided some satisfactory responses to FAQs regarding pediatric SCH fractures, but required substantial clarification about treatment algorithms, casting and return to sport timelines, and the utility of physical therapy. Therefore, ChatGPT is an unreliable resource for information on treating SCH fractures. Parents of children who experience SCH fractures should continue to communicate with their doctors for the most accurate medical information.
LEVEL OF EVIDENCE: Level V-expert opinion on ChatGPT responses.

DOI: https://doi.org/10.1097/BPO.0000000000002923
Musculoskelet Sci Pract. 2025 Jan 31. pii: S2468-7812(25)00023-2. [Epub ahead of print]76 103275

ChatGPT is a comprehensive education tool for patients with patellar tendinopathy, but it currently lacks accuracy and readability.

Jie Deng, Lun Li, Jelle J Oosterhof, Peter Malliaras, Karin Grävare Silbernagel, Stephan J Breda, Denise Eygendaal, Edwin Hg Oei, Robert-Jan de Vos.

   BACKGROUND: Generative artificial intelligence tools, such as ChatGPT, are becoming increasingly integrated into daily life, and patients might turn to this tool to seek medical information.
OBJECTIVE: To evaluate the performance of ChatGPT-4 in responding to patient-centered queries for patellar tendinopathy (PT).
METHODS: Forty-eight patient-centered queries were collected from online sources, PT patients, and experts and were then submitted to ChatGPT-4. Three board-certified experts independently assessed the accuracy and comprehensiveness of the responses. Readability was measured using the Flesch-Kincaid Grade Level (FKGL: higher scores indicate a higher grade reading level). The Patient Education Materials Assessment Tool (PEMAT) evaluated understandability, and actionability (0-100%, higher scores indicate information with clearer messages and more identifiable actions). Semantic Textual Similarity (STS score, 0-1; higher scores indicate higher similarity) assessed variation in the meaning of texts over two months (including ChatGPT-4o) and for different terminologies related to PT.
RESULTS: Sixteen (33%) of the 48 responses were rated accurate, while 36 (75%) were rated comprehensive. Only 17% of treatment-related questions received accurate responses. Most responses were written at a college reading level (median and interquartile range [IQR] of FKGL score: 15.4 [14.4-16.6]). The median of PEMAT for understandability was 83% (IQR: 70%-92%), and for actionability, it was 60% (IQR: 40%-60%). The medians of STS scores in the meaning of texts over two months and across terminologies were all ≥ 0.9.
CONCLUSIONS: ChatGPT-4 provided generally comprehensive information in response to patient-centered queries but lacked accuracy and was difficult to read for individuals below a college reading level.

Keywords:  Communication; Large language models; Patient education; Self-management

DOI:  https://doi.org/10.1016/j.msksp.2025.103275
Med J Aust. 2025 Feb 04.

Use of ChatGPT to obtain health information in Australia, 2024: insights from a nationally representative survey.

Julie Ayre, Erin Cvejic, Kirsten J McCaffery.



Keywords:  Artificial intelligence; Health communication; Public health; Social determinants of health

DOI:  https://doi.org/10.5694/mja2.52598
J Foot Ankle Surg. 2025 Feb 01. pii: S1067-2516(25)00033-X. [Epub ahead of print]

Readability of foot and ankle patient-reported outcomes: Alignment with National institutes of health and american medical association standards.

Harjot Uppal, Daniel Garcia, Isaac Soliman, Dylan Dupont, Nikhil Sahai, Andrew McGinniss, Arash Emami.

  Patient-reported outcome measures are essential tools for assessing surgical interventions, capturing patient perspectives on functionality, symptoms, and quality of life. However, ensuring that these measures are easily understandable is crucial for accurate patient responses. The National Institutes of Health and American Medical Association recommend that patient materials be written at or below a sixth-grade reading level. This study evaluated the readability of 45 commonly used patient-reported outcome measures in foot and ankle surgery to determine alignment with these guidelines. A readability analysis was conducted using the Flesch Reading Ease Score and the Simple Measure of Gobbledygook Index, with a threshold of a Flesch Reading Ease Score of at least 80 or a Simple Measure of Gobbledygook Index below 7 indicating a sixth-grade or lower reading level. The average readability scores indicated an eighth to ninth-grade reading level, with only 31% of patient-reported outcome measures meeting the readability threshold. Among the least readable measures were the American Orthopaedic Foot and Ankle Society Clinical Rating Scales for various foot and ankle regions and the Ankle Osteoarthritis Scale. These findings suggest that most foot and ankle surgery patient-reported outcome measures are above the recommended readability level, potentially hindering patient comprehension and response accuracy. Improving the readability of patient-reported outcome measures, either by developing new tools or modifying existing ones, may enhance the accessibility and reliability of patient-reported data. LEVEL OF CLINICAL EVIDENCE: 4.

Keywords:  Ankle; Foot; Functional outcome measures; Patient reported outcomes; Readability; Surgery

DOI:  https://doi.org/10.1053/j.jfas.2025.01.016
Clin Rheumatol. 2025 Feb 04.

The readability of online patient education materials for cutaneous autoimmune connective tissue diseases.

Sabrina Saeed, Jeff R Gehlhausen, Fotios Koumpouras, Sarika Ramachandran.



Keywords:  Cutaneous autoimmune disease; Cutaneous lupus; Dermatomyositis; Morphea; Patient education; Readability; Scleroderma; Vasculitis

DOI:  https://doi.org/10.1007/s10067-025-07353-8
Semin Oncol Nurs. 2025 Feb 04. pii: S0749-2081(25)00009-9. [Epub ahead of print] 151816

Analysis of YouTube Videos on Herbal Approaches Used in Coping with Cancer.

Ferda Akyuz Ozdemir, Dilek Yildirim.

   OBJECTIVE: The use of herbal approaches is very common among cancer patients. Patients obtain information about herbal products mostly from YouTube. However, toxicity and complications may develop as a result of unconscious use of herbal products. This study was conducted to evaluate the scope, validity, reliability and quality of English language videos on YouTube about herbal approaches to cope with cancer.
METHODS: The present descriptive study analyzed a total of 62 YouTube videos. All videos published on YouTube until 10 January 2024 were watched as a result of a search with English words 'herbal approaches for cancer treatment' and 'herbal approaches for medicine' . The 62 videos that met the inclusion criteria were assessed for reliability, quality, and content by 2 independent reviewers by using the Global Quality Score, DISCERN, JAMA scales and Herbal Approaches Checklist. The results indicated that the videos included in the study exhibited moderate quality.
RESULTS: Of the analyzed videos, 53.2% were found to be informative and 46.8% were found to be misleading. It was established that 59.7% (n=37) of the videos recommended the use of herbs that are known to be incompatible with chemotherapy.
CONCLUSIONS: It was concluded that the majority of the videos on YouTube about herbal approaches had low accuracy, low quality and insufficient information level. In addition, the use of many herbal products incompatible with cancer type and treatment was recommended. However, the level of knowledge of health professionals regarding herbal approaches should also be taken into consideration and it should be recommended that health professionals with expertise in this field inform patients.
IMPLICATIONS FOR NURSING PRACTICE: Nurses should educate patients about herbal approaches and guide them in evaluating the reliability of online sources. They should stay updated through continuous education on herbal products and collaborate with other healthcare professionals to prevent potential herb-drug interactions.

Keywords:  Cancer; YouTube; chemotherapy; herbal approaches

DOI:  https://doi.org/10.1016/j.soncn.2025.151816
Clin Breast Cancer. 2025 Jan 15. pii: S1526-8209(25)00013-8. [Epub ahead of print]

Health Information on Breast Cancer Surgery on YouTube®.

Louise Cousins, Lynn Darragh, Barry Kerr.

   INTRODUCTION: The quality, reliability and accuracy of health-related videos available online is controversial. Research has examined YouTube® in terms of reconstruction, breast screening, radiotherapy, postoperative arm exercises and mastectomy. The aim of this study is to assess YouTube® as a form of health information on breast cancer surgery/operation.
METHODS: YouTube® was searched using the terms ``breast cancer surgery'' and ``breast cancer operation.'' Video data was recorded including time since upload, video length, viewer engagement, content and upload source. The quality, accuracy and educational usefulness of videos were analyzed using 3 online quality assessment tools by a single clinician. Gender and ethnic representation of the patient was recorded.
RESULTS: About 48 videos were reviewed, 62% of videos uploaded from a Healthcare source and the most common video category was post-op complications/issues (25% of videos). Engagement was highest for videos uploaded from individuals which showed 54.36 "likes"/day and 6.9 comments/day. Healthcare sourced videos were higher quality. The mean DISCERN score for healthcare videos was 34.33 ± 11.44 compared with non-Healthcare scoring 26.33 ± 8.35 (P = .013). 100% of videos referenced females with breast cancer only. 71% showed only white/Caucasian patients/models.
DISCUSSION: In line with previous research, this study showed Healthcare sourced videos were of higher quality. Quality of life information hasn't been reported elsewhere however this study found it was limited and of poor quality. While high levels of misleading information have been reported elsewhere, this study highlighted only 4% as misleading/unsafe.
CONCLUSIONS: Healthcare professional interaction with YouTube® should be encouraged and empowered, in order to educate with the delivery of high-quality health information which is reliable and valid. Quality of life content should be considered by healthcare professionals. This study highlights a paucity of videos on male breast cancer, and suggests the need for more ethnically diverse patient representation on breast cancer surgery/operation on YouTube.

Keywords:  Breast operation; Breast surgery; Online patient information; Social media education; Video education

DOI:  https://doi.org/10.1016/j.clbc.2025.01.003
PLoS One. 2025 ;20(2): e0318568

YouTube as an information source in deep margin elevation: Reliability, accuracy and quality analysis.

Zeyneb Merve Ozdemir, Sevim Atılan Yavuz, Derya Gursel Surmelioglu.

The objective of this research was to assess the accuracy, quality, content, and demographics of videos on YouTube concerning deep margin elevation (DME). Initially, 100 videos for each of the three keywords were analyzed. The content categories of these videos were diverse, encompassing educational materials, teaching techniques, advertisements, and other types of content. The evaluation of the videos was carried out based on the Global Quality Scale (GQS), the Journal of the American Medical Association (JAMA) benchmark, and the modified-DISCERN questionnaire (m-DISCERN). Non-distributed data were analyzed using the Kruskal Wallis test and the Spearman correlation coefficient. The JAMA score was 1 for four videos, 2-3 for 38, and 4 for 14 videos; the GQS score was 1-2 for 18 videos, 3 for 11 videos, and 4-5 for 27 videos; and the m-DISCERN score was < 3 for 39 videos, 3 for four videos, and > 3 for 13 (for a total of 56 videos). Statistically significant differences were observed only for the JAMA scores when comparing the video source groups (p = 0.001). There were significant positive correlations between the GQS and m-DISCERN and m-DISCERN and JAMA scores (p < 0.001 and p = 0.049, respectively). The findings indicated that YouTube videos related to DME generally exhibited high-quality content but only moderate accuracy and poor reliability.

DOI: https://doi.org/10.1371/journal.pone.0318568
J Wrist Surg. 2025 Feb;14(1): 42-48

Quality and Reliability Analysis of YouTube as a Source of Patient Information on de Quervain's Tenosynovitis.

Jason H Kim, John F Hoy, Samuel L Shuman, Farhan Ahmad, Xavier C Simcock.

  Purpose This study seeks to evaluate the quality and reliability of information regarding de Quervain's tenosynovitis on YouTube. Materials and Methods A search on the YouTube was performed using the keywords de Quervain's tenosynovitis , and the first 50 videos were evaluated. Video characteristics including views, content type, and video upload source were recorded. Video reliability was assessed using the Journal of the American Medical Association ( JAMA ) benchmark criteria. Video quality was assessed using the Global Quality Score (GQS) and a novel de Quervain's Tenosynovitis-Specific Score (DQT-SS). Results The total number of views for all videos evaluated was 5,508,498 (mean, 110,169.96 ± 155,667.07). Video reliability and quality metrics were low, with a mean JAMA score of 2.17 ± 0.82 out of 4, a mean GQS of 2.49 ± 1.28 out of 5, and a mean DQT-SS of 4.53 ± 2.35 out of 11. Significant between-group effects were found for the video source and DQT-SS ( p = 0.027), as well as between content type and JAMA score ( p = 0.027), GQS ( p = 0.003), and DQT-SS ( p = 0.003). Positive independent predictors of DQT-SS included video duration in seconds (β = 0.391) and disease-specific information content type (β = 0.648). Conclusion Videos on YouTube regarding de Quervain's tenosynovitis were frequently viewed; however, the information present was of low quality and reliability. Physician-uploaded videos had the highest mean JAMA scores, GQS, and DQT-SS, but had the second-lowest mean number of views of video sources. Patients should receive proper in-office education and be directed toward reputable resources for their orthopaedic conditions.

Keywords:  YouTube; de Quervain's tenosynovitis; education; internet; quality

DOI:  https://doi.org/10.1055/s-0043-1777017
J Drugs Dermatol. 2025 Feb 01. 24(2): e7-e9

Evaluating Quality and Reliability of Most-Viewed TikTok Videos About Spironolactone.

Savanna I Vidal, Peter Baek, Nikita Menta, Emily Murphy, Adam Friedman.
PLoS One. 2025 ;20(2): e0316242

Quality of information in gestational diabetes mellitus videos on TikTok: Cross-sectional study.

Genyan Jiang, Lei Chen, Lan Geng, Yuhan Zhang, Zhiqi Chen, Yaqi Zhu, Shuangshuang Ma, Mei Zhao.

BACKGROUND: TikTok is an important channel for consumers to obtain and adopt health information. However, misinformation on TikTok could potentially impact public health. Currently, the quality of content related to GDM on TikTok has not been thoroughly reviewed.
OBJECTIVE: This study aims to explore the information quality of GDM videos on TikTok.
METHODS: A comprehensive cross-sectional study was conducted on TikTok videos related to GDM. The quality of the videos was assessed using three standardized evaluation tools: DISCERN, the Journal of the American Medical Association (JAMA) benchmarks, and the Global Quality Scale (GQS). The comprehensiveness of the content was evaluated through six questions covering definitions, signs/symptoms, risk factors, evaluation, management, and outcomes. Additionally, a correlational analysis was conducted between video quality and the characteristics of the uploaders and the videos themselves.
RESULTS: A total of 216 videos were included in the final analysis, with 162 uploaded by health professionals, 40 by general users, and the remaining videos contributed by individual science communicators, for-profit organizations, and news agencies. The average DISCERN, JAMA, and GQS scores for all videos were 48.87, 1.86, and 2.06, respectively. The videos uploaded by health professionals scored the highest in DISCERN, while the videos uploaded by individual science communicators scored significantly higher in JAMA and GQS than those from other sources. Correlation analysis between video quality and video features showed DISCERN scores, JAMA scores and GQS scores were positively correlated with video duration (P<0.001). Content scores were positively correlated with the number of comments (P<0.05), the number of shares (P<0.001), and video duration (P<0.001).
CONCLUSION: We found that the quality of GDM video on TikTok is poor and lack of relevant information, highlighting the potential risks of using TikTok as a source of health information. Patients should pay attention to identifying health-related information on TikTok.

DOI: https://doi.org/10.1371/journal.pone.0316242
J Plast Reconstr Aesthet Surg. 2025 Jan 29. pii: S1748-6815(25)00049-X. [Epub ahead of print]102 54-57

Social media and educational resources in masculinizing top surgery: The effect of age on patient preferences, subjective understanding and readability.

Oluwaseun D Adebagbo, John B Park, James E Fanning, Benjamin Rahmani, Micaela J Tobin, Mohammed A Yamin, Matthew Prospero, Sasha Nickman, Bernard T Lee, Ryan P Cauley.

   BACKGROUND: As Gender Affirming Top Surgery (GATS) has become more common, educational resources have increased. To ensure healthcare accessibility, a better understanding of preferred platforms and comprehension of these resources is crucial. This study aimed to: determine commonly used resources for GATS patients of varying ages and assess the difficulty of each resource.
METHODS: A public survey seeking perceptions on educational resource utilization related to GATS was administered to gender-diverse individuals. Responses with reported gender identity and age were grouped by age. Online resources were categorized into institutional websites, plastic surgery (PRS) journals, YouTube (YT), internet forums, and educational websites. Perceived difficulty and objective readability of sample text from each category was compared using validated scales. Univariate analyses were performed.
RESULTS: A total of 464 respondents were included, with over half aged 18 to 25, one-third aged 25 to 34%, and 13.8% aged 35 and older. The youngest cohort had lower education, were less likely to have undergone top surgery, and more often favored non-expert resources. When comparing readability, non-expert resources such as YT and internet forums had lower grade levels compared to institutional websites and PRS journals (all p-values ≤ 0.01).
CONCLUSION: Educational preferences and perceived difficulty of resources related to GATS differed significantly by patient age. Overall YouTube and internet forums were popular patient resources and rated as easier to understand by both objective and subjective measures. In order to improve the accessibility of high-quality healthcare information, improving the readability of expert-created resources is essential.

Keywords:  Gender affirmation surgery; Masculinizing top surgery; Online resources; Patient education; Readability; Social media

DOI:  https://doi.org/10.1016/j.bjps.2025.01.051
Sci Rep. 2025 Feb 04. 15(1): 4212

The impact of health information seeking and social influence on functional food purchase intention.

Yaoze Gong, Fareyha Said, Wajiha Haq, Jiankun Gong, Iffat Aksar.

  The COVID-19 pandemic has given rise to unprecedented transformation of consumer behaviors. Despite the abundance of research on this subject, less is known about why and how consumers processed health information and subsequently decided to purchase food during the pandemic. This study employed a survey questionnaire to collect the data. The sample size consisted of 590 consumers in China. The data were analyzed via SPSS and SmartPLS version 3.2.9 to explore the relationships among variables. The results showed that health information-seeking behavior has a significant impact on healthy food product purchasing intention. Similarly, health-related internet use also has a positive impact on health information seeking. Moreover, the impact of motivation for healthy eating on health information seeking is significant. The results indicate a significant moderating role of social influence (i.e., interaction between health information seeking and healthy food product purchasing intention). Multigroup analysis revealed differences between income and age in terms of health-related internet use and purchasing intention. This study assessed healthy food product purchasing intention in a timely manner by incorporating health communication. variables and social influence into consumer behavior research in the context of COVID-19. It thus expands the extant literature and provides insights into the knowledge and practices concerned.

Keywords:  COVID-19; Health information seeking; Healthy and functional food; Motivation; Purchasing intention; Social influence

DOI:  https://doi.org/10.1038/s41598-025-87343-7
BMC Med Educ. 2025 Feb 06. 25(1): 189

The process of obtaining information about COVID-19 among students of physiotherapy and rehabilitation department.

Duygu Ilgin Gunduz, Erhan Secer, Melda Baser Secer.

   BACKGROUND: This cross-sectional study was conducted to determine the process of obtaining information about COVID-19 infection among students of the Department of Physiotherapy and Rehabilitation (DPR) by examining the topic of information seeking, information source preference, and factors influencing these preferences.
METHODS: A total of 495/645 (76.74%) DPR students participated in the study. The data collection form prepared by the researchers was administered between May-June 2022 using face-to-face interview technique. Students' sociodemographic data (age, biological sex, body mass index) and the main topics they researched about COVID-19, information sources, and factors influencing their choice of sources were recorded.
RESULTS: Students often preferred to use internet social media (61.00%) and sources they considered reliable (81.40%) to access basic clinical information about COVID-19 (the routes of transmission = 30.30%, the main symptoms = 26.30%, number of cases = 22.60%). While biological sex (pbiologicalsex) and class level (pclasslevel) influenced the choice of sources (pbiologicalsex=0.011; pclasslevel:0.0001) and the factors determining this choice (pbiologicalsex=0.011-0.022; pclasslevel=0.0001-0.005), topic preferences were only influenced by class level (pbiologicalsex>0.05; pclasslevel = 0.0001-0.022).
CONCLUSION: DPR students should be supported with reliable and up-to-date social media-based digital content prepared by experts in the field about physiotherapy practice and with easy access to scientific data, even in the late stages of pandemic processes such as COVID-19, when the need for access to information is high due to their professional role.

Keywords:  Coronavirus; Information seeking behaviour; Physical therapy

DOI:  https://doi.org/10.1186/s12909-025-06764-0