bims-librar 2025-03-23 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2025–03–23
thirty-six papers selected by
Thomas Krichel, Open Library Society

Could libraries band together to ensure open access for all?
From Shared Horizons to Impactful Collaboration and Engagement: The Interagency Partnership of the National Endowment for the Humanities/National Library of Medicine, 2012-2024.
Use of large language models to identify pseudo-information: Implications for health information.
Improving Readability of Shoulder and Elbow Surgery Online Patient Education Material with Chat Generative Pretrained Transformer (ChatGPT) 4.
Reliability, Accuracy, and Comprehensibility of AI-Based Responses to Common Patient Questions Regarding Spinal Cord Stimulation.
Evaluating AI-generated patient education materials for spinal surgeries: Comparative analysis of readability and DISCERN quality across ChatGPT and deepseek models.
Artificial intelligence solutions for temporomandibular joint disorders: Contributions and future potential of ChatGPT.
Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.
Evaluation of artificial intelligence robot's knowledge and reliability on dental implants and peri-implant phenotype.
Artificial Intelligence for Patient Support: Assessing Retrieval-Augmented Generation for Answering Postoperative Rhinoplasty Questions.
Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.
An Assessment of ChatGPT's Responses to Common Patient Questions About Lung Cancer Surgery: A Preliminary Clinical Evaluation of Accuracy and Relevance.
Evaluating AI-based breastfeeding chatbots: quality, readability, and reliability analysis.
The digital dialogue on premature ejaculation: evaluating the efficacy of artificial intelligence-driven responses.
Readability of the Most Commonly Used Patient-Reported Outcome Measures in Hand Surgery.
Readability and complexity of written information presented to hospitalised patients for trial consent during the COVID-19 pandemic in the UK: a retrospective document analysis.
Exploring the Use of Social Media for Medical Problem Solving by Analyzing the Subreddit r/medical_advice: Quantitative Analysis.
Readability of Online Patient Education Materials for Cleft Care: A Systematic Review and Meta-Analysis.
A joint effort: Evaluating the quality and readability of online resources relating to total hip arthroplasty.
Analysis of readability of the top web searches for pediatric inborn errors of fatty acid metabolism.
Quality assessment of temporomandibular disorders-related information on Chinese social media: A cross-sectional study.
Analysis of Youtube Videos for Artificial Airway Suctioning Training of Nurses: A Content Analysis.
Educational Quality of YouTube™ Videos on Laparoscopic Radical Prostatectomy.
The Usefulness of YouTube Videos Related to Endoscopic Sinus Surgery for Surgical Residents.
Content, Quality, and Reliability of Endometriosis Videos on YouTube.
YouTube as a Source for Arabic-Speaking Parent Education on the Oral Hygiene of Children: A Social Media Content Analysis.
Analysis of Plantar Fasciitis Videos on YouTube: Quality and Reliability Assessment.
YouTube and UTIs: What Is Online Video Content Teaching Our Patients?
The origin of YouTube videos on hereditary angioedema matters.
Evaluating YouTube as a source of information on hemifacial spasm.
Evaluating the quality of TikTok videos on coronary artery disease using various scales to examine correlations with video characteristics and high-quality content.
TikTok as a Source of Information on Retinol: A Cross-Sectional Analysis.
Investigation of Nursing Students' Online Information Searching Strategies and Attitudes Towards Informatics Ethical Values.
Correction to "Social Media Use as a Source of Information by Acne Vulgaris Patients".
Perceptions about the use of virtual assistants for seeking health information among caregivers of young childhood cancer survivors.
Trust in Health Information Sources Among Patients with Systemic Lupus Erythematosus in the Social Networking Era: The TRUMP2-SLE Study.

Nature. 2025 Mar 19.

Could libraries band together to ensure open access for all?

Dalmeet Singh Chawla.



Keywords:  Institutions; Publishing; Research data; Research management

DOI:  https://doi.org/10.1038/d41586-025-00710-2
Interag J. 2024 ;14(2): 17-31

From Shared Horizons to Impactful Collaboration and Engagement: The Interagency Partnership of the National Endowment for the Humanities/National Library of Medicine, 2012-2024.

Frank Vitale, Jeffrey S Reznick.

In 2012, leaders in the National Endowment for the Humanities (NEH) and the National Library of Medicine (NLM) established an interagency partnership to collaborate on research, education, and career initiatives located at the intersection of biomedical and humanities research. Shortly thereafter, the agencies joined with the Maryland Institute for Technology in the Humanities and Research Councils UK (now known as UK Research and Innovation) to convene the symposium Shared Horizons: Data, Biomedicine, and the Digital Humanities. Researchers Erez Aiden and Jean-Baptiste Michel praised the symposium for "betray[ing] an astonishing optimism: the idea that historians and philosophers and artists and doctors and biologists, thinking about data together, can advance their individual causes better than any of them can alone." Aiden and Michael continued "The conference title…was dead-on. At the interface of all our work lies the most exciting terrain in our intellectual future" (206-7). Ten years on, the NEH-NLM interagency partnership has catalyzed and facilitated joint leadership yielding multiple collaborations, engagements, public programs, and open access publications involving dozens of individuals and touching thousands more. At every turn these initiatives have advanced the complementary missions of the NEH and the NLM, including their commitment to open access publishing, as defined by UNESCO to be "the provision of free access to peer reviewed, scholarly and research information to all,… requir[ing] that the rights holder grants worldwide irrevocable right of access to copy, use, distribute, transmit, and make derivative works in any format for any lawful activities with proper attribution to the original author." This article takes stock of the NEH-NLM interagency partnership, conveying its impact on and relevance to the public service of both agencies. As discussed, the NEH-NLM partnership advances a "whole of society approach" toward improving individual and public health writ large: not only in terms of connecting lab, clinic, and community, but also more broadly in terms of supporting the dissemination of trusted health information and sharing knowledge about the human condition across time and place and as studied by a variety of disciplines ranging from the sciences to the social sciences to the humanities. The partnership also advances a "whole of government" approach toward making government more efficient, transparent, accessible, and impactful through outcomes that are not possible when working in isolation. Examining a decade-plus history demonstrating leadership, management, and mutual support among public sector colleagues, this article points to fundamental lessons learned to help achieve "whole of government" activities in other contexts for the greater good.
Health Info Libr J. 2025 Mar 20.

Use of large language models to identify pseudo-information: Implications for health information.

Boris Schmitz.

   BACKGROUND: Open-access scientific research is an essential source of health-related information and self-education. Artificial intelligence-based large language models (LMMs) may be used to identify erroneous health information.
OBJECTIVE: To investigate to what extent LMMs can be used to identify pseudo-information.
METHODS: Four common LMM applications (ChatGPT-4o, Claude 3.5 Sonnet, Gemini and Copilot) were used to investigate their capability to indicate erroneous information provided in an open-access article.
RESULTS: Initially, ChatGPT-4o and Claude were able to mark the provided article as an unreliable information source, identifying most of the inaccuracy problems. The assessments provided by Gemini and Copilot were inaccurate, as several critical aspects were not identified or were misinterpreted. During the validation phase, the initially accurate assessment of ChatGPT-4o was not reproducible, and only Claude was able to detect several critical issues in this phase. The verdicts of Copilot and Gemini remained largely unaltered.
DISCUSSION: Large heterogeneity exists between LMMs in identifying inaccurate pseudo-information. Replication in LMM output may constitute a significant hurdle in their application.
CONCLUSION: The accuracy of LMMs needs to be further improved until they can be reliably used by patients for health-related online information and as assistant tools for health information and library services workers without restriction.

Keywords:  Internet; artificial intelligence (AI); consumer health information; disinformation; patient information

DOI:  https://doi.org/10.1111/hir.12569
J Shoulder Elbow Surg. 2025 Mar 19. pii: S1058-2746(25)00244-7. [Epub ahead of print]

Improving Readability of Shoulder and Elbow Surgery Online Patient Education Material with Chat Generative Pretrained Transformer (ChatGPT) 4.

Krishna Chandra, Umar Ghilzai, Jad Lawand, Abdullah Ghali, Benjamin Fiedler, Adil S Ahmed.

   BACKGROUND: Health literacy is crucial for effective doctor-patient communication, particularly for surgical patients who need to comprehend complex procedures and care protocols. The American Medical Association and National Institutes of Health suggest patient education materials be at a sixth to eighth-grade reading level. Despite this, many online materials for orthopedic surgeries, including shoulder and elbow procedures are written above this level. ChatGPT-4, an AI language model, may help simplify these materials and improve readability given poor health literacy in patient populations.
METHODS: Thirty excerpts of patient-facing information on shoulder and elbow surgeries were selected from academic and professional medical sources a variety of shoulder and elbow orthopedic surgical topics. Original readability was assessed using the SMOG (Simple Measure of Gobbledygook) Index. ChatGPT then analyzed readability and simplified the text to a sixth- to eighth-grade level. To simplify the text while maintaining medical accuracy, the following prompt was used: "Rewrite this text at a 6th to 8th grade level without losing information." ChatGPT achieved this by defining medical terminology, using common language equivalents, and restructuring information for easier readability. Simplified text was re-evaluated for readability by both SMOG Index and ChatGPT and accuracy by study authors.
RESULTS: Original excerpts had an average SMOG readability score of 10.1, requiring about a tenth grade reading level. ChatGPT's initial analysis averaged slightly higher at 10.3 (p<0.001). Following simplification, the both the SMOG readability score and ChatGPT significantly dropped to 8.3, and 7.7, respectively, aligning closer to recommended readability levels (p<0.001). ChatGPT also provided targeted feedback on areas for readability improvement.
CONCLUSIONS: ChatGPT-4 demonstrated utility in analyzing and simplifying shoulder and elbow surgery patient education materials, lowering readability to near-recommended levels. By providing specific suggestions for simplification, ChatGPT streamlined the revision process, enhancing potential patient understanding and engagement. However, human review remains necessary to ensure clinical accuracy in AI-simplified materials.

Keywords:  ChatGPT; Health literacy; OPEM; SMOG; artificial intelligence; elbow surgery; orthopedic surgery; patient education; readability; shoulder surgery

DOI:  https://doi.org/10.1016/j.jse.2025.02.025
J Clin Med. 2025 Feb 21. pii: 1453. [Epub ahead of print]14(5):

Reliability, Accuracy, and Comprehensibility of AI-Based Responses to Common Patient Questions Regarding Spinal Cord Stimulation.

Giuliano Lo Bianco, Marco Cascella, Sean Li, Miles Day, Leonardo Kapural, Christopher L Robinson, Emanuele Sinagra.

  Background: Although spinal cord stimulation (SCS) is an effective treatment for managing chronic pain, many patients have understandable questions and concerns regarding this therapy. Artificial intelligence (AI) has shown promise in delivering patient education in healthcare. This study evaluates the reliability, accuracy, and comprehensibility of ChatGPT's responses to common patient inquiries about SCS. Methods: Thirteen commonly asked questions regarding SCS were selected based on the authors' clinical experience managing chronic pain patients and a targeted review of patient education materials and relevant medical literature. The questions were prioritized based on their frequency in patient consultations, relevance to decision-making about SCS, and the complexity of the information typically required to comprehensively address the questions. These questions spanned three domains: pre-procedural, intra-procedural, and post-procedural concerns. Responses were generated using GPT-4.0 with the prompt "If you were a physician, how would you answer a patient asking…". Responses were independently assessed by 10 pain physicians and two non-healthcare professionals using a Likert scale for reliability (1-6 points), accuracy (1-3 points), and comprehensibility (1-3 points). Results: ChatGPT's responses demonstrated strong reliability (5.1 ± 0.7) and comprehensibility (2.8 ± 0.2), with 92% and 98% of responses, respectively, meeting or exceeding our predefined thresholds. Accuracy was 2.7 ± 0.3, with 95% of responses rated sufficiently accurate. General queries, such as "What is spinal cord stimulation?" and "What are the risks and benefits?", received higher scores compared to technical questions like "What are the different types of waveforms used in SCS?". Conclusions: ChatGPT can be implemented as a supplementary tool for patient education, particularly in addressing general and procedural queries about SCS. However, the AI's performance was less robust in addressing highly technical or nuanced questions.

Keywords:  ChatGPT; artificial intelligence; chronic pain management; healthcare communication; neuromodulation; patient education; spinal cord stimulation

DOI:  https://doi.org/10.3390/jcm14051453
Int J Med Inform. 2025 Mar 13. pii: S1386-5056(25)00088-7. [Epub ahead of print]198 105871

Evaluating AI-generated patient education materials for spinal surgeries: Comparative analysis of readability and DISCERN quality across ChatGPT and deepseek models.

Mi Zhou, Yun Pan, Yuye Zhang, Xiaomei Song, Youbin Zhou.

   BACKGROUND: Access to patient-centered health information is essential for informed decision-making. However, online medical resources vary in quality and often fail to accommodate differing degrees of health literacy. This issue is particularly evident in surgical contexts, where complex terminology obstructs patient comprehension. With the increasing reliance on AI models for supplementary medical information, the reliability and readability of AI-generated content require thorough evaluation.
OBJECTIVE: This study aimed to evaluate four natural language processing models-ChatGPT-4o, ChatGPT-o3 mini, DeepSeek-V3, and DeepSeek-R1-in generating patient education materials for three common spinal surgeries: lumbar discectomy, spinal fusion, and decompressive laminectomy. Information quality was evaluated using the DISCERN score, and readability was assessed through Flesch-Kincaid indices.
RESULTS: DeepSeek-R1 produced the most readable responses, with Flesch-Kincaid Grade Level (FKGL) scores ranging from 7.2 to 9.0, succeeded by ChatGPT-4o. In contrast, ChatGPT-o3 exhibited the lowest readability (FKGL > 10.4). The DISCERN scores for all AI models were below 60, classifying the information quality as "fair," primarily due to insufficient cited references.
CONCLUSION: All models achieved merely a "fair" quality rating, underscoring the necessity for improvements in citation practices, and personalization. Nonetheless, DeepSeek-R1 and ChatGPT-4o generated more readable surgical information than ChatGPT-o3. Given that enhanced readability can improve patient engagement, reduce anxiety, and contribute to better surgical outcomes, these two models should be prioritized for assisting patients in the clinical.
LIMITATION & FUTURE DIRECTION: This study is limited by the rapid evolution of AI models, its exclusive focus on spinal surgery education, and the absence of real-world patient feedback, which may affect the generalizability and long-term applicability of the findings. Future research ought to explore interactive, multimodal approaches and incorporate patient feedback to ensure that AI-generated health information is accurate, accessible, and facilitates informed healthcare decisions.

Keywords:  AI-generated health information; Patient health literacy; Readability; Spinal surgery education

DOI:  https://doi.org/10.1016/j.ijmedinf.2025.105871
Korean J Orthod. 2025 Mar 25. 55(2): 131-141

Artificial intelligence solutions for temporomandibular joint disorders: Contributions and future potential of ChatGPT.

Betul Kula, Ahmet Kula, Fatih Bagcier, Bulent Alyanak.

   Objective: This study aimed to evaluate the reliability and usefulness of information generated by Chat Generative Pre-Trained Transformer (ChatGPT) on temporomandibular joint disorders (TMD).
Methods: We asked ChatGPT about the diseases specified in the TMD classification and scored the responses using Likert reliability and usefulness scales, the modified DISCERN (mDISCERN) scale, and the Global Quality Scale (GQS).
Results: The highest Likert scores for both reliability and usefulness were for masticatory muscle disorders (mean ± standard deviation [SD]: 6.0 ± 0), and the lowest scores were for inflammatory disorders of the temporomandibular joint (mean ± SD: 4.3 ± 0.6 for reliability, 4.0 ± 0 for usefulness). The median Likert reliability score indicates that the responses are highly reliable. The median Likert usefulness score was 5 (4-6), indicating that the responses were moderately useful. A comparative analysis was performed, and no statistically significant differences were found in any subject for either reliability or usefulness (P = 0.083-1.000). The median mDISCERN score was 4 (3-5) for the two raters. A statistically significant difference was observed in the mean mDISCERN scores between the two raters (P = 0.046). The GQS scores indicated a moderate to high quality (mean ± SD: 3.8 ± 0.8 for rater 1, 4.0 ± 0.5 for rater 2). No statistically significant correlation was found between mDISCERN and GQS scores (r = -0.006, P = 0.980).
Conclusions: Although ChatGPT-4 has significant potential, it can be used as an additional source of information regarding TMD for patients and clinicians.

Keywords:  Artificial intelligence; ChatGPT; Temporomandibular joint disorders

DOI:  https://doi.org/10.4041/kjod24.106
Eur J Orthop Surg Traumatol. 2025 Mar 18. 35(1): 123

Evaluation of the reliability, usefulness, quality and readability of ChatGPT's responses on Scoliosis.

Ayşe Merve Çıracıoğlu, Suheyla Dal Erdoğan.

   OBJECTIVE: This study evaluates the reliability, usefulness, quality, and readability of ChatGPT's responses to frequently asked questions about scoliosis.
METHODS: Sixteen frequently asked questions, identified through an analysis of Google Trends data and clinical feedback, were presented to ChatGPT for evaluation. Two independent experts assessed the responses using a 7-point Likert scale for reliability and usefulness. Additionally, the overall quality was also rated using the Global Quality Scale (GQS). To assess readability, various established metrics were employed, including the Flesch Reading Ease score (FRE), the Simple Measure of Gobbledygook (SMOG) Index, the Coleman-Liau Index (CLI), the Gunning Fog Index (GFI), the Flesch-Kinkaid Grade Level (FKGL), the FORCAST Grade Level, and the Automated Readability Index (ARI).
RESULTS: The mean reliability scores were 4.68 ± 0.73 (Median: 5, IQR 4-5), while the mean usefulness scores were 4.84 ± 0.84 (Median: 5, IQR 4-5). Additionally the mean GQS scores were 4.28 ± 0.58 (Median: 4, IQR 4-5). Inter-rater reliability analysis using the Intraclass correlation coefficient showed excellent agreement: 0.942 for reliability, 0.935 for usefulness, and 0.868 for GQS. While general informational questions received high scores, responses to treatment-specific and personalized inquiries required greater depth and comprehensiveness. Readability analysis indicated that ChatGPT's responses required at least a high school senior to college-level reading ability.
CONCLUSION: ChatGPT provides reliable, useful, and moderate quality information on scoliosis but has limitations in addressing treatment-specific and personalized inquiries. Caution is essential when using Artificial Intelligence (AI) in patient education and medical decision-making.

Keywords:  Artificial intelligence; ChatGPT; Idiopathic scoliosis

DOI:  https://doi.org/10.1007/s00590-025-04198-4
Sci Rep. 2025 Mar 19. 15(1): 9519

Evaluation of artificial intelligence robot's knowledge and reliability on dental implants and peri-implant phenotype.

Mithat Terzi, Mustafa Cihan Yavuz, Tayyip Bicer, S Kutalmış Buyuk.

  The aim of this study was to evaluate the reliability and quality of information generated by ChatGPT regarding dental implants and peri-implant phenotypes. A structured questionnaire on these topics was presented to the AI-based chatbot, and its responses were assessed by dental professionals using a modified Global Quality Scale (GQS) and the DISCERN tool. The study included 60 participants divided into three professional groups: oral and maxillofacial surgeons, periodontologists, and general dental practitioners. While no statistically significant differences were observed among the groups (p > 0.05), oral and maxillofacial surgeons consistently assigned lower DISCERN and GQS scores compared to other professionals. The findings of this study suggest that ChatGPT has the potential to serve as a supplementary tool for patient information in dental implant procedures. However, its responses may lack the depth and specificity required for clinical decision-making. Dental professionals should exercise caution when relying on artificial intelligence (AI) -generated content and guide patients in interpreting such information. Future research should explore the variability of AI responses, assess multiple chatbot platforms, and investigate their integration into dental clinical practice.

Keywords:  Artificial intelligence; Dental implant; Patient information; Peri-Implant

DOI:  https://doi.org/10.1038/s41598-025-94576-z
Aesthet Surg J. 2025 Mar 15. pii: sjaf038. [Epub ahead of print]

Artificial Intelligence for Patient Support: Assessing Retrieval-Augmented Generation for Answering Postoperative Rhinoplasty Questions.

Ariana Genovese, Sahar Borna, Cesar A Gomez-Cabello, Syed Ali Haider, Srinivasagam Prabha, Maissa Trabilsy, Cui Tao, Keith T Aziz, Peter M Murray, Antonio Jorge Forte.

BACKGROUND: While artificial intelligence (AI) is revolutionizing healthcare, inaccurate or incomplete information from pre-trained large language models (LLMs) like ChatGPT poses significant risks to patient safety. Retrieval-Augmented Generation (RAG) offers a promising solution by leveraging curated knowledge bases to enhance accuracy and reliability, especially in high-demand specialties like plastic surgery.
OBJECTIVES: This study evaluates the performance of RAG-enabled AI models in addressing postoperative rhinoplasty questions, aiming to assess their safety and identify necessary improvements for effective implementation into clinical care.
METHODS: Four RAG models (Gemini-1.0-Pro-002, Gemini-1.5-Flash-001, Gemini-1.5-Pro-001, and PaLM 2) were tested on 30 common patient inquiries. Responses, sourced from authoritative rhinoplasty texts, were evaluated for accuracy (1-5 scale), comprehensiveness (1-3 scale), readability (Flesch Reading Ease, Flesch-Kincaid Grade Level), and understandability/actionability (Patient Education Materials Assessment Tool). Statistical analyses included Wilcoxon rank sum, Armitage trend tests, and pairwise comparisons.
RESULTS: When responses were generated, they were generally accurate (41.7% completely accurate); however, a 30.8% nonresponse rate revealed potential challenges with query context interpretation and retrieval. Gemini-1.0-Pro-002 demonstrated superior comprehensiveness (p < 0.001), but readability (FRE: 40-49) and understandability (mean: 0.7) fell below patient education standards. PaLM 2 scored lowest in actionability (p < 0.007).
CONCLUSIONS: This first application of RAG to postoperative rhinoplasty patient care highlights its strengths in accuracy alongside its limitations, including nonresponse and contextual understanding. Addressing these challenges will enable safer, more effective implementation of RAG models across diverse surgical and medical contexts, with the potential to revolutionize patient care by reducing physician workload while enhancing patient engagement.

DOI: https://doi.org/10.1093/asj/sjaf038
Medicine (Baltimore). 2025 Mar 14. 104(11): e41780

Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.

Erkan Ozduran, Ibrahim Akkoc, Sibel Büyükçoban, Yüksel Erkin, Volkan Hanci.

It is clear that artificial intelligence-based chatbots will be popular applications in the field of healthcare in the near future. It is known that more than 30% of the world's population suffers from chronic pain and individuals try to access the health information they need through online platforms before applying to the hospital. This study aimed to examine the readability, reliability and quality of the responses given by 3 different artificial intelligence chatbots (ChatGPT, Gemini and Perplexity) to frequently asked questions about pain. In this study, the 25 most frequently used keywords related to pain were determined using Google Trend and asked to every 3 artificial intelligence chatbots. The readability of the response texts was determined by Flesch Reading Ease Score (FRES), Simple Measure of Gobbledygook, Gunning Fog and Flesch-Kincaid Grade Level readability scoring. Reliability assessment was determined by the Journal of American Medical Association (JAMA), DISCERN scales. Global Quality Score and Ensuring Quality Information for Patients (EQIP) score were used in quality assessment. As a result of Google Trend search, the first 3 keywords were determined as "back pain," "stomach pain," and "chest pain." The readability of the answers given by all 3 artificial intelligence applications was determined to be higher than the recommended 6th grade readability level (P < .001). In the readability evaluation, the order from easy to difficult was determined as Google Gemini, ChatGPT and Perplexity. Higher GQS scores (P = .008) were detected in Gemini compared to other chatbots. Perplexity had higher JAMA, DISCERN and EQIP scores compared to other chatbots, respectively (P < .001, P < .001, P < .05). It has been determined that the answers given by ChatGPT, Gemini, and Perplexity to pain-related questions are difficult to read and their reliability and quality are low. It can be stated that these artificial intelligence chatbots cannot replace a comprehensive medical consultation. In artificial intelligence applications, it may be recommended to facilitate the readability of text content, create texts containing reliable references, and control them by a supervisory expert team.

DOI: https://doi.org/10.1097/MD.0000000000041780
J Clin Med. 2025 Mar 01. pii: 1676. [Epub ahead of print]14(5):

An Assessment of ChatGPT's Responses to Common Patient Questions About Lung Cancer Surgery: A Preliminary Clinical Evaluation of Accuracy and Relevance.

Marina Troian, Stefano Lovadina, Alice Ravasin, Alessia Arbore, Aneta Aleksova, Elisa Baratella, Maurizio Cortale.

  Background: Chatbots based on artificial intelligence (AI) and machine learning are rapidly growing in popularity. Patients may use these technologies to ask questions regarding surgical interventions, preoperative assessments, and postoperative outcomes. The aim of this study was to determine whether ChatGPT could appropriately answer some of the most frequently asked questions posed by patients about lung cancer surgery. Methods: Sixteen frequently asked questions about lung cancer surgery were asked to the chatbot in one conversation, without follow-up questions or repetition of the same questions. Each answer was evaluated for appropriateness and accuracy using an evidence-based approach by a panel of specialists with relevant clinical experience. The responses were assessed using a four-point Likert scale (i.e., "strongly agree, satisfactory", "agree, requires minimal clarification", "disagree, requires moderate clarification", and "strongly disagree, requires substantial clarification"). Results: All answers provided by the chatbot were judged to be satisfactory, evidence-based, and generally unbiased overall, seldomly requiring minimal clarification. Moreover, information was delivered in a language deemed easy-to-read and comprehensible to most patients. Conclusions: ChatGPT could effectively provide evidence-based answers to the most commonly asked questions about lung cancer surgery. The chatbot presented information in a language considered understandable by most patients. Therefore, this resource may be a valuable adjunctive tool for preoperative patient education.

Keywords:  ChatGPT; artificial intelligence (AI); lung cancer; outpatient setting; patient education; thoracic surgery

DOI:  https://doi.org/10.3390/jcm14051676
PLoS One. 2025 ;20(3): e0319782

Evaluating AI-based breastfeeding chatbots: quality, readability, and reliability analysis.

Emine Ozdemir Kacer.

BACKGROUND: In recent years, expectant and breastfeeding mothers commonly use various breastfeeding-related social media applications and websites to seek breastfeeding-related information. At the same time, AI-based chatbots-such as ChatGPT, Gemini, and Copilot-have become increasingly prevalent on these platforms (or on dedicated websites), providing automated, user-oriented breastfeeding guidance.
AIM: The goal of our study is to understand the relative performance of three AI-based chatbots: ChatGPT, Gemini, and Copilot, by evaluating the quality, reliability, readability, and similarity of the breastfeeding information they provide.
METHODS: Two researchers evaluated the information provided by three different AI-based breastfeeding chatbots: ChatGPT version 3.5, Gemini, and Copilot. A total of 50 frequently asked questions about breastfeeding were identified and used in the study, divided into two categories (Baby-Centered Questions and Mother-Centered Questions), and evaluated using five scoring criteria, including the Quality Information Provision for Patients (EQIP) scale, the Simple Measure of Gobbledygook (SMOG) scale, the Similarity Index (SI), the Modified Dependability Scoring System (mDISCERN), and the Global Quality Scale (GQS).
RESULTS: The evaluation of AI chatbots' answers showed statistically significant differences across all criteria (p < 0.05). Copilot scored highest on the EQIP, SMOG, and SI scales, while Gemini excelled in mDISCERN and GQS evaluations. No significant difference was found between Copilot and Gemini for mDISCERN and GQS scores. All three chatbots demonstrated high reliability and quality, though their readability required university-level education. Notably, ChatGPT displayed high originality, while Copilot exhibited the greatest similarity in responses.
CONCLUSION: AI chatbots provide reliable answers to breastfeeding questions, but the information can be hard to understand. While more reliable than other online sources, their accuracy and usability are still in question. Further research is necessary to facilitate the integration of advanced AI in healthcare.

DOI: https://doi.org/10.1371/journal.pone.0319782
Int Urol Nephrol. 2025 Mar 20.

The digital dialogue on premature ejaculation: evaluating the efficacy of artificial intelligence-driven responses.

Hakan Anıl, Mehmet Vehbi Kayra.

   PURPOSE: This study investigated the quality and comprehensibility of responses generated by three prominent artificial intelligence-powered chatbots (ChatGPT, Gemini, and Llama) when queried about premature ejaculation (PME).
METHODS: A set of 25 frequently asked questions (FAQs) were identified on the basis of Google Trends and Semrush platforms. Each chatbot was prompted with these questions and their responses were analyzed via a comprehensive set of metrics. Readability was assessed via the Flesch Reading Ease (FRES) and Flesch-Kincaid Grade Level (FKGL) scores. Quality and reliability were evaluated via the modified DISCERN (mDISCERN) and Ensuring Quality Information for Patients (EQIP) scores, which assess the clarity, comprehensiveness, and trustworthiness of health information.
RESULTS: Readability scores, as assessed by FRES and FKGL, did not significantly differ across the three chatbots. In terms of quality, the mean EQIP scores were significantly different between the models, with Llama (72.2 ± 1.1) achieving the highest scores, followed by Gemini (67.6 ± 4.5) and ChatGPT (63.1 ± 4.9) (P < 0.001). The median (interquartile range) mDISCERN scores were 2 (1) for ChatGPT, 3 (0) for Gemini, and 3 (1) for Llama (P < 0.001), indicating a significant difference in the quality of information provided by the different models.
CONCLUSION: The three chatbots demonstrated statistically similar results in terms of readability. Llama achieved the highest EQIP score among them. Additionally, both Llama and Gemini outperformed ChatGPT in terms of mDISCERN scores.

Keywords:  Artificial intelligence; Chatbots; Premature ejaculation; Sexual health

DOI:  https://doi.org/10.1007/s11255-025-04461-x
J Hand Surg Am. 2025 Mar 19. pii: S0363-5023(25)00081-4. [Epub ahead of print]

Readability of the Most Commonly Used Patient-Reported Outcome Measures in Hand Surgery.

Harjot Uppal, Daniel Garcia, George Abdelmalek, Joseph Farshchian, Nikhil Sahai, Arash Emami, Andrew McGinniss.

   PURPOSE: Patient-reported outcome measures (PROMs) assess surgical outcomes and patient perspectives on function, symptoms, and quality of life. The readability of patient-reported outcome measures is crucial for ensuring patients can understand and accurately complete them. The National Institutes of Health and American Medical Association recommend that patient materials be written at or below a sixth-grade reading level. We aimed to evaluate whether PROMs identified in the hand literature meet these recommended reading standards.
METHODS: We conducted a readability analysis of 22 PROMs referenced in the hand literature. Readability was assessed using the Flesch Reading Ease Score (FRES) and the Simple Measure of Gobbledygook (SMOG) Index. Scores were obtained using an online readability calculator. Patient-reported outcome measures meeting a FRES ≥ 80 or SMOG ˂ 7 were considered at a sixth-grade reading level or lower, per the National Institutes of Health and American Medical Association guidelines.
RESULTS: Across all PROMs, the average FRES was 66 ± 12, and the average SMOG Index was 8 ± 1, corresponding to approximately an eighth- to ninth-grade reading level. Three PROMs met the target readability thresholds: Patient-Reported Outcome Measurement Information System-Physical Function Upper Extremity, Patient Evaluation Measure, and the 6-item Carpal Tunnel Syndrome Symptom Scale. Several PROMs, including the Southampton Dupuytren's Scoring Scheme, Hand Assessment Tool, and Manual Ability Measure 16, demonstrated relatively low readability scores.
CONCLUSIONS: Most PROMs mentioned in the hand literature exceeded the recommended sixth-grade reading level, potentially affecting patient comprehension and data accuracy. Although improving readability may enhance patient understanding, altering PROM wording is not straightforward and may require extensive revalidation because changes risk affecting validity and reliability, underscoring the complexity of revising PROMs.
CLINICAL RELEVANCE: These findings highlight the importance of raising awareness about PROM readability issues. Recognizing these readability challenges may encourage researchers, developers, and journal editors to consider recommended guidelines when proposing, modifying, or evaluating these measures.

Keywords:  Hand; patient advocacy; patient-centered care; patient-reported outcome measures; readability

DOI:  https://doi.org/10.1016/j.jhsa.2025.02.011
BMJ Open. 2025 Mar 21. 15(3): e089447

Readability and complexity of written information presented to hospitalised patients for trial consent during the COVID-19 pandemic in the UK: a retrospective document analysis.

Ewan Gourlay, Tim Felton, Mona Bafadhel, Christopher E Brightling, Jane C Davies, Rachael A Evans, Ling Pei Ho, Stefan J Marciniak, Nick A Maskell, Joanna Porter, Elizabeth Sapey, Salman Siddiqui, Samantha Walker, Tom Wilkinson, Alex Robert Horsley.

   OBJECTIVES: Patient information sheets (PISs) and informed consent forms (ICFs) are essential tools to communicate and document informed consent for clinical trial participation. These documents need to be easily understandable, especially when used to take informed consent from acutely unwell patients. Health literacy guidance recommends written information should be at a level between reading ages 9-11. We aimed to assess the readability and complexity of PISs/ICFs used for clinical trials of acute therapies during the COVID-19 pandemic.
DESIGN: Retrospective document analysis.
SETTING: PISs/ICFs used in trials involving pharmaceutical interventions recruiting hospitalised patients with COVID-19 during the first year of the pandemic were sourced from hospitals across the UK.
PRIMARY AND SECONDARY OUTCOME MEASURES: PISs/ICFs were assessed for length, approximate reading time and subsection content. Readability and language complexity were assessed using Flesch-Kincaid Grade Level (FKGL) (range 1-18; higher is more complex), Gunning-Fog (GFOG) (range 1-20; higher is more complex) and Flesch Reading Ease Score (FRES) (range 0-100; below 60 is 'difficult' for comprehension).
RESULTS: 13 documents were analysed with a median length of 5139 words (range 1559-7026), equating to a median reading time of 21.4 min (range 6.5-29.3 min) at 240 words per minute. Median FKGL was 9.8 (9.1-10.8), GFOG 11.8 (10.4-13) and FRES was 54.6 (47.0-58.3). All documents were classified as 'difficult' for comprehension and had a reading age of 14 years old or higher.
CONCLUSIONS: All PISs/ICFs analysed contained literary complexity beyond both recommendations and the reading level of many in the UK population. Researchers should seek to improve communications to improve trial volunteer comprehension and recruitment.

Keywords:  COVID-19; Clinical Trial; Lung Diseases; MEDICAL ETHICS

DOI:  https://doi.org/10.1136/bmjopen-2024-089447
JMIR Infodemiology. 2025 Mar 20. 5 e56116

Exploring the Use of Social Media for Medical Problem Solving by Analyzing the Subreddit r/medical_advice: Quantitative Analysis.

Xiyu Zhao, Victor Yang, Arjun Menta, Jacob Blum, Padmini Ranasinghe.

   BACKGROUND: The advent of the internet has transformed the landscape of health information acquisition and sharing. Reddit has become a hub for such activities, such as the subreddit r/medical_advice, affecting patients' knowledge and decision-making. While the popularity of these platforms is recognized, research into the interactions and content within these communities remains sparse. Understanding the dynamics of these platforms is crucial for improving online health information quality.
OBJECTIVE: This study aims to quantitatively analyze the subreddit r/medical_advice to characterize the medical questions posed and the demographics of individuals providing answers. Insights into the subreddit's user engagement, information-seeking behavior, and the quality of shared information will contribute to the existing body of literature on health information seeking in the digital era.
METHODS: A cross-sectional study was conducted, examining all posts and top comments from r/medical_advice since its creation on October 1, 2011. Data were collected on March 2, 2023, from pushhift.io, and the analysis included post and author flairs, scores, and engagement metrics. Statistical analyses were performed using RStudio and GraphPad Prism 9.0.
RESULTS: From October 2011 to March 2023, a total of 201,680 posts and 721,882 comments were analyzed. After excluding autogenerated posts and comments, 194,678 posts and 528,383 comments remained for analysis. A total of 41% (77,529/194,678) of posts had no user flairs, while only 0.1% (108/194,678) of posts were made by verified medical professionals. The average engagement per post was a score of 2 (SD 7.03) and 3.32 (SD 4.89) comments. In period 2, urgent questions and those with level-10 pain reported higher engagement, with significant differences in scores and comments based on flair type (P<.001). Period 3 saw the highest engagement in posts related to pregnancy and the lowest in posts about bones, joints, or ligaments. Media inclusion significantly increased engagement, with video posts receiving the highest interaction (P<.001).
CONCLUSIONS: The study reveals a significant engagement with r/medical_advice, with user interactions influenced by the type of query and the inclusion of visual media. High engagement with posts about pregnancy and urgent medical queries reflects a focused public interest and the subreddit's role as a preliminary health information resource. The predominance of nonverified medical professionals providing information highlights a shift toward community-based knowledge exchange, though it raises questions about the reliability of the information. Future research should explore cross-platform behaviors and the impact of misinformation on public health. Effective moderation and the involvement of verified medical professionals are recommended to enhance the subreddit's role as a reliable health information resource.

Keywords:  Reddit; cross-sectional study; decision-making; health information; health information–seeking behavior; information quality; medical advice; medical problem; online health; online health information; patient education; quantitative analyses; r/medical_advice; social media; social news; subreddits; user interactions; user-generated content; virtual environments

DOI:  https://doi.org/10.2196/56116
Cleft Palate Craniofac J. 2025 Mar 17. 10556656251327803

Readability of Online Patient Education Materials for Cleft Care: A Systematic Review and Meta-Analysis.

Antoinette T Nguyen, Rena A Li, Arun K Gosain, Robert D Galiano.

  ObjectiveTo evaluate the readability of online patient education materials (PEMs) for cleft lip and/or palate and assess their alignment with recommended readability levels.DesignThis study is a systematic review and meta-analysis.SettingLiterature search conducted in PubMed, Scopus, and Embase databases following PRISMA guidelines.MaterialsStudies evaluating online PEMs for cleft care with reported readability metrics, including Flesch-Kincaid Grade Level, Flesch Reading Ease, SMOG Index, or Gunning Fog Index.InterventionsAssessment of readability metrics of online PEMs and evaluation of artificial intelligence tools (eg, ChatGPT) for text simplification.Main Outcome Measure(s)Pooled readability estimates (eg, Flesch-Kincaid Grade Level, Flesch Reading Ease, SMOG Index, Gunning Fog Index), heterogeneity (I²), and confidence intervals (CIs).ResultsNine studies were included, consistently showing that PEMs exceed readability recommendations. Pooled estimates revealed a Flesch-Kincaid Grade Level of 9.48 (95% CI: 8.51-10.45), Flesch Reading Ease score of 52.98 (95% CI: 42.62-63.34), SMOG Index of 9.27 (95% CI: 5.97-12.57), and Gunning Fog Index of 9.94 (95% CI: 8.90-10.98). Heterogeneity was minimal (I² = 0%). Artificial intelligence tools like ChatGPT demonstrated potential in simplifying text to the recommended sixth-grade reading level but lacked usability and comprehension testing.ConclusionsOnline PEMs for cleft care are consistently written at reading levels too complex for the average caregiver, underscoring the need for improved readability and accessibility. Future research should focus on developing multimodal resources, conducting usability assessments, and including non-English materials to address global disparities in cleft care education.

Keywords:  cleft lip; cleft palate; cleft repair; patient education; readability

DOI:  https://doi.org/10.1177/10556656251327803
Surgeon. 2025 Mar 14. pii: S1479-666X(25)00048-4. [Epub ahead of print]

A joint effort: Evaluating the quality and readability of online resources relating to total hip arthroplasty.

Samher Jassim, Conor J Kilkenny, Alex Price, Thomas Moore, Niall P McGoldrick, John F Quinlan.

   BACKGROUND: The internet serves as a major source of information for patients undergoing total hip arthroplasty (THA). However, prior research has shown that online medical information often exceeds recommended readability levels, posing a barrier to patient comprehension. The average reading level in the United States is between 7th and 8th grade, while leading health organizations recommend that patient information not exceed a 6th-grade level. This study aims to evaluate the readability and quality of information available online regarding THA.
METHODS: A systematic search was conducted on Google, Bing, and Yahoo using the terms "total hip arthroplasty" and "hip replacement surgery," with the top 30 URLs from each search engine selected. Readability was assessed using three readability scores (Gunning FOG, Flesch-Kincaid Grade, and Flesch Reading Ease). Quality was evaluated based on HONcode certification and the JAMA benchmark criteria.
RESULTS: Ninety webpages were included in the analysis. The mean Flesch-Kincaid Grade level was 9.5 ± 2.4, the mean Gunning FOG grade was 11.1 ± 3.0, and the mean Flesch Reading Ease score was 48.5 ± 13.8. Only 6 webpages were at or below a 6th-grade reading level. The mean JAMA score was 1.4 ± 1.3 out of 4, and 13 websites were HONcode accredited.
CONCLUSION: Online THA-related medical information is often too complex for the average patient, with inconsistent quality. This study assessed readability and credibility but did not evaluate medical accuracy or include hospital-based resources. Improving both readability and reliability is essential to enhance patient comprehension, support informed decision-making, and promote better health literacy.

Keywords:  Arthroplasty; Health literacy; Patient education; Readability; Total hip

DOI:  https://doi.org/10.1016/j.surge.2025.02.016
Mol Genet Metab Rep. 2025 Mar;42 101195

Analysis of readability of the top web searches for pediatric inborn errors of fatty acid metabolism.

Katelyn Sawyer, William Miller, Courtney Popp, Chloe Strege, Cindy Eide, Jakub Tolar.

Background: Disorders of fatty acid oxidation (FAOD) are estimated to account for around 1 in 10,000 live births, and with modern newborn screens, these conditions are often identified in childhood. However, not all parents will receive regular medical follow-up, and varying levels of parental health literacy can influence their reliance on online resources for information. Therefore, assessing the readability of online materials is critical to ensuring accessible and comprehensible patient education. Understanding the readability landscape informs our efforts to improve the quality of online resources and to support parents and patients in navigating the diagnosis of an FAOD.
Objective: Our goal was to evaluate the readability of public facing online materials concerning the 10 most common disorders of fatty acid oxidation, with consideration given to the recommended reading levels by the National Institutes of Health (NIH) and the American Medical Association (AMA).
Methods: Using Flesch-Kincaid, Coleman-Liau, and SMOG readability indices, we analyzed the top 25 internet search results for each disorder. Excluding empty or paywalled content, 232 publicly accessible materials were assessed.
Results: Mean readability ranged from 11.64 to 12.85, indicating generally higher complexity than recommended. Only 15.5 % of materials met NIH's 8th grade reading level guideline, and 3.9 % met AMA's 6th grade level. Variability existed between disorders, with percentages meeting guidelines ranging from 0 % to 25 % for NIH and 0 % to 8.3 % for AMA.
Conclusion: Ensuring readability of online resources for rare disorders of fatty acid oxidation is crucial, particularly given the prevalence of childhood diagnosis and varying levels of parental health literacy. Parents may rely on easily accessible but potentially complex materials found through online searches, highlighting the importance of aligning online content with recommended reading levels. Improving readability can enhance accessibility and understanding and facilitate informed decision-making and optimal care for patients.

DOI: https://doi.org/10.1016/j.ymgmr.2025.101195
Digit Health. 2025 Jan-Dec;11:11 20552076251327039

Quality assessment of temporomandibular disorders-related information on Chinese social media: A cross-sectional study.

Yifei Deng, Jianing Zhou, Ming Yang, Yaxin Weng, Xin Xiong.

   Background: Temporomandibular disorders (TMDs) affect people's quality of life greatly, and precise understanding of TMDs contributes to a proper treatment choice. Social media is an access to health information, hence it is needed to evaluate the quality of relative information on social media.
Objective: This research aims to assess the quality of information about TMDs understood by the public on two mainstream social media platforms, WeChat and Zhihu. They will be evaluated from four aspects: readability, credibility, concreteness, and accuracy.
Methods: Researchers searched for relative articles on WeChat and Zhihu and selected the samples. The readability was evaluated separately, and the DISCERN instrument was employed to evaluate the credibility and concreteness. Accuracy was measured by comparing samples with authoritative journals and textbooks. The Health On the Net code of conduct for medical and health websites (HONcode) and Global Quality Scale (GQS) were used as supplemental tools. Two researchers conducted this process independently, the intraclass correlation coefficient was used to examine the consistency.
Results: One hundred and eleven articles were included, with 47 articles from WeChat, 64 from Zhihu. For readability, the articles received a mean score of 27.79 (standard deviation (SD) 2.99) out of 35. The DISCERN instrument reported a mean score of 38.52 (SD 7.13) out of 80. As for accuracy, most articles (92 of 111) got 3.5 or more out of 5, demonstrating that the two platforms did well in this area. HONcode reported a mean score of 6.29 out of 16 (SD 1.42) while GQS showed a mean score of 2.91 out of 5 (SD 0.77), indicating the reliability needs improvements, and these articles can only provide limited help to the public.
Conclusions: The quality of TMDs-related information from WeChat and Zhihu is generally low. Although they do well on accuracy and readability, the credibility and concreteness still need further improvements. And different improvements and suggestions are recommended for uploaders and platforms.

Keywords:  Temporomandibular disorders; WeChat; Zhihu; quality assessment; social media; the DISCERN

DOI:  https://doi.org/10.1177/20552076251327039
J Eval Clin Pract. 2025 Mar;31(2): e70053

Analysis of Youtube Videos for Artificial Airway Suctioning Training of Nurses: A Content Analysis.

Sercan Kara, Dilek Yildirim.

   BACKGROUND: Artificial airway suctioning is one of the most widespread nursing procedures in clinical practice. Although artificial airway suctioning is frequently applied procedure, it is not easy to perform. It is most important to follow the correct technique to prevent possible complications.
METHODS: This study aims to assess the content, reliability, and quality of training videos on endotracheal aspiration for nurses. The descriptive and retrospective study was conducted in September 2023 and the videos available on the YouTube platform for artificial airway suctioning training of nurses were evaluated in terms of content, reliability and quality.
RESULTS: In total, 36 videos were analyzed. The analysis was conducted by two independent researchers. Video Information Form and Endotracheal or Tracheostomy Tube Aspiration Checklist created in line with the literature were used to evaluate the application steps of the videos. The short form of the DISCERN questionnaire was utilized to assess the reliability of the videos, while the 5-point Global Quality Scale (GQS) was employed to evaluate their quality. The mean DISCERN score for the videos was 2.958 ± 1.513, the mean GQS score was 3.430 ± 1.083, and the mean total score for the Implementation Checklist was 47.166 ± 10.338. A strong agreement was observed between the assessments made by the first and second researchers using the DISCERN, GQS, and Checklist. The videos included in the study were found to be of medium quality. Additionally, it was determined that all videos exhibited deficiencies in the Checklist steps. In addition, there were significant deficiencies in application principles such as compliance with sterile technique, sequence of application stages and recording.
CONCLUSIONS: These deficiencies are thought to pose a great risk in terms of patient safety. It is recommended that artificial airway suctioning videos available on YouTube be used in nursing education after being checked for content and by experts.

Keywords:  YouTube; artificial airway suctioning; clinical skills; video

DOI:  https://doi.org/10.1111/jep.70053
J Laparoendosc Adv Surg Tech A. 2025 Mar 21.

Educational Quality of YouTube™ Videos on Laparoscopic Radical Prostatectomy.

Ender Akdemir, Muhammet Çiçek, Battal Selçuk Çakmak, Mehmet Levent Akbulut, Muhammet Serdar Buğday.

  Introduction: Prostate cancer is the most prevalent urogenital cancer among males. Radical prostatectomy remains the gold standard for localized prostate cancer treatment, with minimally invasive procedures (laparoscopic, robot-assisted laparoscopic) increasingly replacing open surgeries. YouTube™, a popular digital platform, hosts a substantial volume of prostate cancer-related videos, presenting a mix of accurate and misleading content. Given these challenges, researchers have proposed evaluation frameworks to assess the quality of YouTube™ videos. This study evaluates the educational adequacy and contextual relevance of laparoscopic radical prostatectomy (LRP) videos on YouTube™ using established video evaluation criteria. Methods: A search using the keyword "Laparoscopic Radical Prostatectomy" yielded 200 YouTube™ videos. After applying inclusion and exclusion criteria, 131 videos were analyzed by three laparoscopic prostatectomy specialists. An evaluation was performed using scoring systems, including LAP-VEGaS, DISCERN, JAMA, GQS, and video power index (VPI). Results: Of the 131 videos, 88 (67%) were from individual participants (Group 1), and 43 (33%) were from corporate channels (Group 2). Group 2 demonstrated significantly higher JAMA, GQS, and mDISCERN scores (P = .028, .005, and .001, respectively). The LAP-VEGaS score was also higher in Group 2 (7.09 ± 0.43) compared to Group 1 (5.08 ± 0.26; P < .001). VPI values were significantly greater in Group 2 (P = .008). Conclusion: This study highlights a critical gap in the educational quality of LRP videos on YouTube™. Using comprehensive scoring systems, corporate channels consistently provided higher-quality educational content compared to individual contributors.

Keywords:  YouTubeTM; educational videos; laparoscopic radical prostatectomy

DOI:  https://doi.org/10.1089/lap.2025.0002
J Neurol Surg B Skull Base. 2025 Apr;86(2): 185-190

The Usefulness of YouTube Videos Related to Endoscopic Sinus Surgery for Surgical Residents.

Justin Shapiro, Marc Levin, Saud Sunba, Emily Steinberg, Vince Wu, John M Lee.

  Objective The use of online teaching modalities to supplement surgical learning has increased recently, demonstrating promising results. Previous studies have analyzed the value and usefulness of YouTube as an educational source to learners, including teaching surgical skills to Otolaryngology-Head and Neck Surgery (OHNS) trainees. YouTube videos on endoscopic sinus surgery (ESS) still need to be explored as ESS remains a common, yet challenging surgery that OHNS residents encounter regularly. This study aimed to objectively evaluate the usefulness of YouTube videos on ESS for surgical education. Design YouTube was searched using the following keywords: "uncinectomy," "maxillary antrostomy," "anterior ethmoidectomy," and "ethmoid bulla resection." These represent the initial ESS steps residents learn. Each video was assessed for eligibility by two independent reviewers. Outcome Measures The LAParoscopic surgery Video Educational Guidelines (LAP-VEGaS) and ESS-specific criteria were used to assess educational quality. Video popularity index (VPI) was used to calculate video popularity. Results Of the 38 videos that met inclusion criteria, the average LAP-VEGaS score was 6.59 (± ) 3.23 standard deviation. Most videos were designated low quality. There was a weak positive correlation between whether a video included ESS-specific criteria and LAP-VEGaS score ( r = 0.269, p = 0.102). There was a significant positive correlation between VPI and LAP-VEGaS scores ( r = 0.497, p = 0.003). Conclusion Overall, the quality of included videos was poor. OHNS residents should not rely solely or primarily on YouTube videos to learn surgical skills relevant to ESS. To maximize potential of online teaching, high-quality videos should be used to compliment other methods of teaching.

Keywords:  education; endoscopic sinus surgery; residents; surgical skills

DOI:  https://doi.org/10.1055/s-0044-1786045
J Obstet Gynaecol Can. 2025 Mar 14. pii: S1701-2163(25)00054-4. [Epub ahead of print] 102814

Content, Quality, and Reliability of Endometriosis Videos on YouTube.

Alexandra McGough, Rebecca J Schneyer, Kacey M Hamilton, Gabriel Levin, Matthew T Siedhoff, Kelly N Wright, Raanan Meyer.

  We aimed to assess the content and quality of YouTube videos about endometriosis. Apify was used to retrieve videos, and 138 videos were included. Most videos originated in high income countries and 50.0% were monetized. Median PEMAT Actionability and Understandability were 95.0 and 100.0, respectively. Median Discern score was 67.5. Compared to other sources, PEMAT Actionability and Discern scores were significantly higher for healthcare professional videos and for videos created from 2021 onward. There was a significant positive association between videos' year of appearance and promotion of endometriosis awareness. In conclusion, endometriosis videos are of high quality, especially when produced by healthcare professionals, endometriosis awareness promotion increases over the years.

Keywords:  endometriosis awareness; healthcare professionals; pelvic pain; social media; view count

DOI:  https://doi.org/10.1016/j.jogc.2025.102814
Public Health Nurs. 2025 Mar 20.

YouTube as a Source for Arabic-Speaking Parent Education on the Oral Hygiene of Children: A Social Media Content Analysis.

Maram Ali M Alwadi, AlBandary Hassan AlJameel, Munirah Mohammed A Alaskar, Saleha Ali Alzahrani, Fatmah Almoayad, Basil H Aboul-Enein, Patricia J Kelly.

   INTRODUCTION: Much primary prevention in public health dentistry depends on parents' having accurate knowledge about pediatric oral health. In areas with minimal education levels and few oral health professionals, information on this topic is available from the widespread use of the social medial resource YouTube. This study assessed the quality and viewer engagement of YouTube Arabic videos on pediatric oral health practices.
METHODS: Using standard procedures to search YouTube, we identified Arabic-language pediatric oral health videos. A social media content analysis was conducted and videos analyzed for viewer engagement metrics, country of origin, and creator occupation. The DISCERN instrument was used to evaluate video quality, reliability, and information quality; statistical correlations were examined between these parameters and video statistics.
RESULTS: A majority of the 47 videos that were identified originated from Egypt and were created by pediatric dentists, attracting an average of 13,328.7 views and 218.7 likes. Quality assessment found 61.7% of videos with moderate quality; 63.8% had only medium levels of reliability (63.8%) and 63.8% medium information quality (63.8%); only a minor segment achieving high reliability and information quality. Correlation analysis revealed a positive but weak association between DISCERN scores and viewer engagement metrics (e.g., likes, comments, views), suggesting that while better quality videos tend to engage more viewers, other factors also contribute to engagement. Additionally, a stronger correlation was noted between the overall quality of videos and both information quality and reliability, indicating that videos with higher-quality content were perceived as more reliable and informative by viewers.
CONCLUSION: While a significant volume of pediatric oral health content is available online, variability in quality highlights the need for stringent evidence-based standards to ensure the provision of reliable, quality educational materials.

Keywords:  Arab language; YouTube; internet; oral hygiene; social media

DOI:  https://doi.org/10.1111/phn.13551
Cureus. 2025 Feb;17(2): e78993

Analysis of Plantar Fasciitis Videos on YouTube: Quality and Reliability Assessment.

Ahmet Burak Satılmış, Tolgahan Cengiz.

  Objective Plantar fasciitis is one of the most common causes of heel pain and affects a significant portion of the population. Digital platforms such as YouTube play an essential role in patients' searches for health information. However, the accuracy and reliability of the information shared on these platforms are often questioned. Method In this study, the first 50 videos searched for "Plantar Fasciitis" on YouTube were evaluated using DISCERN and JAMA scoring systems. Videos were categorized according to uploaders (physicians, physiotherapists, independent users, etc.) and content types (general information, exercise, non-surgical treatment). Video Power Index (VPI) and statistical analyses were applied to evaluate the quality of the content. Results 74% of the videos were uploaded by non-physicians, and the DISCERN and JAMA scores of the content uploaded by physicians were statistically higher (p<0.01). However, in the overall evaluation, most of the videos were found to be of low quality. The average length of the videos was 7.63 minutes, and most of the content was shared by physiotherapists (46%). Conclusion Most YouTube videos about plantar fasciitis contain low-quality content. Although videos uploaded by physicians appear more reliable, a general lack of information can lead to misinforming patients. Healthcare professionals, universities, and institutions should be encouraged to produce accurate educational content. Improving the information quality of digital platforms will help patients make informed decisions.

Keywords:  discern score; health information; jama score; plantar fasciitis; youtube

DOI:  https://doi.org/10.7759/cureus.78993
Urogynecology (Phila). 2025 Mar 19.

YouTube and UTIs: What Is Online Video Content Teaching Our Patients?

Benjamin Worrall, Anthony-Joe Nassour, Kevin Zhuo, Maria Pilar Alvarado, Amanda Chung.

IMPORTANCE: YouTube is an important source of information about urinary tract infections (UTIs), which are the most common outpatient infections.
OBJECTIVE: This study aimed to assess the quality of YouTube videos about UTI prevention.
STUDY DESIGN: Three doctors independently reviewed the first 50 YouTube search results for "how to prevent UTIs," using the DISCERN and Patient Education Materials Assessment Tool (PEMAT); SPSSv28 was used for analysis with P < 0.05 considered significant.
RESULTS: Three non-English videos were excluded. Sixteen of 47 (34%) were produced by medical sources. Forty-three of 47 (91%) were targeted at patients, rather than clinicians. The median views per video was 24,110 (88-5,552,204). Nonmedical sources ranked higher in search results (rs = 0.41, P < 0.05). Nonmedical sources had more subscribers and views. Nonmedical sources were "liked" significantly more than nonmedical sources (U = 146, P < 0.05). The overall quality of evidence-based material was moderate (mean DISCERN, 3.1). Medical sources were significantly more accurate than nonmedical sources (DISCERN, 3.6 cf. 2.9; P = 0.03). The overall mean PEMAT understandability was 62.8%, and actionability was 65.7%, with no significant difference between medical and nonmedical sources. The video view count was not associated with significantly higher PEMAT or DISCERN scores.
CONCLUSIONS: Videos by medical sources were more factually reliable, but there was no difference in delivery quality between medical and nonmedical sources. Patients may present with inaccurate preconceptions about UTI treatment from YouTube, which practitioners should be prepared to address. There is a role for medical institutions and all doctors who treat patients for UTIs to create YouTube content that is both factually accurate and accessible to patients.

DOI: https://doi.org/10.1097/SPV.0000000000001672
Allergy Asthma Clin Immunol. 2025 Mar 19. 21(1): 12

The origin of YouTube videos on hereditary angioedema matters.

Pelin Korkmaz, Ilkim Deniz Toprak, Zeynep Kilinc, Derya Unal, Semra Demir, Asli Gelincik.

BACKGROUND: Hereditary angioedema (HAE) is a rare, potentially life-threatening condition that requires accessible and reliable information. YouTube has emerged as a significant source of health-related content, offering valuable insights while posing the risk of misinformation that warrants caution among users. The aim of this study was to evaluate the popularity, reliability, understandability, actionability, and overall quality of YouTube videos related to HAE.
METHOD: A search was conducted on YouTube using the term "hereditary angioedema." Videos were categorized based on their origin (health or nonhealth) and content type (medical professional education (MPE), patient education (PE), patient experience, or awareness). The quality, reliability, understandability, and actionability of the videos were assessed via the Global Quality Scale (GQS), the Patient Education Materials Assessment Tool for Audiovisual Materials (PEMAT-A/V), and the Quality Criteria for Consumer Health Information (DISCERN) tool. Three independent allergists evaluated the videos.
RESULTS: Out of 135 reviewed videos, 111 met the inclusion criteria. The health group presented significantly higher scores than did the nonhealth group in several metrics: PEMAT-A/V understandability (83, IQR: 56-92, p = 0.001), total DISCERN score (37, IQR: 3-45, p < 0.001), reliability (23, IQR: 19-26, p < 0.001), treatment (15, IQR: 8-21, p = 0.007), and modified DISCERN score (3, IQR: 2-4, p = 0.002). Health videos were uploaded more recently (p = 0.006), while awareness videos tended to be older than more recent MPE videos (p = 0.002). The MPE videos had the longest duration, whereas the awareness videos had the shortest duration (p < 0.001). Video quality scores, assessed via the GQS, were higher in both the MPE and PE groups (scores: 3, 4, and 5; p = 0.005). Compared with the other groups, the MPE group also had significantly higher PEMAT-A/V understandability scores (91, IQR: 70.75-92, p < 0.001), total DISCERN scores (40, IQR: 30.75-49.5, p < 0.001), reliability scores (24, IQR: 21-27.25, p < 0.001), and overall scores for moderate to high quality (83, 74.8%, p = 0.002).
CONCLUSION: YouTube videos on HAE uploaded by health care professionals generally offer higher-quality information, but their overall reliability remains suboptimal. There is a pressing need for higher-quality, trustworthy content, particularly from professional medical organizations, to address this gap.

DOI: https://doi.org/10.1186/s13223-025-00947-6
Clin Park Relat Disord. 2025 ;12 100311

Evaluating YouTube as a source of information on hemifacial spasm.

Kimberly L Po, Alfeo Julius R Sy, Roland Dominic G Jamora.

   Background and Objectives: Patients increasingly turn to YouTube for trustworthy health-related information, prompting a study to evaluate the quality and reliability of videos about hemifacial spasms (HFS) available on the platform.
Materials and Methods: In August 2024, a systematic search was conducted using a formal strategy to identify relevant videos. Two independent neurology resident physicians reviewed each video, scoring it with the validated modified DISCERN (mDISCERN) tool for reliability and the Global Quality Scale (GQS) for content quality. Videos were categorized based on their purpose and assessed for video/audio quality, accuracy, comprehensiveness, and procedure-specific content.
Results: The study included 44 videos. According to GQS, 17 (38.6 %) were rated as high quality, 14 (31.8 %) as good, 5 (17 %) as medium, and 8 (18.2 %) as poor quality. On the mDISCERN scale, 24 (54.5 %) were deemed reliable, while 9 (20.5 %) were unreliable. Videos created by physicians, academic institutions, and reputable health information websites scored higher on both mDISCERN and GQS compared to other sources. A strong positive correlation was found between mDISCERN and GQS scores (r = 0.925, p < 0.001), indicating that higher reliability was linked to better content quality.
Conclusion: YouTube offers valuable resources for HFS patients and caregivers. Videos produced by healthcare professionals and academic institutions offered particularly accurate insights, enhancing patients' understanding of the condition's pathophysiology and treatment options, and serving as a useful complement to healthcare professionals' knowledge. Healthcare professionals and academic institutions have a pivotal role in creating and promoting high-quality educational content. Future efforts should focus on increasing the availability of reliable, expert-verified videos to improve overall quality of information accessible to patients.

Keywords:  Facial exercise; Facial twitching; Hemifacial spasm; Information; YouTube

DOI:  https://doi.org/10.1016/j.prdoa.2025.100311
Sci Rep. 2025 Mar 17. 15(1): 9189

Evaluating the quality of TikTok videos on coronary artery disease using various scales to examine correlations with video characteristics and high-quality content.

Peng Hao, Guixin Liu, Shuang Lian, Jiaxu Huang, Lin Zhao.

  Background Coronary artery disease (CAD) is a major public health concern, yet reliable sources of relevant information are limited. TikTok, a popular social media platform in China, hosts diverse health-related videos, including those on CAD; however, their quality varies and is largely unassessed. Objective This study aimed to investigate the quality of CAD-related videos on TikTok and explore the correlation between video characteristics and high-quality videos. Methods A total of 122 CAD-related short videos on TikTok were analyzed on July 18, 2023. Basic video information and sources were extracted. Two evaluators independently scored each video using DISCERN (a health information quality scale), the Patient Education Materials Assessment Tool (PEMAT) and the Health on the Net (HONcode) scales. Videos were categorized into four groups based on their source, with the medical professional group further categorized by job titles. Simple linear analysis was used to examine the linear relationship across different scales and to explore the relationship between video characteristics (video length, time since posting, the number of "likes", comments and "favorites", and the number of followers of the video creator) and different scales. Results AQVideos were categorized into four groups based on their source: medical professionals (n = 98, 80.3%), user-generated content (n = 11, 9.0%), news programs (n = 4, 3.3%), and health agencies or organizations (n = 9, 7.4%). The score of DISCERN was 46.5 ± 7.6/80, the score rate of PEMAT was 79.2 ± 12.6%/100%, and the number of score items for HONcode was 1.4 ± 0.6/8. In Sect. 1 of DISCERN, user-generated content scored highest (29.1 ± 3.6), followed by medical professionals (28.6 ± 2.4), health agencies or organizations (28.0 ± 0.0) and news programs (28.0 ± 0.0)(P = 0.047). In HONcode, most videos met only one or two of the eight evaluation criteria. PEMAT scores varied slightly across categories without significant differences (P = 0.758). Medical professionals were further divided into senior (n = 69, 70.4%) and intermediate (n = 29, 29.6%) groups, with intermediate professionals scoring higher in DISCERN (P < 0.001). In simple linear analysis models, no linear correlation was found between DISCERN and PEMAT scores (P = 0.052). Time since posting on TikTok was negatively correlated with DISCERN (P = 0.021) and PEMAT scores (P = 0.037), and the number of "favorites" was positively correlated to DISCERN score (P = 0.007). Conclusion The quality of CAD-related videos on China's TikTok is inconsistent and varies across different evaluation scales. Videos posted by medical professionals with intermediate titles tended to offer higher quality, more up-to-date content, as reflected by higher "favorite" counts. HONcode may not be suitable for short video evaluation due to its low score rate, while DISCERN and PEMAT may be effective tools for short video evaluation. However, their lack of consistency in evaluation dimensions highlight the need for a tailored scoring system for short videos.

Keywords:  DISCERN; HONcode; PEMAT; Quality evaluation; Short video; TikTok

DOI:  https://doi.org/10.1038/s41598-025-93986-3
Int J Dermatol. 2025 Mar 21.

TikTok as a Source of Information on Retinol: A Cross-Sectional Analysis.

Anagha Srivatsa, Wasey Rehman, Rafey Rehman, Meena Moossavi, Darius Mehregan.



Keywords:  TikTok; dermatology; health education; retinol

DOI:  https://doi.org/10.1111/ijd.17741
Scand J Caring Sci. 2025 03;39(1): e70020

Investigation of Nursing Students' Online Information Searching Strategies and Attitudes Towards Informatics Ethical Values.

Necibe Dagcan Sahin, Gulsah Gurol Arslan, Dilek Ozden.

   AIMS AND OBJECTIVES: Information technologies used in accessing online information also bring ethical problems. Informatics ethics are affected by students' socio-demographic characteristics. This study was conducted to examine nursing students' online information searching strategies and attitudes towards informatics ethical values.
METHODS: A descriptive, cross-sectional study design was employed. A non-probability sampling method was used to determine the research sample. Data for the study were collected from students aged ≥ 18 years who were studying at the faculty of nursing of a university located in the west of the country between 15 September 2021 and 15 June 2022 and who agreed to participate in the research voluntarily. Data were collected using a 'Descriptive Information Form', the 'Online Information Searching Strategy Inventory' and the 'Attitude Scale towards Informatics Ethical Values'.
RESULTS: Data were collected from a total of 710 first-, second-, third- and fourth-year nursing students. Of the nursing students participating in the study, 40.1% were between the ages of 20 and 21, and 61.5% were female. A statistically significant difference was determined between their mean scores on the total information searching strategy scale and gender, class, the status of having a personal computer, level of computer use, the status of having an internet connection and the frequency of searching for information on the Internet (p < 0.05). In addition, there was a statistically significant difference between their mean scores on the total attitude scale towards informatics ethical values scale and age, gender, school year, the status of having a personal computer, the status of having an internet connection, frequency of searching for information on the Internet and the duration of searching for information online a week (p < 0.05). A positive, low-level, statistically significant relationship was found between nursing students' attitudes towards informatics ethical values and online information searching strategies (r = 0.339, p = 0.000).
CONCLUSION: In conclusion, it was found that when online information searching strategies increased, attitudes towards informatics ethical values increased as well. It is thought that our study results will be a guide for nursing students to develop online information search strategies throughout their educational lives and raise awareness about informatics ethics. It may be recommended that students' information search strategies be determined, information sources be found, information be used and necessary arrangements be made to meet their needs.

Keywords:  informatics ethics; nursing student; online information search

DOI:  https://doi.org/10.1111/scs.70020
J Cosmet Dermatol. 2025 Mar;24(3): e70130

Correction to "Social Media Use as a Source of Information by Acne Vulgaris Patients".

DOI: https://doi.org/10.1111/jocd.70130
Digit Health. 2025 Jan-Dec;11:11 20552076251326160

Perceptions about the use of virtual assistants for seeking health information among caregivers of young childhood cancer survivors.

Emre Sezgin, Daniel I Jackson, Kate Kaufman, Micah A Skeens, Cynthia A Gerhardt, Emily Moscato.

   Objectives: This study examined the perceptions of caregivers of young childhood cancer survivors (YCCS) regarding the use of virtual assistant (VA) technology for health information seeking and care management. The study aim was to understand how VAs can support caregivers, especially those from underserved communities, in navigating health information related to cancer survivorship.
Methods: A qualitative study design was employed, involving semi-structured interviews and focus groups with 10 caregivers of YCCS from metropolitan, rural, and Appalachian regions, recruited from a large pediatric academic medical center in the Midwest. A web-based VA prototype was tested with caregivers, who provided feedback on its usability, utility, and feasibility. Data were analyzed using thematic analysis to identify key themes related to caregivers' interactions with and perceptions of the VA technology.
Results: We identified four major themes: Interface and Interaction, User Experience, Content Relevance, and Trust. Caregivers expressed preferences for multimodal interactions (voice and text), particularly valuing flexibility based on context. They emphasized the need for accurate, relevant, and easily retrievable health information tailored to their child's needs. Trust and confidentiality were critical, with caregivers favoring VAs integrated with trusted healthcare systems. While VAs were perceived as valuable tools for reducing search fatigue and cognitive burden, caregivers highlighted the need for improved conversational depth, personalization, and empathetic response.
Conclusions: VAs hold promise as support tools for caregivers of YCCS, particularly in underserved communities, by offering personalized, credible, and accessible health information. To maximize their potential, research and development efforts should focus on building trust-building, integrated, and personalized VAs. These enhancements can help VAs further ease caregiving tasks and support caregivers in managing complex health needs.

Keywords:  Caregivers; health information seeking; pediatric cancer survivors; rural health; underserved communities; virtual assistants

DOI:  https://doi.org/10.1177/20552076251326160
J Rheumatol. 2025 Mar 15. pii: jrheum.2024-1088. [Epub ahead of print]

Trust in Health Information Sources Among Patients with Systemic Lupus Erythematosus in the Social Networking Era: The TRUMP2-SLE Study.

Takanori Ichikawa, Dai Kishida, Yasuhiro Shimojima, Nobuyuki Yajima, Nao Oguro, Ryusuke Yoshimi, Natsuki Sakurai, Chiharu Hidekawa, Ken-Ei Sada, Yoshia Miyawaki, Keigo Hayashi, Kenta Shidahara, Yuichi Ishikawa, Yoshiki Sekijima, Noriaki Kurita.

OBJECTIVE: The growing use of social networking services (SNSs) has impacted how patients with systemic lupus erythematosus (SLE) access health information, potentially influencing their interaction with healthcare providers. This study aimed to examine patients' preferences, actual use, and trust in various health information sources, along with the factors influencing the trust among patients with SLE.
METHODS: A multicenter cross-sectional survey was conducted from June 2020 to August 2021, involving 510 Japanese adults with SLE. Participants reported their preferred and actual sources of health information, including SNSs, and their level of trust in these sources. Modified Poisson regression was used to analyze factors influencing trust, including internet usage and health literacy (HL) (functional, communicative, and critical).
RESULTS: Most respondents (98.2%) expressed trust in doctors, while trust in websites/blogs (52.0%) and SNSs (26.8%) was lower. Despite this, the internet was the most frequent initial source of health information (45.3%), encompassing medical institution websites, patient blogs, X (formerly Twitter), and Instagram. Longer internet usage periods were associated with a greater trust in websites/blogs and SNSs. Higher functional HL was correlated with an increased trust in doctors but decreased trust in websites/blogs and SNSs. Higher communicative HL was linked to a greater trust in doctors, websites, and blogs.
CONCLUSION: Although many patients with SLE initially seek health information online, they prefer consulting rheumatologists. Internet usage duration and multidimensional HL influence trust in online sources. Healthcare providers should consider these factors when disseminating health information and engaging with patients.

DOI: https://doi.org/10.3899/jrheum.2024-1088