JMIR Cancer. 2025 Apr 16;11:e63677
Background: Patients frequently turn to the internet for information about cancer; however, the websites they find often lack accuracy and readability. Recently, ChatGPT, an artificial intelligence-powered chatbot, has emerged as a potential paradigm shift in how patients with cancer access medical information, including information about radiotherapy. However, the quality of the information ChatGPT provides remains unclear. This is particularly important given the general public's limited knowledge of this treatment and concerns about its possible side effects. Evaluating response quality is therefore crucial, as misinformation can foster a false sense of knowledge and security, lead to noncompliance, and delay appropriate treatment.
Objective: This study aims to evaluate the quality and reliability of ChatGPT's responses to common patient queries about radiotherapy, comparing the performance of ChatGPT's two versions: GPT-3.5 and GPT-4.
Methods: We selected 40 commonly asked radiotherapy questions and entered them into both versions of ChatGPT. Response quality and reliability were evaluated by 16 radiotherapy experts using the General Quality Score (GQS), a 5-point Likert scale, and the median GQS was determined from the experts' ratings. Consistency and similarity of responses were assessed using the cosine similarity score, which ranges from 0 (complete dissimilarity) to 1 (complete similarity). Readability was analyzed using the Flesch Reading Ease Score, which ranges from 0 to 100, and the Flesch-Kincaid Grade Level, which reflects the average number of years of education required for comprehension. Statistical analyses were performed using the Mann-Whitney test and effect sizes, with results deemed significant at the 5% level (P<.05). Interrater agreement was assessed using Krippendorff α and Fleiss κ.
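To illustrate how these measures can be obtained, a minimal Python sketch follows. It is not the authors' code, and the paper does not specify its tooling: the use of TF-IDF vectors for cosine similarity, the textstat library for readability, SciPy for the Mann-Whitney test, and the example answers and ratings are all illustrative assumptions.

```python
# Minimal sketch of the abstract's quantitative measures (assumed tooling,
# not the authors' implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import mannwhitneyu
import textstat

# Hypothetical answers from the two ChatGPT versions to the same question.
gpt35_answer = "Radiotherapy uses high-energy radiation to destroy cancer cells..."
gpt4_answer = "Radiation therapy treats cancer by damaging the DNA of tumor cells..."

# Cosine similarity between the two versions' answers
# (0 = completely dissimilar, 1 = identical direction in TF-IDF space).
vectors = TfidfVectorizer().fit_transform([gpt35_answer, gpt4_answer])
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Readability: Flesch Reading Ease (0-100, higher = easier) and
# Flesch-Kincaid Grade Level (approximate years of schooling required).
fre_35 = textstat.flesch_reading_ease(gpt35_answer)
fkgl_35 = textstat.flesch_kincaid_grade(gpt35_answer)
fre_4 = textstat.flesch_reading_ease(gpt4_answer)
fkgl_4 = textstat.flesch_kincaid_grade(gpt4_answer)

# Comparing expert GQS ratings (5-point Likert) for one question across the
# two versions with the Mann-Whitney U test; ratings are placeholders.
gqs_gpt35 = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3, 3, 2, 4, 3]  # 16 experts
gqs_gpt4 = [4, 5, 4, 4, 5, 4, 3, 5, 4, 4, 5, 4, 4, 3, 5, 4]
u_stat, p_value = mannwhitneyu(gqs_gpt35, gqs_gpt4, alternative="two-sided")

print(f"Cosine similarity: {similarity:.2f}")
print(f"GPT-3.5 FRE/FKGL: {fre_35:.1f}/{fkgl_35:.1f}; GPT-4: {fre_4:.1f}/{fkgl_4:.1f}")
print(f"Mann-Whitney U={u_stat:.1f}, P={p_value:.3f}")
```

In practice, agreement between the 16 experts could likewise be computed with existing packages (eg, the krippendorff package for Krippendorff α and statsmodels for Fleiss κ), although the abstract does not state which implementations were used.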
Results: GPT-4 demonstrated superior performance, with a higher GQS and fewer scores of 1 and 2 compared with GPT-3.5. The Mann-Whitney test revealed statistically significant differences for some questions, with GPT-4 generally receiving higher ratings. The median cosine similarity score indicated substantial similarity between the two versions' responses (0.81, IQR 0.05) and high consistency within each version (GPT-3.5: 0.85, IQR 0.04; GPT-4: 0.83, IQR 0.04). Readability for both versions was at college level, with GPT-4 scoring slightly better on the Flesch Reading Ease Score (34.61 vs 32.98) and the Flesch-Kincaid Grade Level (12.32 vs 13.32). Responses from both versions were deemed challenging for the general public.
Conclusions: Both GPT-3.5 and GPT-4 demonstrated the capability to address radiotherapy concepts, with GPT-4 showing superior performance; however, both models produce responses that pose readability challenges for the general population. Although ChatGPT shows potential as a valuable resource for addressing common patient queries about radiotherapy, its limitations must be acknowledged, including the risks of misinformation and poor readability, and its implementation should be supported by strategies to enhance accessibility and readability.
Keywords: ChatGPT; OpenAI; accuracy; artificial intelligence; cancer awareness; chat generative pretrained transformer; chatbot; health information; internet access; large language model; natural language processing; patient information; patient query; patients with cancer; quality; radiotherapy; readability