bims-librar 2025-03-30 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2025–03–30
28 papers selected by
Thomas Krichel, Open Library Society

How Do Public Library Partnerships Impact Health? A Scoping Review.
Research on Training Competent Health Information Management Professionals: Based on A Survey of Field Experts.
The narrow search effect and how broadening search promotes belief updating.
Text mining for case report articles on "peritoneal dialysis" from PubMed database.
eHealth Literacy and Trust in Health Information Sources.
Public Trust in Different Sources of Information: Gaps in Rural Residents and Cancer Patients.
Spanish language version of the "Medical Quality Video Evaluation Tool" (MQ-VET): Cross-cultural AI-supported adaptation and validation study.
By any other name: searching for the right plasma nomenclature.
Correspondence Regarding "Readability of Rhinology-Related Patient Education Materials Generated by Artificial Intelligence-Based Chatbots".
[Internet resources on plantar fasciitis : German-language analysis of the information content of digital sourcess].
ChatGPT as a Source of Information for Kidney Transplant Recipients and Their Caregivers: Useful or not?
A Cross-Sectional Study Comparing Patient Information Guides for Amyotrophic Lateral Sclerosis, Myasthenia Gravis, and Guillain-Barré Syndrome Produced by ChatGPT-4 and Google Gemini 1.5.
Large Language Models' Responses to Spinal Cord Injury: A Comparative Study of Performance.
Assessing the readability of dermatological patient information leaflets generated by ChatGPT-4 and its associated plugins.
GPT-4 as a Source of Patient Information for Carpal Tunnel Surgery: A Comparative Analysis Against Google Web Search.
Evaluating the quality of medical content on YouTube using large language models.
Health consumers' emotional responses toward asthma videos on YouTube are influenced by time since posting, number of tags, subject of content and the emotional tone.
Evaluation of the Quality of Tympanoplasty Videos on YouTube Using the Ivory Grading System.
Evaluation of the Content, Reliability, and Quality of YouTube Videos on Surgical Hand Scrubbing.
Assessment of the Usefulness and Reliability of YouTube Videos on Postmortem Procedures.
YouTube Hindi Videos Proved to be a Valid and Reliable Source of Health Information: A Case Study of COVID-19 Vaccine.
YouTube as a source of information for stroke rehabilitation: a cross-sectional analysis of quality and reliability of videos.
Quality Assessment of YouTube Videos as a Source of Information on Ingrown Toenails.
Evaluating the Quality and Readability of Online Health Information on Snapping Hip Syndrome: A Cross-Sectional Analysis.
Assessment of YouTube videos on post-dural puncture headache: a cross-sectional study.
Evaluating the Quality and Reliability of YouTube Videos on Optic Neuritis: A Cross-Sectional Study Using Validated Scoring Systems.
Changes in Health Education Literacy After Structured Web-Based Education Versus Self-Directed Online Information Seeking in Patients Undergoing Carpal Tunnel Release Surgery: Nonrandomized, Controlled Study.
Health information-seeking behavior among users of traditional, complementary and integrative medicine (TCIM).

Health Promot Pract. 2025 Mar 23. 15248399251323901

How Do Public Library Partnerships Impact Health? A Scoping Review.

Noah Lenstra, Pam B DeGuzman, Rozalynd McConnaughy, Megan Weis.

  During the past decade, public libraries have been framed as key health promotion partners for everything from telemedicine to the opioid crisis. This study's goal was to evaluate the impacts of health promotion initiatives involving public libraries as collaborators. Using preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines, a search of three databases (PubMed, CINAHL Complete, and Library and information science abstracts (LISA)) for articles written in English referencing "health" and "public libraries" returned 985 unique citations, of which 67 papers were selected for review, based on the criteria of being peer-reviewed articles on health initiatives involving public libraries. All studies were published between 1957 and 2023, with 88% published in the 2000's, and 76% conducted in the United States. Most studies consisted of descriptive accounts of health promotion initiatives, with minimal reporting of outcome measures for the populations targeted. Better understanding the impacts of health promotion initiatives involving public libraries requires more rigorous assessment mechanisms, and the long-term success of these partnerships depends on stronger and sustained linkages between those working in health and those working in libraries, particularly public libraries.

Keywords:  community academic partnerships; community health promotion; community partnerships; health education; health promotion; libraries; public health; public library

DOI:  https://doi.org/10.1177/15248399251323901
Perspect Health Inf Manag. 2024 Summer-Fall;21(2):21(2): 1e

Research on Training Competent Health Information Management Professionals: Based on A Survey of Field Experts.

Hyunkyung Lee, Sangok Cho.

Keywords: education; empowerment; health information management
Proc Natl Acad Sci U S A. 2025 Apr;122(13): e2408175122

The narrow search effect and how broadening search promotes belief updating.

Eugina Leung, Oleg Urminsky.

  Information search platforms, from Google to AI-assisted search engines, have transformed information access but may fail to promote a shared factual foundation. We demonstrate that the combination of users' prior beliefs influencing their search terms and the narrow scope of search algorithms can limit belief updating from search. We test this "narrow search effect" across 21 studies (14 preregistered) using various topics (e.g., health, financial, societal, political) and platforms (e.g., Google, ChatGPT, AI-powered Bing, our custom-designed search engine and AI chatbot interfaces). We then test user-based and algorithm-based interventions to counter the "narrow search effect" and promote belief updating. Studies 1 to 5 show that users' prior beliefs influence the direction of the search terms, thereby generating narrow search results that limit belief updating. This effect persists across various domains (e.g., beliefs related to coronavirus, nuclear energy, gas prices, crime rates, bitcoin, caffeine, and general food or beverage health concerns; Studies 1a to 1b, 2a to 2g, 3, 4), platforms (e.g., Google-Studies 1a to 1b, 2a to 2g, 4, 5; ChatGPT, Study 3), and extends to consequential choices (Study 5). Studies 6 and 7 demonstrate the limited efficacy of prompting users to correct for the impact of narrow searches on their beliefs themselves. Using our custom-designed search engine and AI chatbot interfaces, Studies 8 and 9 show that modifying algorithms to provide broader results can encourage belief updating. These findings highlight the need for a behaviorally informed approach to the design of search algorithms.

Keywords:  algorithmic search; artificial intelligence; belief updating; confirmation bias

DOI:  https://doi.org/10.1073/pnas.2408175122
Ther Apher Dial. 2025 Mar 26.

Text mining for case report articles on "peritoneal dialysis" from PubMed database.

Kazuhiko Fukushima, Kenji Tsuji, Hiroyuki Nakanoh, Naruhiko Uchida, Soichiro Haraguchi, Shinji Kitamura, Jun Wada.

   INTRODUCTION: The number of published medical articles on peritoneal dialysis (PD) has been increasing, and efficiently selecting information from numerous articles can be difficult. In this study, we examined whether artificial intelligence (AI) text mining can be a good support for efficiently collecting PD information.
METHODS: We performed text mining and analyzed all the abstracts of case reports on PD in the PubMed database. In total, 3137 case reports with abstracts related to "peritoneal dialysis" published from 1970 to 2021 were identified.
RESULTS: A total of 280 347 relevant words were extracted from all the abstracts. Word frequency analysis, word dependency analysis, and word frequency transition analysis showed that peritonitis, encapsulating peritoneal sclerosis, and child have been important keywords. Theseanalyses not only reflected historical background but also anticipated future trends of PD study.
CONCLUSION: These suggest that text mining can be a good support for efficiently collecting PD information.

Keywords:  artificial intelligence; case reports; peritoneal dialysis; text mining

DOI:  https://doi.org/10.1111/1744-9987.70013
Healthcare (Basel). 2025 Mar 12. pii: 616. [Epub ahead of print]13(6):

eHealth Literacy and Trust in Health Information Sources.

Abdullah Alhewiti.

   INTRODUCTION: The spread of health-related information across the internet necessitates an evaluation of public eHealth literacy, trust in different health information sources, including healthcare providers, and how eHealth literacy is related to trust in different sources.
METHODS: 407 individuals participated in a web-based survey in the Tabuk region of Saudi Arabia. Univariate analysis was used to evaluate the relationships between eHealth literacy and demographic variables, and multiple linear regression was used to measure the relationship between eHealth literacy and trust in health information sources after adjustment for demographic factors.
RESULTS: The average eHealth literacy of the respondents was 27.17 out of 40. eHealth literacy levels were higher among females, younger age groups, those in the higher-education category, and those with a chronic disease or currently on medication. For 51.9% of participants, physicians and healthcare workers were their main source of health information, while 40% considered the internet their main source. None of the study participants perceived physicians and healthcare workers as untrustworthy, and social media was the least trusted source. eHealth literacy was not related to trust in physicians and health workers but was positively associated with trust in specialized health websites and negatively associated with trust in social media.
CONCLUSIONS: The findings suggest that the public tends to prefer and trust physicians and other healthcare workers as a primary source of health information, regardless of their eHealth literacy levels. A higher eHealth literacy level was associated with trust in specialized health websites and distrust in social media.

Keywords:  eHealth literacy; health information; healthcare providers; healthcare workers; internet; trust

DOI:  https://doi.org/10.3390/healthcare13060616
Healthcare (Basel). 2025 Mar 15. pii: 640. [Epub ahead of print]13(6):

Public Trust in Different Sources of Information: Gaps in Rural Residents and Cancer Patients.

Wei-Chen Lee, Emily M Kim, Elizabeth A Nemirovski, Sagar Kamprath, Meredith C Masel, Darpan I Patel.

   BACKGROUND/OBJECTIVES: Understanding health information-seeking behavior is critical in providing effective interventions and improving quality of life for patients, especially those facing complex diagnoses like cancer. The purpose of this study is to understand rural-urban differences in trust levels for various information sources and how trust may differ by cancer status (no cancer, newly diagnosed, survived for six and more years).
METHODS: We examined 5775 responses from the 2022 Health Information National Trends Survey®. Using the component analysis, eight sources of information were classified into three domains: structured (doctor, government, scientist, and charity), less structured (family and religion), and semi-structured (health system and social media). Respondents answered questions on a scale of 1-4. Weighted linear regression models were constructed to examine trust level in three domains by rural residency and cancer status, while adjusting for demographic and socioeconomic status.
RESULTS: Urban patients reported higher trust in more structured sources of information (2.999 > 2.873, p = 0.005) whereas rural counterparts reported higher trust in less structured sources of information (2.241 > 2.153, p = 0.012). After adjusting for covariates, urban respondents with cancer are more likely to trust doctors (Coeff. = 0.163, p < 0.001) than those without cancer. Rural respondents with cancer are less likely to trust charities (Coeff. = -0.357, p < 0.01) and scientists (Coeff. = -0.374, p < 0.05) than rural respondents without cancer.
CONCLUSIONS: Newly diagnosed cancer patients in rural areas are less likely to trust structured sources of information even after adjusting for all covariates. Additional studies about misinformation and disinformation being channeled through less structured sources of information are needed to prevent any delay in care among cancer patients, especially rural patients who are more likely to access these sources of information.

Keywords:  cancer; health information-seeking behavior; rural; source of information

DOI:  https://doi.org/10.3390/healthcare13060640
Sci Prog. 2025 Jan-Mar;108(1):108(1): 368504251327507

Spanish language version of the "Medical Quality Video Evaluation Tool" (MQ-VET): Cross-cultural AI-supported adaptation and validation study.

Alvaro Manuel Rodriguez-Rodriguez, Marta De la Fuente-Costa, Mario Escalera de la Riva, Borja Perez-Dominguez, Sergio Hernandez-Sanchez, Gustavo Paseiro-Ares, Fernando Ramos-Gomez, Jose Casaña-Granell, María Blanco-Diaz.

   BACKGROUND: The Medical Quality Video Evaluation Tool (MQ-VET) is a standardized instrument for assessing health-related video quality, yet it is only available in English. This study addresses the growing demand for a Spanish version to better support the increasing Spanish-speaking population seeking reliable digital health content.
OBJECTIVE: To adapt and validate the MQ-VET into Spanish, ensuring robust psychometric reliability and validity through rigorous cross-cultural adaptation methods, augmented by the integration of artificial intelligence (AI) tools.
MATERIALS AND METHODS: Following international guidelines, the MQ-VET was translated, back-translated, and reviewed by experts. AI-based tools were employed to refine linguistic and cultural accuracy. Psychometric properties were evaluated by 60 participants (30 healthcare and 30 nonhealthcare professionals), focusing on reliability, agreement, and concurrent validity with the DISCERN instrument.
RESULTS: The Spanish MQ-VET showed excellent reliability (Cronbach's alpha>0.90, ICC=0.81) and strong concurrent validity (Pearson r = 0.9435, Spearman r = 0.9482, p < 0.0001), alongside with a robust linear regression result (R²=0.8902). Bland-Altman analysis confirmed a robust agreement, and AI-driven tools performed the factorial analysis that revealed a clear three-factor structure explaining 81.1% of the variance.
CONCLUSIONS: The Spanish MQ-VET is a reliable and valid instrument for assessing the quality of health-related videos, applicable to both healthcare professionals and individuals outside the healthcare field. Leveraging AI-driven methodologies, it serves as a robust resource for enhancing digital health literacy and promoting critical appraisal of video content among Spanish-speaking populations.

Keywords:  Spanish-language tools; Validation studies; artificial intelligence; cross-cultural adaptation; health literacy; health videos evaluation; healthcare education; psychometric analysis

DOI:  https://doi.org/10.1177/00368504251327507
IEEE Trans Radiat Plasma Med Sci. 2025 Mar;9(3): 388-394

By any other name: searching for the right plasma nomenclature.

Caroline Corcoran, Rachel Bennett, Vandana Miller, Fred Krebs, Will Dampier.

  Non-thermal plasma, cold plasma, and atmospheric-pressure plasma are few terms used to describe the plasma used in plasma medicine research. The resulting ambiguity hampers literature searches, confuses discussion, and complicates collaborations. To assess the full breadth of this problem, we designed a natural language processing model (NLP) that surveyed approximately 15,000 papers in response to the query "plasma medicine" indexed in PubMed between 2020-2022. Our NLP was constructed and executed using the Hugging Face transformers API and PubMed BERT pretrained model. We used this model to determine the prevalence and to assess the utility of each term for searching literature relevant to plasma medicine. The effectiveness of each term was measured by precision, the ability to discriminate relevant and irrelevant literature; and recall, the ability to retrieve relevant literature. Each term was given a combined effectiveness score of 0-1 (1 = ideal effectiveness) accounting for precision, recall, sample size, and model confidence. Our model showed that of the twelve commonly used terms analyzed, none received a combined effectiveness score over 0.025. We concluded that there is no universal term for "plasma" that provides a satisfactory representation of literature. These results highlight the need for standardization of nomenclature in plasma medicine.

Keywords:  Atmospheric-pressure plasmas; low-temperature plasmas; machine learning; plasmas; text categorization

DOI:  https://doi.org/10.1109/trpms.2024.3447551
Int Forum Allergy Rhinol. 2025 Mar 25. e23585

Correspondence Regarding "Readability of Rhinology-Related Patient Education Materials Generated by Artificial Intelligence-Based Chatbots".

Simon Høj, Howraman Meteran.

DOI: https://doi.org/10.1002/alr.23585
Orthopadie (Heidelb). 2025 Mar 25.

[Internet resources on plantar fasciitis : German-language analysis of the information content of digital sourcess].

Marianne Rosenthal, Mark Lenz, Sophie Maria Tengler, Sebastian Sachse, Matthias Walcher, Klaus E Roth, Lena Mohr, Kajetan Klos.

   BACKGROUND: Plantar fasciitis belongs to the most common reasons for a consultation in medical institutions. Websites are assisting health education in this field; however, the quality is sometimes questionable.
OBJECTIVES: The quality of websites thematising plantar fasciitis was evaluated in a descriptive survey regarding content, structure, readability and visual aspects. Information of literature added by score results was used to create a website describing foot diseases.
MATERIALS AND METHODS: Keywords chosen by Google Ads were searched for in Google, Bing and Yahoo. Search results were analysed according to linguistic scores (EQIP36, 25-Item, Flesch-Kincaid). A third of these score worths were summarized as a total score. An online survey examined three websites with the highest scores. An optimized website was created.
RESULTS: 137 websites and 37 videos were scored. Arithmetic averages amounted to 72 (EQIP36), 15 (25-Item), 43 (Flesch-Kincaid), 59 (total score). Major websites with the best results in 25-Item were prepared by physicians. Websites of encyclopedias reached the best reading level, whereas those of health insurance funds received the highest values in EQIP36 and the total score. Websites with the lowest total score, EQIP36, 25-Item did not mention any sources. Websites including videos created by the medical sector reached a higher total score and reading levels. The outcome of the survey was slightly worse than the results of the score.
CONCLUSIONS: A lack of high qualified websites regarding foot pain was identified. This information can help producers of websites for digital patient education to improve the inadequate aspects.

Keywords:  Education of patient; Exercise therapy; Foot diseases; Heel spur syndrome; Inflammation

DOI:  https://doi.org/10.1007/s00132-025-04641-8
Pediatr Transplant. 2025 May;29(3): e70073

ChatGPT as a Source of Information for Kidney Transplant Recipients and Their Caregivers: Useful or not?

Amnuay Kleebayoon, Viroj Wiwanitkit.



Keywords:  ChatGPT; caregiver; kidney; pediatric transplantation; transplant

DOI:  https://doi.org/10.1111/petr.70073
Cureus. 2025 Feb;17(2): e79646

A Cross-Sectional Study Comparing Patient Information Guides for Amyotrophic Lateral Sclerosis, Myasthenia Gravis, and Guillain-Barré Syndrome Produced by ChatGPT-4 and Google Gemini 1.5.

Daaniya Tariq, Ramya Madhusudan, Yashaswi Guntupalli, Shivaashish Karumanchi Anantha Venkata Sai, Bharath Vejandla, Mohit Lnu.

   INTRODUCTION: Patient education for amyotrophic lateral sclerosis (ALS), myasthenia gravis (MG), and Guillain-Barré syndrome (GBS) is essential for effective symptom management, improving quality of life, and enabling informed care decisions. AI tools enhance healthcare and patient education through personalized care and improved diagnostics.
METHODS: In this study, ChatGPT (OpenAI, San Francisco, CA, USA) and Google Gemini (Mountain View, CA, USA) generated patient education guides for ALS, MG, and GBS. Variables included word count, sentence count, average words and syllables per sentence, grade level, ease score using the Flesch-Kincaid calculator, similarity score using QuillBot, and reliability using a modified DISCERN score. Statistical analysis was done using R version 4.3.2 (2023; R Foundation for Statistical Computing, Vienna, Austria).
RESULTS: ChatGPT-generated brochures for patient education on ALS, MG, and GBS had a higher grade level and lower ease score compared to those generated by Google Gemini. Although both models had similar reliability and similarity percentages, ChatGPT produced more content with greater complexity and slightly higher reliability.
CONCLUSION: This study found no significant difference in the average ease, grade, and reliability scores between the two AI tools when generating patient information brochures on ALS, MG and GBS. However, a statistically significant difference was observed in the mean word counts generated by the tools.

Keywords:  amyotrophic lateral sclerosis; artificial intelligence; chatgpt; google gemini; guillain-barré syndrome; myasthenia gravis

DOI:  https://doi.org/10.7759/cureus.79646
J Med Syst. 2025 Mar 25. 49(1): 39

Large Language Models' Responses to Spinal Cord Injury: A Comparative Study of Performance.

Jinze Li, Chao Chang, Yanqiu Li, Shengyu Cui, Fan Yuan, Zhuojun Li, Xinyu Wang, Kang Li, Yuxin Feng, Zuowei Wang, Zhijian Wei, Fengzeng Jian.

  With the increasing application of large language models (LLMs) in the medical field, their potential in patient education and clinical decision support is becoming increasingly prominent. Given the complex pathogenesis, diverse treatment options, and lengthy rehabilitation periods of spinal cord injury (SCI), patients are increasingly turning to advanced online resources to obtain relevant medical information. This study analyzed responses from four LLMs-ChatGPT-4o, Claude-3.5 sonnet, Gemini-1.5 Pro, and Llama-3.1-to 37 SCI-related questions spanning pathogenesis, risk factors, clinical features, diagnostics, treatments, and prognosis. Quality and readability were assessed using the Ensuring Quality Information for Patients (EQIP) tool and Flesch-Kincaid metrics, respectively. Accuracy was independently scored by three senior spine surgeons using consensus scoring. Performance varied among the models. Gemini ranked highest in EQIP scores, suggesting superior information quality. Although the readability of all four LLMs was generally low, requiring a college-level reading comprehension ability, they were all able to effectively simplify complex content. Notably, ChatGPT led in accuracy, achieving significantly higher "Good" ratings (83.8%) compared to Claude (78.4%), Gemini (54.1%), and Llama (62.2%). Comprehensiveness scores were high across all models. Furthermore, the LLMs exhibited strong self-correction abilities. After being prompted for revision, the accuracy of ChatGPT and Claude's responses improved by 100% and 50%, respectively; both Gemini and Llama improved by 67%. This study represents the first systematic comparison of leading LLMs in the context of SCI. While Gemini excelled in response quality, ChatGPT provided the most accurate and comprehensive responses.

Keywords:  Accuracy assessment; Large language model; Quality assessment; Readability assessment; Spinal cord injury

DOI:  https://doi.org/10.1007/s10916-025-02170-7
Skin Health Dis. 2025 Feb;5(1): 14-21

Assessing the readability of dermatological patient information leaflets generated by ChatGPT-4 and its associated plugins.

Dominik Todorov, Jae Yong Park, James Andrew Ng Hing Cheung, Eleni Avramidou, Dushyanth Gnanappiragasam.

Background: In the UK, 43% of adults struggle to understand health information presented in standard formats. As a result, Health Education England recommends that patient information leaflets (PILs) be written at a readability level appropriate for an 11-year-old.
Objectives: To evaluate the ability of ChatGPT-4 and its three dermatology-specific plugins to generate PILs that meet readability recommendations and compare their readability with existing British Association of Dermatologists (BAD) PILs.
Methods: ChatGPT-4 and its three plugins were used to generate PILs for 10 preselected dermatological conditions. The readability of these PILs was assessed using three readability formulas Simple Measure of Gobbledygook (SMOG), Flesch Reading Ease Test (FRET) and Flesch-Kincaid Grade Level Test (FKGLT) and compared against the readability of BAD PILs. A one-way ANOVA was conducted to identify any significant differences.
Results: The readability scores of PILs generated by ChatGPT-4 and its plugins did not meet the recommended target range. However, some of these PILs demonstrated more favourable mean readability scores compared with those from the BAD, with certain plugins, such as Chat with a Dermatologist, showing significant differences in mean SMOG (P = 0.0005) and mean FKGLT (P = 0.002) scores. Nevertheless, the PILs generated by ChatGPT-4 were found to lack some of the content typically included in BAD PILs.
Conclusions: ChatGPT-4 can produce dermatological PILs free from misleading information, occasionally surpassing BAD PILs in terms of readability. However, these PILs still fall short of being easily understood by the general public, and the content requires rigorous verification by healthcare professionals to ensure reliability and quality.

DOI: https://doi.org/10.1093/skinhd/vzae015
J Am Acad Orthop Surg. 2025 Mar 25.

GPT-4 as a Source of Patient Information for Carpal Tunnel Surgery: A Comparative Analysis Against Google Web Search.

Paul G Mastrokostas, Aaron B Lavi, Bruce B Zhang, Leonidas E Mastrokostas, Scott Liu, Katherine M Connors, Jennifer Hashem.

INTRODUCTION: Carpal tunnel surgery (CTS) accounts for approximately 577,000 surgeries in the United States annually. This high frequency raises concerns over the dissemination of medical information through artificial intelligence chatbots, Google, and healthcare professionals. The objectives of this study are to determine whether GPT-4 and Google differ in (1) the type of questions asked, (2) the readability of responses, and (3) the accuracy of numerical responses for the top 10 most frequently asked questions (FAQs) about CTS.
METHODS: A Google search was conducted to identify the top 10 FAQs related to CTS, which were then queried in GPT-4. Responses were categorized using the Rothwell classification system and evaluated for readability using Flesch Reading Ease and Flesch-Kincaid grade level scores. Statistical analyses included Cohen kappa coefficients for interobserver reliability and Student t-tests for comparing response characteristics. Statistical significance was set at the 0.05 level.
RESULTS: This study found that 70% of Google's FAQs were fact based, predominantly focusing on technical details (40%) and specific activities (40%). GPT-4's FAQs were mainly factual (50%), with technical details (40%) being the most queried topic. Complete agreement in interobserver reliability was observed. Google's answers were more readable than GPT-4's, with a Flesch Reading Ease score of 56.40 vs. 34.19 (P = 0.001) and a Flesch-Kincaid grade level of 9.93 vs. 12.85 (P = 0.007). Google responses were shorter, with an average word count of 91.50 compared with GPT-4's 162.90 (P = 0.013). For numerical responses to FAQs, GPT-4 and Google differed in nine out of 10 questions, with GPT-4 often providing broader time frames.
CONCLUSION: GPT-4 offers a more detailed and technically oriented approach to addressing patient queries about CTS when compared with Google. This suggests that GPT-4 can offer detailed insights where patients seek more in-depth information, enhancing the quality of healthcare education.
LEVEL OF EVIDENCE: NA.

DOI: https://doi.org/10.5435/JAAOS-D-24-00249
Sci Rep. 2025 Mar 22. 15(1): 9906

Evaluating the quality of medical content on YouTube using large language models.

Mahmoud Khalil, Fatma Mohamed, Abdulhadi Shoufan.

  YouTube has become a dominant source of medical information and health-related decision-making. Yet, many videos on this platform contain inaccurate or biased information. Although expert reviews could help mitigate this situation, the vast number of daily uploads makes this solution impractical. In this study, we explored the potential of Large Language Models (LLMs) to assess the quality of medical content on YouTube. We collected a set of videos previously evaluated by experts and prompted twenty models to rate their quality using the DISCERN instrument. We then analyzed the inter-rater agreement between the language models' and experts' ratings using Brennan-Prediger's (BP) Kappa. We found that LLMs exhibited a wide range of inter-rater agreements with the experts (ranging from -1.10 to 0.82). All models tended to give higher scores than the human experts. The agreement on individual questions tended to be lower, with some questions showing significant disagreement between models and experts. Including scoring guidelines in the prompt has improved model performance. We conclude that some LLMs are capable of evaluating the quality of medical videos. If used as stand-alone expert systems or embedded into traditional recommender systems, these models can mitigate the quality issue of health-related online videos.

Keywords:  Content quality; LLMs; Medical content; YouTube

DOI:  https://doi.org/10.1038/s41598-025-94208-6
Health Info Libr J. 2025 Mar 28.

Health consumers' emotional responses toward asthma videos on YouTube are influenced by time since posting, number of tags, subject of content and the emotional tone.

Yanyan Wang, Jin Zhang, Xiaohan Yan, Benjamin Ombati Omwando.

   BACKGROUND: Social media platforms and user-generated videos have become important channels and resources for health consumers seeking information and learning about asthma management.
OBJECTIVES: This study examined the characteristics of asthma-related videos on YouTube, health consumers' emotional responses to these videos and explored the video attributes influencing their emotional responses and attitudes toward asthma-related content.
METHODS: The study employed manual subject analysis, sentiment analysis, descriptive statistical analysis and regression modelling.
RESULTS: The most popular content categories were Treatment, Prevention and Cause & Pathophysiology. Consumer interactions confirmed interest in Treatment. The time since posting, the number of tags, the subject of content and the general tone (positive/neutral/negative) of a video influenced whether it elicited positive or negative emotions.
DISCUSSION: The consumer interactions might indicate interest in a content category, but the analysis might show negative attitudes to that content. 'Sign & Symptom' content can reduce the positive emotional responses, and 'Cause & Pathophysiology' content can raise the negative emotional responses, thus reducing the consumers' expression of positive attitudes in different ways.
CONCLUSION: The content priorities of video creators and health consumers differed, and keeping the emotional tone positive appears important for fostering positive emotional responses and attitudes.

Keywords:  asthma; attitude to health; consumer health information; information dissemination; social media

DOI:  https://doi.org/10.1111/hir.12570
Laryngoscope Investig Otolaryngol. 2025 Apr;10(2): e70131

Evaluation of the Quality of Tympanoplasty Videos on YouTube Using the Ivory Grading System.

Ahmet Ufuk Kilictas, Mehmet Birinci, Başar Erdivanlı.

   Purpose: YouTube has become a key platform for sharing information on surgical procedures. However, the absence of peer review raises concerns about the educational quality and reliability of its content. The IVORY grading system was developed to address this gap in otorhinolaryngology. This study aims to evaluate the educational quality of tympanoplasty videos on YouTube using the IVORY system and identify deficiencies.
Methods: Ninety-four tympanoplasty videos were analyzed based on inclusion and exclusion criteria. Video metrics such as duration, views, and likes were recorded, and videos were scored on a 0-2 scale using the IVORY system. Total scores were categorized from A to F. Statistical analyses examined the relationships between IVORY scores and video characteristics.
Results: The median IVORY score of the videos was 21.0 (IQR: 17.0-24.0). Only 4.3% of the videos (A and B categories) were of high educational quality, while 68.1% were classified as low quality (F category). A weak but significant correlation was found between IVORY scores and metrics such as views (p = 0.008, r = 0.271) and likes (p = 0.005, r = 0.288). Video duration showed a significant negative correlation with IVORY scores (beta = -0.05, p = 0.012). Geographic differences significantly affected video quality (p < 0.05).
Conclusion: This study highlights YouTube's potential as a resource for surgical education while also revealing significant shortcomings in video quality. The broader adoption of guidelines like the IVORY system and the encouragement of producing content aligned with these standards would be an important step toward improving the quality of surgical educational materials.
Level of Evidence: NA.

Keywords:  IVORY guideline; YouTube; educational videos; tympanoplasty; video quality

DOI:  https://doi.org/10.1002/lio2.70131
AORN J. 2025 Apr;121(4): e1-e10

Evaluation of the Content, Reliability, and Quality of YouTube Videos on Surgical Hand Scrubbing.

Mahmut Dağcı, Hatice Merve Alptekin, Deren İhtiyar, Gamze Öztürk, Hazal Öztürk.

  The purpose of this study was to evaluate the content, reliability, and quality of YouTube videos on surgical hand scrubbing. Two hundred videos in English were identified and screened according to the inclusion criteria. The evaluation of the sample of 72 videos was guided by content, reliability, and quality tools. Videos with at least one source in the description had significantly greater reliability scores than those without (t = 3.871, P < .001). There were no significant differences between the content scores and the general traits of the videos (eg, advertisements, subtitles). Analysis of the relationship between the content and quality scores with the video traits showed a weak positive correlation between quality scores and video length (r = 0.233, P = .049). Viewers should consider content, reliability, and quality rather than popularity when seeking educational video content on hand scrubbing.

Keywords:  YouTube videos; hand hygiene; nursing education; social media; surgical hand scrubbing

DOI:  https://doi.org/10.1002/aorn.14319
Cureus. 2025 Feb;17(2): e79412

Assessment of the Usefulness and Reliability of YouTube Videos on Postmortem Procedures.

Anamika Nath, Anupam Datta.

   INTRODUCTION: The increasing reliance on digital platforms for educational purposes has resulted in the widespread availability of medical content, including videos on postmortem procedures on YouTube (Google Inc., Mountain View, CA). The present study analyzed the significance of YouTube videos on postmortem procedures to use as an effective educational tool.
METHODS: A comprehensive search of YouTube videos on postmortem procedures was conducted using the keyword "postmortem" on a specific date. Videos were included if they were in English, provided content directly related to postmortem procedures, and included audio narration or explanatory text. Data collection and analysis involved evaluating video quality, engagement metrics, and content accuracy using various scales and statistical tests.
RESULTS: The study analyzed 50 YouTube videos on postmortem procedures, finding that most (n=32, 64%) were of low usefulness, and the majority had low reliability scores. Engagement metrics showed a mean view ratio of 1,500 views per day and an average like ratio of 5.6 but were not predictive of video quality.
CONCLUSION: The majority of YouTube videos on postmortem procedures lack educational value and reliability, emphasizing the need for credible organizations to create high-quality digital resources.

Keywords:  educational videos; medical content; postmortem procedures; reliability; usefulness; youtube™

DOI:  https://doi.org/10.7759/cureus.79412
Indian J Community Med. 2025 Jan-Feb;50(1):50(1): 169-174

YouTube Hindi Videos Proved to be a Valid and Reliable Source of Health Information: A Case Study of COVID-19 Vaccine.

Nibha Sinha, Sonali Patle.

   Background: When the coronavirus vaccine was launched, many Hindi language videos on YouTube were uploaded and audiences were accessing these videos in good numbers. However, the scientific evaluation of these videos was never done. There is a need to understand the accuracy, validity, and quality of these videos so that they can further be utilized for other health awareness purposes. The study objective was to evaluate the validity, quality, and accuracy of the Hindi videos that were viewed the most on YouTube.
Material and Methods: Two search terms, "COVID-19 Vaccine-Hindi" and "Corona vaccine Hindi," were used, and the most viewed 50 videos were selected. All the videos were manually coded and statistically evaluated. Two public health researchers evaluated all the videos blinded to each other using the DISCERN, GQS, and JAMA scales. The correlation score was calculated to know the agreement between them. Scores of professionals and news-based organizations were also compared.
Results: Out of all 50 videos, professionals uploaded 48% of videos, and news-based organizations uploaded 46% of videos. The DISCERN median was found to be 9.00 (5-10), the JAMA median score was 6.00 (3-8), and the DISCERN score was 40.50 (16-49). DISCERN, JAMA, and GQS scores were also calculated separately for both observers, and no statistically significant difference was found. Between both the observer's agreement was found with statistical significance.
Conclusion: YouTube was found to be an excellent source of information on the coronavirus vaccine in the Hindi language. Almost all 50 videos had wide viewership, so health professionals and news media organizations cannot ignore this popular platform to improve the health awareness of citizens in a developing country like India.

Keywords:  Corona vaccine; YouTube videos; health communication; internet; social media

DOI:  https://doi.org/10.4103/ijcm.ijcm_813_23
Rheumatol Int. 2025 Mar 22. 45(4): 77

YouTube as a source of information for stroke rehabilitation: a cross-sectional analysis of quality and reliability of videos.

Meirgul I Assylbek, Olena Zimba, Ahmet Akyol, Marlen Yessirkepov, Burhan Fatih Kocyigit.

   INTRODUCTION: Due to YouTube's meteoric rise in popularity, the quality and reliability of health-related videos on YouTube are being questioned, particularly in specialized fields like stroke rehabilitation. This research aimed to assess the quality and reliability of YouTube videos relevant to stroke rehabilitation.
METHOD: Video listing was conducted on December 17, 2024, using the keywords "Stroke Rehabilitation", "Stroke Physical Therapy", "Stroke Neurophysiotherapy", and "Stroke Physical Therapy Techniques" as query terms. A final sample of 72 videos was selected upon completion and evaluated according to inclusion and exclusion criteria. The Global Quality Scale (GQS), Modified DISCERN Questionnaire, JAMA Benchmark Criteria, and Patient Education Materials Assessment Tool for Audio/Visual Materials (PEMAT-A/V) were among the evaluation tools used to analyze each video. Researchers captured the videos' fundamental components and compared the quality classifications.
RESULTS: Of the 72 videos examined, 29.2% (n = 21) were categorized as low quality, 20.8% (n = 15) as intermediate level, and 50% (n = 36) as high quality. Videos generated by academic medical centers (77.8%) and nonphysician healthcare professionals (59.4%) were primarily of high quality, while videos from independent users (100%) and TV channels (66.7%) displayed the lowest quality. Significant differences were observed when comparing quality groups based on daily views, likes, and comments (p < 0.05). The lowest scores were detected in the low-quality group. Significant correlations were identified between GQS and other evaluative instruments (p < 0.001), indicating consistency across evaluation frameworks.
CONCLUSION: YouTube possesses considerable potential as an instructional tool for stroke rehabilitation. The inconsistency in video quality underscores the necessity for enhanced content control, editing, and the advocacy of high-quality, evidence-based resources. Promoting collaboration among academics, healthcare professionals, and content producers could augment the platform's instructional efficacy.

Keywords:  Information science; Internet; Neurological rehabilitation; Social media; Stroke rehabilitation

DOI:  https://doi.org/10.1007/s00296-025-05832-4
J Am Podiatr Med Assoc. 2025 Jan-Feb;115(1):pii: 22-054. [Epub ahead of print]115(1):

Quality Assessment of YouTube Videos as a Source of Information on Ingrown Toenails.

Erdi Imre.

BACKGROUND: YouTube is one of the most widely used Internet sources, and many patients watch YouTube videos for gathering more information, especially about health problems. This study aimed to investigate the informative capabilities of YouTube videos about ingrown toenails. We hypothesize that most of the shared information is of low quality independent of source and that the attraction effect of videos is unrelated to quality.
METHODS: The first 50 videos in the English language using the keyword query ingrown toenail in YouTube search were analyzed. Journal of the American Medical Association (JAMA) benchmark criteria were used to assess video reliability, and Global Quality Score (GQS) and toenail specific score (TSS) were used to assess the quality of educational content.
RESULTS: The first 50 videos had 71,842,230 views (median, 333,585). Forty-one videos (82%) were from health-care professionals, seven (14%) were educational videos, and two (4%) were personal videos. The median JAMA score was 2, with the highest scores coming from academic sources. When grouped by view count (>300,000 versus ≤300,000) and like count (>10,000 versus ≤10,000), there was no significant difference in JAMA and GQS scores. The median GQS and toenail specific score were 3.0 and 5.5, respectively. Video duration was a significant predictor of GQS as a result of regression analysis (P = .002; β = 0.425).
CONCLUSIONS: Illustrated by the high number of views, ingrown toenail is a popular health topic on YouTube. Although popular and with content mostly uploaded by health-care professionals, content quality was found to be poor and videos to be unreliable and insufficient for informing patients because most videos seem to be geared toward entertainment rather than direct patient education. Health-care professionals should be aware of the generally low-quality data available.

DOI: https://doi.org/10.7547/22-054
Cureus. 2025 Feb;17(2): e79531

Evaluating the Quality and Readability of Online Health Information on Snapping Hip Syndrome: A Cross-Sectional Analysis.

Mehmet Fatih Uzun, Alper Özer, Aydogan Askin, Mehmet O Atahan, Göker Yurdakul, Fatih Gölgelioğlu.

   AIM: This study aims to evaluate the quality and readability of online health information related to snapping hip syndrome (SHS).
METHODS: A cross-sectional analysis was conducted by searching the term "Snapping Hip Syndrome" on Google, Bing, and Yahoo. The first 30 results from each search engine were assessed, and duplicate or irrelevant websites were excluded. The remaining 90 unique web pages were categorized into academic, physician, commercial, medical professional, and non-identified groups. Quality was assessed using the DISCERN instrument, Journal of the American Medical Association (JAMA) Benchmark Criteria, and HONcode certification, while readability was evaluated with the Flesch-Kincaid Grade Level (FKGL) and Flesch-Kincaid Reading Score (FKRS). The SHS Content Score (SHS-CS) was also developed for a comprehensive content-specific evaluation.
RESULTS: Academic websites had the highest quality scores, with DISCERN (52.10 ± 6.85), JAMA (3.48 ± 0.50), and SHS-CS (27.85 ± 2.15), but demonstrated lower readability (FKGL: 11.76 ± 0.40, FKRS: 21.45 ± 7.12). Commercial and non-identified websites scored lowest across all quality measures. Significant correlations were found between DISCERN and JAMA (r = 0.932, p = 0.000*), SHS-CS and DISCERN (r = 0.918, p = 0.000*), and a negative correlation with readability metrics (DISCERN vs. FKRS, r = -0.668, p = 0.000*).
CONCLUSION: The quality of SHS-related online information varies significantly across website types. While academic websites provide the highest quality content, they often lack readability. HONcode-certified websites exhibited superior quality but did not differ significantly in readability compared to non-certified sites. Future efforts should focus on improving the readability of high-quality health information.

Keywords:  discern; jama benchmark criteria; online health information; readability; snapping hip syndrome

DOI:  https://doi.org/10.7759/cureus.79531
PeerJ. 2025 ;13 e19151

Assessment of YouTube videos on post-dural puncture headache: a cross-sectional study.

Seher İlhan, Turan Evran.

   Background: Post-dural puncture headache (PDPH) is a common complication of central neuroaxis anesthesia or analgesia, causing severe headaches. YouTube is widely used for health information, but the reliability and quality of PDPH-related content are unclear. This study evaluates the content adequacy, reliability, and quality of YouTube videos on PDPH.
Methods: This cross-sectional study analyzed English-language YouTube videos on PDPH with good audiovisual quality. Two independent reviewers assessed the videos using the DISCERN instrument, Journal of American Medical Association (JAMA) benchmark criteria, and Global Quality Scale (GQS). Correlations between video characteristics and their reliability, content adequacy, and quality scores were examined.
Results: Out of 71 videos, 42.3% were uploaded by health-related websites, 36.6% by physicians, and 21.1% by patients. Strong correlations were found between DISCERN, JAMA, and GQS scores (p < 0.001). Videos from physicians and health-related websites had significantly higher scores than those from patients (p < 0.001). No significant correlations were observed between descriptive characteristics and scores (p > 0.05).
Conclusion: YouTube videos on PDPH uploaded by health-related websites or physicians are more reliable, adequate, and higher in quality than those uploaded by patients. Source credibility is crucial for evaluating medical information on YouTube.

Keywords:  Consumer health information; Digital technology; Post-dural puncture headache; Social media; YouTube

DOI:  https://doi.org/10.7717/peerj.19151
Clin Exp Ophthalmol. 2025 Mar 22.

Evaluating the Quality and Reliability of YouTube Videos on Optic Neuritis: A Cross-Sectional Study Using Validated Scoring Systems.

Bedia Kesimal, Sıdıka Gerçeker Demircan, Sücattin İlker Kocamış.

   BACKGROUND: YouTube is one of the largest internet platforms used worldwide. People often use this platform to get information. Since little is known about YouTube as a source of information on optic neuritis (ON), we investigated the quality, reliability and perception of ON videos related to content among YouTube users.
METHODS: An online search was conducted on YouTube using the keyword 'optic neuritis'. According to the inclusion and exclusion criteria, 50 videos were included in the study. All videos were evaluated based on likes, comments, views, view count, source of information and video content. The Journal of the American Medical Association (JAMA), DISCERN and Global Quality Scores (GQS) were independently evaluated by two ophthalmologists.
RESULTS: This study analysed 50 YouTube videos with a total of 604 283 views. The mean scores for JAMA, GQS and DISCERN were 1.95 ± 0.43, 2.96 ± 1.21 and 41.59 ± 1.46, respectively. Videos created by physicians (46% ophthalmologists, 14% non-ophthalmologist physicians) had significantly higher JAMA and GQS scores (p < 0.05) compared to others; though no differences were observed in the DISCERN scores. Video metrics such as views, likes and comments did not show a significant association with quality scores.
CONCLUSIONS: The overall quality of the videos was suboptimal. Videos created by physicians demonstrated higher quality based on JAMA and GQS scores, highlighting the importance of expert authorship in online health content. The video metrics did not correlate with quality, highlighting the need for reliable evaluation criteria beyond the popularity measures.

Keywords:   Journal of the American Medical Association score ; DISCERN score; YouTube; global quality score; optic neuritis

DOI:  https://doi.org/10.1111/ceo.14527
JMIR Form Res. 2025 Mar 25. 9 e65114

Changes in Health Education Literacy After Structured Web-Based Education Versus Self-Directed Online Information Seeking in Patients Undergoing Carpal Tunnel Release Surgery: Nonrandomized, Controlled Study.

Mariella Seel, Julian Alexander Mihalic, Stefan Mathias Froschauer, Bernhard Holzner, Jens Meier, Tobias Gotterbarm, Matthias Holzbauer.

   Background: With advancements in anesthesiologic and surgical techniques, many surgeries are now performed as day-surgery procedures, requiring greater responsibilities for self-management from patients during the perioperative process. Online health information often lacks reliability and comprehensibility, posing risks for patients with low health literacy. Carpal tunnel release (CTR) surgery, a common day-surgery procedure, necessitates effective patient education for optimal recovery and self-management.
Objective: This study introduces the CTS Academy, a web-based education program designed for patients undergoing CTR day surgery. The study aimed to evaluate the CTS Academy's impact on patients' health education literacy (HEL) compared with self-directed online information seeking.
Methods: A scoping review on education programs focusing on the perioperative process of CTR was conducted before this study. In a nonrandomized controlled study, 60 patients scheduled for CTR were assigned to 2 groups based on the patients' preferences; the test group used the CTS Academy, while the control group performed self-directed online searches. HEL was assessed using the Health Education Literacy of Patients with chronic musculoskeletal diseases (HELP) questionnaire, focusing on patients's comprehension of medical information (COMPR), patients's ability to apply health-related information in an everyday life (APPLY), and patient's ability to communicate with health care professional (COMM). Secondary outcomes included content comprehensibility, patient preferences, platform usability, and clinical carpal tunnel syndrome (CTS)-related parameters.
Results: In the scoping review, 17 studies could be identified and included for full-text analysis. Eighteen patients each were included in the test group (13 women and 5 men) and in the control group (11 women and 7 men). The average time spent in the study was 167 and 176 days for the test and control groups, respectively. The test group showed significant improvements in APPLY (mean 28, SD 7.99 vs mean 24, SD 5.14; P<.05) and COMM (mean 30, SD 10.52 vs mean 25, SD 6.01; P=.02) after using the CTS Academy in a longitudinal analysis. No significant changes were observed in the control group. In a comparison between groups, the test group had significantly higher APPLY scores at follow-up (mean 24, SD 5.14 vs mean 33, SD 14.78; P=.044) and fewer comprehension issues at baseline (mean 38, SD 16.60 vs mean 50, SD 19.00; P=.03). The CTS-related knowledge assessment yielded 92% (66/72) versus 90% (65/72) correct answers in the test and control groups, respectively. The test group rated the CTS Academy highly in usability (6.22 of 7.00 points) and utility (6.13 of 7.00 points). Preferences leaned toward using CTS Academy alongside doctor consultations (16/18, 89%) and over self-directed searches (15/18, 84%). No significant differences were found in CTS-related symptoms between groups.
Conclusions: The CTS Academy effectively enhanced patients' HEL, especially in applying and communicating medical information. The platform's usability and utility were rated favorably, and patients preferred it over independent online information seeking. This suggests that structured, web-based education enhances patient self-management during the day surgery process.

Keywords:  carpal; carpal tunnel; carpal tunnel release; carpal tunnel release surgery; carpal tunnel surgery; controlled study; day surgery; health education; health education literacy; health literacy; information seeking; non-randomized; online health information; online information; online search; patient education; perioperative; self-management; structured web-based education; web-based

DOI:  https://doi.org/10.2196/65114
BMC Complement Med Ther. 2025 Mar 21. 25(1): 111

Health information-seeking behavior among users of traditional, complementary and integrative medicine (TCIM).

Miriam Trübner, Alexander Patzina, Judith Lehmann, Benno Brinkhaus, Christian S Kessler, Rasmus Hoffmann.

   BACKGROUND: The use of traditional, complementary, and integrative medicine (TCIM) is widespread among the German population and driven by various motives, including both supplementing and avoiding treatments with conventional medicine. The aim of this article is to examine how these motives relate to different health information-seeking behaviors.
METHODS: The study uses regression analysis based on data from a German online access panel, which explored the use and acceptance of TCIM in Germany in 2022. From this study, we use information on 1,696 individuals (aged 18-75 years) who vary in their motives for using TCIM (subjective statements on five-point Likert scales) and have used TCIM to treat health problems.
RESULTS: Overall, TCIM is considered more a health-promoting measure than it is driven by aversion towards conventional medicine. Our analysis of information-seeking behavior for certain therapeutic procedures reveals that, as respondents' propensity to use TCIM as a health-promoting measure rises, they are more likely to perceive themselves as being influenced by scientific studies (AME: 0.04, p = 0.004), personal advice (AME: 0.09, p = 0.000), and their social circle's experiences (AME: 0.08, p = 0.000). In contrast, respondents who use TCIM more due to aversion to conventional medicine are less likely to perceive themselves as being influenced by scientific studies (AME: -0.04, p = 0.004) and doctors (AME: -0.07, p = 0.000). When analyzing respondents' most important medical information source, our results reveal that the more individuals indicate using TCIM out of aversion, the more likely they are to consider (online) media outlets their most important medical resource (AME: 0.05, p = 0.000), while the likelihood of considering medical professionals most important decreases (AME -0.06, p = 0.000).
CONCLUSION: Motives behind TCIM use vary and correspond to differences in individuals' health information-seeking behavior. Beyond these motive-related differences, TCIM users value sources of health information other than their medical practitioners. This calls for an intensification of TCIM training among medical professionals to provide high-quality consultation and the creation of reputable online portals to ensure the provision of trustworthy information about TCIM.

Keywords:  Alternative medicine; Complementary medicine; Health information-seeking behavior; Integrative medicine; Medical advice; Medical information; Traditional medicine

DOI:  https://doi.org/10.1186/s12906-025-04843-9