bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-04-13
fifteen papers selected by
Thomas Krichel, Open Library Society



  1. ASAIO J. 2025 Apr 09.
      The National Institutes of Health (NIH) recommends that healthcare education material be written at a 6th-7th grade reading level. No studies have yet investigated the readability of extracorporeal membrane oxygenation (ECMO) educational materials. Educational materials published by the Extracorporeal Life Support Organization's platinum, gold, and silver centers of excellence in the United States were included. Each material was analyzed for content related to ECMO. These topics were also input into Google, and the top 20 results were included in this study. Readability was measured using the Simple Measure of Gobbledygook (SMOG), the Coleman-Liau index, and the Flesch-Kincaid grade level (FKGL). The average reading level for the educational material from the platinum centers was 8.54 for SMOG, 11.38 for Coleman-Liau, and 9.44 for FKGL. The average reading level of the gold centers' material was 9.11 for SMOG, 11.62 for Coleman-Liau, and 10.21 for FKGL. The average reading level of the silver centers' material was 8.82 for SMOG, 11.61 for Coleman-Liau, and 11.53 for FKGL. The average reading level of the internet search results was 9.11 for SMOG, 10.92 for Coleman-Liau, and 9.77 for FKGL. Extracorporeal membrane oxygenation education material had a readability level above the NIH's recommendation.
    Keywords:  extracorporeal life support; readability; reading level
    DOI:  https://doi.org/10.1097/MAT.0000000000002425
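    As a rough illustration of the three readability indices used in the ECMO study above, the Python sketch below estimates FKGL, SMOG, and the Coleman-Liau index from raw text using the published formulas and a crude vowel-group syllable counter; it is an approximation for orientation only, not the tooling the authors used.

        import re, math

        def count_syllables(word):
            # Crude heuristic: count vowel groups; production readability tools
            # use dictionaries or more careful counting rules.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def readability(text):
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z]+", text)
            n_words = max(1, len(words))
            n_letters = sum(len(w) for w in words)
            syllables = [count_syllables(w) for w in words]
            polysyllables = sum(1 for s in syllables if s >= 3)

            fkgl = 0.39 * n_words / sentences + 11.8 * sum(syllables) / n_words - 15.59
            smog = 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
            # Coleman-Liau uses letters (L) and sentences (S) per 100 words.
            cli = 0.0588 * (n_letters / n_words * 100) - 0.296 * (sentences / n_words * 100) - 15.8
            return {"FKGL": round(fkgl, 2), "SMOG": round(smog, 2), "Coleman-Liau": round(cli, 2)}

        print(readability("ECMO provides temporary heart and lung support. "
                          "The machine oxygenates blood outside the body."))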
  2. Urology. 2025 Apr 08. pii: S0090-4295(25)00339-5. [Epub ahead of print]
       OBJECTIVE: To evaluate the quality of information and counseling for patients regarding erectile dysfunction across major direct-to-consumer telehealth platforms.
    MATERIALS AND METHODS: We identified the five largest direct-to-consumer men's telehealth platforms by monthly site visits using Semrush®, a web traffic analysis tool. We then analyzed the quality, reliability, accessibility, and readability of patient information on erectile dysfunction on each site using a series of validated metrics for evaluating online health information (Journal of the American Medical Association criteria, DISCERN instrument, LIDA instrument, Flesch Readability Score).
    RESULTS: Five platforms (Hims®, Roman®, Lemonaid®, BlueChew®, Numan®) were included in the study. Each site offered virtual care and counseling for patients with erectile dysfunction. Overall scores for information quality were highest for the two largest platforms (Hims®, Roman®) and lower on smaller sites (Lemonaid®, BlueChew®, Numan®). LIDA scores for site accessibility also favored the larger platforms, while reliability and supplement performance were universally poor. Flesch readability scores for written content fell in the "fairly difficult" or "difficult" range for all platforms.
    CONCLUSIONS: The quality of patient counseling varies widely between popular direct-to-consumer telehealth platforms. Men seeking online care for erectile dysfunction are at risk of receiving incomplete or inaccurate information, and urologists should be prepared to guide patients toward reliable sources.
    DOI:  https://doi.org/10.1016/j.urology.2025.04.018
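    The Flesch readability bands cited above ("fairly difficult", "difficult") are conventionally mapped from the numeric Flesch Reading Ease score; the sketch below applies the standard formula and commonly cited cut-offs. The cut-offs are assumed here, since the study's exact thresholds are not given in the abstract.

        def flesch_reading_ease(words, sentences, syllables):
            # Flesch Reading Ease: higher scores mean easier text.
            return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

        def flesch_band(score):
            # Commonly cited interpretation bands (assumed, not the study's own cut-offs).
            if score >= 70: return "easy to fairly easy"
            if score >= 60: return "plain English"
            if score >= 50: return "fairly difficult"
            if score >= 30: return "difficult"
            return "very difficult"

        print(flesch_band(flesch_reading_ease(words=180, sentences=8, syllables=310)))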
  3. J Sex Med. 2025 Apr 10. pii: qdaf075. [Epub ahead of print]
       INTRODUCTION: Gender-affirming surgeries significantly improve the well-being of transgender and gender-diverse individuals. However, patients often rely on online patient education materials (OPEMs) to navigate surgical options, making readability, quality, and accessibility critical factors in informed decision-making.
    OBJECTIVE: The objective of this study is to evaluate the readability, quality, and accessibility of online patient education materials related to gender-affirming surgeries.
    METHODS: This systematic review analyzed nine studies evaluating 898 OPEMs related to gender-affirming surgeries and transgender voice care. Readability was assessed using Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Flesch Reading Ease Score (FRES), while quality was evaluated using DISCERN and the Patient Education Materials Assessment Tool. A meta-analysis synthesized readability scores, and qualitative trends were examined to assess readability-quality trade-offs.
    RESULTS: OPEMs consistently exceeded the recommended 6th-grade reading level, with a pooled FKGL mean of 12.49 (95% CI: 12.41-12.57), indicating high school to university-level complexity. SMOG scores averaged 11.89 (95% CI: 11.79-11.99), suggesting materials required at least some college education. FRES scores (mean: 37.49, 95% CI: 37.17-37.80) classified most materials as "difficult" to "very difficult" to read. Healthcare-affiliated websites had significantly higher FKGL scores than non-healthcare sources (P < 0.01). DISCERN scores were highly variable, with 68.33% of facial feminization materials rated poor or very poor. Physician-created TikTok content scored higher in reliability (P < 0.001) but had lower engagement than non-physician videos. Spanish-language materials were slightly more readable (SMOG 11.7 vs. 14.2 in English) but less available.
    CONCLUSIONS: Most OPEMs for gender-affirming care fail to meet health literacy guidelines, limiting accessibility. To improve patient comprehension, materials should be simplified without sacrificing accuracy, incorporate multimedia tools, and undergo usability testing. Standardized, trans-affirming, and linguistically inclusive resources are essential for equitable access and informed decision-making.
    Keywords:  gender-affirming surgery; online health information; patient education materials; readability; systematic review
    DOI:  https://doi.org/10.1093/jsxmed/qdaf075
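    The pooled means and narrow 95% confidence intervals reported above are the kind of output an inverse-variance synthesis produces; the review's exact meta-analytic model is not stated in the abstract, so the sketch below shows a plain fixed-effect pooled mean on hypothetical study-level FKGL data.

        import math

        def fixed_effect_pooled_mean(means, sds, ns):
            # Weight each study by 1 / SE^2, where SE^2 = sd^2 / n (inverse-variance weighting).
            weights = [n / sd**2 for sd, n in zip(sds, ns)]
            pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
            se = math.sqrt(1 / sum(weights))
            return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

        # Hypothetical study-level means, SDs, and numbers of materials (not the review's data).
        mean, ci = fixed_effect_pooled_mean([12.1, 13.0, 12.6], [1.8, 2.2, 1.5], [120, 85, 200])
        print(round(mean, 2), tuple(round(x, 2) for x in ci))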
  4. Mayo Clin Proc Digit Health. 2023 Sep;1(3): 226-234
       Objective: To evaluate the quality of the answers and the references provided by ChatGPT for medical questions.
    Patients and Methods: Three researchers asked ChatGPT 20 medical questions and prompted it to provide the corresponding references. The responses were evaluated for the quality of content by medical experts using a verbal numeric scale ranging from 0% to 100%. These experts were the corresponding authors of the 20 articles from which the medical questions were derived. We planned to evaluate 3 references per response for their pertinence, but this was amended on the basis of preliminary results showing that most references provided by ChatGPT were fabricated. This experimental observational study was conducted in February 2023.
    Results: ChatGPT provided responses varying between 53 and 244 words long and reported 2 to 7 references per answer. Seventeen of the 20 invited raters provided feedback. The raters reported limited quality of the responses, with a median score of 60% (first and third quartiles: 50% and 85%, respectively). In addition, they identified major (n=5) and minor (n=7) factual errors among the 17 evaluated responses. Of the 59 references evaluated, 41 (69%) were fabricated, although they appeared real. Most fabricated citations used names of authors with previous relevant publications, a title that seemed pertinent and a credible journal format.
    Conclusion: When asked multiple medical questions, ChatGPT provided answers of limited quality for scientific publication. More importantly, ChatGPT provided deceptively real references. Users of ChatGPT should pay particular attention to the references provided before integration into medical manuscripts.
    DOI:  https://doi.org/10.1016/j.mcpdig.2023.05.004
  5. Spine Deform. 2025 Apr 05.
       PURPOSE: Patients increasingly rely on online resources to better understand their health conditions. ChatGPT could satisfy the demand for reliable and accessible online health education resources, yet few studies have applied this to pediatric orthopaedic counseling. This study quantifies the accuracy and comprehensibility of ChatGPT responses to frequently asked questions (FAQs) regarding scoliosis.
    METHODS: Twelve FAQs regarding scoliosis were compiled following a literature review, and ChatGPT Version 3.5 was utilized to answer them. The responses were analyzed for accuracy and clarity using the Mika et al. scoring system and modified DISCERN score in collaboration with two fellowship-trained pediatric orthopaedic surgeons. Readability was assessed using several published educational-level indices.
    RESULTS: The ChatGPT responses received an average Mika et al. score of 2.4 (satisfactory, requiring minimal to moderate clarification) and a mean DISCERN score of 45.9. The estimated reading level necessary for comprehension ranged from 11th grade to college graduate.
    CONCLUSIONS: When prompted with 12 scoliosis FAQs, ChatGPT produces responses of satisfactory accuracy that nonetheless require further clarification and are written at an inappropriately high reading level for the scoliosis patient population. Future research should explore strategies to verify the reliability of AI services for counseling on other pediatric orthopaedic conditions.
    Keywords:  Artificial intelligence; ChatGPT; Patient information; Pediatric orthopaedics; Scoliosis
    DOI:  https://doi.org/10.1007/s43390-025-01087-y
  6. Urogynecology (Phila). 2025 Apr 08.
       IMPORTANCE: As the volume of medical literature continues to expand, the use of artificial intelligence (AI) to produce concise, accessible summaries has the potential to enhance the efficacy of content review.
    OBJECTIVES: This project assessed the readability and quality of summaries generated by ChatGPT in comparison to the Plain Text Summaries from Cochrane Review, a systematic review database, in incontinence research.
    STUDY DESIGN: Seventy-three abstracts from the Cochrane Library tagged under "Incontinence" were summarized using ChatGPT-3.5 (July 2023 Version) and compared with their corresponding Cochrane Plain Text Summaries. Readability was assessed using the Flesch-Kincaid Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, Coleman-Liau Index, and Automated Readability Index. A 2-tailed t test was used to compare the summaries. Each summary was also evaluated by 2 blinded, independent reviewers on a 5-point scale where higher scores indicated greater accuracy and adherence to the abstract.
    RESULTS: Compared to ChatGPT, Cochrane Review's Plain Text Summaries scored higher on the numerical Flesch-Kincaid Reading Ease score and showed lower necessary education levels in the 5 other readability metrics with statistical significance, indicating better readability. However, ChatGPT earned a higher mean accuracy grade of 4.25 compared to Cochrane Review's mean grade of 4.05 with statistical significance.
    CONCLUSIONS: Cochrane Review's Plain Text Summaries provide clearer summaries of the incontinence literature when compared to ChatGPT, yet ChatGPT generated more comprehensive summaries. While ChatGPT can effectively summarize the medical literature, further studies can improve reader accessibility to these summaries.
    DOI:  https://doi.org/10.1097/SPV.0000000000001688
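    The comparison above rests on a 2-tailed t test between ChatGPT and Cochrane summaries of the same 73 abstracts; a paired test seems the natural choice, although the abstract does not say whether a paired or independent test was used, so the SciPy sketch below (on made-up grade-level scores) is an assumption.

        from scipy import stats

        # Hypothetical FKGL scores for the same abstracts summarized two ways (not the study's data).
        cochrane_pls = [9.8, 10.2, 11.0, 9.5, 10.7, 9.9, 10.4, 11.2]
        chatgpt      = [12.4, 13.1, 12.8, 11.9, 13.5, 12.2, 12.9, 13.0]

        # Paired, 2-tailed t test; stats.ttest_ind would treat the two sets as independent samples.
        t, p = stats.ttest_rel(cochrane_pls, chatgpt)
        print(f"t = {t:.2f}, p = {p:.4f}")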
  7. Ulus Travma Acil Cerrahi Derg. 2025 Apr;31(4): 389-393
       BACKGROUND: This study aims to evaluate the accuracy and reliability of Generative Pre-trained Transformer (ChatGPT; OpenAI, San Francisco, California) in answering patient-related questions about trigger finger. This evaluation has the potential to enhance patient education prior to treatment and provides insight into the role of artificial intelligence (AI)-based systems in the patient education process.
    METHODS: The ten most frequently asked questions regarding trigger finger were compiled from patient education websites and a literature review, then posed to ChatGPT. Two orthopedic specialists evaluated the responses using the Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN instrument (A Tool for Judging the Quality of Written Consumer Health Information on Treatment Choices). Additionally, the readability of the responses was assessed using the Flesch-Kincaid Grade Level.
    RESULTS: The DISCERN scores for ChatGPT's responses to trigger finger questions ranged from 35 to 47, with an average of 42, indicating "moderate" quality. While 60% of the responses were satisfactory, 40% contained deficiencies. According to the JAMA Benchmark criteria, the absence of scientific references was a significant drawback. The average readability level corresponded to the university level, making the information difficult to understand for patients with low health literacy. Improvements are needed to enhance the accessibility and comprehensibility of the content for a broader patient population.
    CONCLUSION: To the best of our knowledge, this is the first study to investigate the use of ChatGPT in the context of trigger finger. While ChatGPT shows reasonable effectiveness in providing general information on trigger finger, expert oversight is necessary before it can be relied upon as a primary source for patient education.
    DOI:  https://doi.org/10.14744/tjtes.2025.32735
  8. JMIR Infodemiology. 2025 Apr 08. 5 e59767
       BACKGROUND: Social media has been extensively used by the public to seek information and share views on health issues. Recently, the proper and off-label use of semaglutide drugs for weight loss has attracted huge media attention and led to temporary supply shortages.
    OBJECTIVE: The aim of this study was to perform a content analysis on English YouTube (Google) videos related to semaglutide.
    METHODS: YouTube was searched with the words semaglutide, Ozempic, Wegovy, and Rybelsus. The first 30 full-length videos (videos without a time limit) and 30 shorts (videos that are no longer than 1 minute) resulting from each search word were recorded. After excluding duplicates resulting from multiple searches, a total of 96 full-length videos and 93 shorts were analyzed. Video content was evaluated using 3 tools: a custom checklist, the Global Quality Score (GQS), and the Modified DISCERN. Readability and sentiment of the transcripts were also assessed.
    RESULTS: There was no significant difference in the mean number of views between full-length videos and shorts (mean 288,563.1, SD 513,598.3 vs mean 188,465.2, SD 780,376.2, P=.30). The former had better content quality in terms of GQS, Modified DISCERN, and the number of mentioned points from the custom checklist (all P<.001). The transcript readability of both types of videos was at a fairly easy level and mainly had a neutral tone. Full-length videos from health sources had a higher content quality in terms of GQS and Modified DISCERN (both P<.001) than their counterparts.
    CONCLUSIONS: The analyzed videos lacked coverage of several important aspects, including the lack of long-term data, the persistence of side effects due to the long half-life of semaglutide, and the risk of counterfeit drugs. It is crucial for the public to be aware that videos cannot replace consultations with physicians.
    Keywords:  Ozempic; Rybelsus; Wegovy; YouTube; assessment; consultation; drugs; health issues; knowledge exchange; long-term data; online; online information; safety; semaglutide; side effects; social media; videos; weight loss
    DOI:  https://doi.org/10.2196/59767
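    The abstract notes that transcript sentiment was assessed but does not name the tool; one common lexicon-based option is VADER, sketched below with its conventional compound-score thresholds. Both the package and the thresholds are assumptions, not the authors' documented method.

        # pip install vaderSentiment
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

        analyzer = SentimentIntensityAnalyzer()
        transcript = ("Semaglutide can help with weight loss, but side effects such as "
                      "nausea are common and long-term data are still limited.")
        scores = analyzer.polarity_scores(transcript)  # keys: neg, neu, pos, compound

        # Common convention: compound >= 0.05 positive, <= -0.05 negative, otherwise neutral.
        if scores["compound"] >= 0.05:
            tone = "positive"
        elif scores["compound"] <= -0.05:
            tone = "negative"
        else:
            tone = "neutral"
        print(tone, scores)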
  9. J Med Internet Res. 2025 Apr 08. 27 e56080
       BACKGROUND: An estimated 93% of adults in the United States access the internet, with up to 80% looking for health information. However, only 12% of US adults are proficient enough in health literacy to interpret health information and make informed health care decisions meaningfully. With the vast amount of health information available in multimedia formats on social media platforms such as YouTube and Facebook, there is an urgent need and a unique opportunity to design an automated approach to curate online health information using multiple criteria to meet the health literacy needs of a diverse population.
    OBJECTIVE: This study aimed to develop an automated approach to assessing the understandability of patient educational videos according to the Patient Education Materials Assessment Tool (PEMAT) guidelines and evaluating the impact of video understandability on viewer engagement. We also offer insights for content creators and health care organizations on how to improve engagement with these educational videos on user-generated content platforms.
    METHODS: We developed a human-in-the-loop, augmented intelligence approach that explicitly focused on the human-algorithm interaction, combining PEMAT-based patient education constructs mapped to features extracted from the videos, annotations of the videos by domain experts, and cotraining methods from machine learning to assess the understandability of videos on diabetes and classify them. We further examined the impact of understandability on several dimensions of viewer engagement with the videos.
    RESULTS: We collected 9873 YouTube videos on diabetes using search keywords extracted from a patient-oriented forum and reviewed by a medical expert. Our machine learning methods achieved a weighted precision of 0.84, a weighted recall of 0.79, and an F1-score of 0.81 in classifying video understandability and could effectively identify patient educational videos that medical experts would like to recommend for patients. Videos rated as highly understandable had an average higher view count (average treatment effect [ATE]=2.55; P<.001), like count (ATE=2.95; P<.001), and comment count (ATE=3.10; P<.001) than less understandable videos. In addition, in a user study, 4 medical experts recommended 72% (144/200) of the top 10 videos ranked by understandability compared to 40% (80/200) of the top 10 videos ranked by YouTube's default algorithm for 20 randomly selected search keywords.
    CONCLUSIONS: We developed a human-in-the-loop, scalable algorithm to assess the understandability of health information on YouTube. Our method optimally combines expert input with algorithmic support, enhancing engagement and aiding medical experts in recommending educational content. This solution also guides health care organizations in creating effective patient education materials for underserved health topics.
    Keywords:  AI; artificial intelligence; augmented intelligence; cotraining; human-in-the-loop; machine learning; patient education; video analysis; video understandability
    DOI:  https://doi.org/10.2196/56080
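    The weighted precision, recall, and F1 figures above are standard support-weighted classification metrics; the scikit-learn sketch below shows how they are computed, on illustrative labels rather than the study's data.

        from sklearn.metrics import precision_score, recall_score, f1_score

        # Hypothetical labels: 1 = understandable per PEMAT-based annotation, 0 = not.
        y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
        y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

        # "weighted" averages each class's metric by its support (class frequency).
        print("precision:", precision_score(y_true, y_pred, average="weighted"))
        print("recall:   ", recall_score(y_true, y_pred, average="weighted"))
        print("F1:       ", f1_score(y_true, y_pred, average="weighted"))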
  10. Int J Nurs Pract. 2025 Apr;31(2): e70015
       BACKGROUND: Digital healthcare has turned social media, especially YouTube, into a key platform for patient education. Videos on insulin administration attract significant viewership, but content reliability remains a concern.
    AIM: This study aimed to assess the quality and reliability of Turkish YouTube videos on insulin administration.
    METHODS: The first 200 videos on the YouTube platform related to "insulin administration" were reviewed, and 33 videos that met the inclusion criteria were evaluated using DISCERN, the Global Quality Score (GQS), and a guideline-based survey for insulin usefulness.
    RESULTS: Among the 33 videos that met the inclusion criteria, 39.4% were posted by nurses. The videos were analyzed using DISCERN and GQS and were classified according to their usefulness scores from the guideline-based survey. According to this classification, 45.4% of the videos were found to be very useful, 36.4% were moderately useful, and 18.2% were somewhat useful. The mean ± SD GQS score of the videos was 2.51 ± 1.09 (between "generally poor" and "moderate") and the mean ± SD DISCERN score was 30.21 ± 8.33, indicating that the videos lacked essential evidence-based information.
    CONCLUSION: Many videos raise concerns about their educational value, with only a small portion being highly informative. This negatively impacts health literacy and complicates education. Providing accurate, reliable content is crucial, and nurses can enhance health literacy, safety, and equity through quality materials.
    Keywords:  DISCERN score; YouTube; global quality score; insulin; insulin administration; nurses
    DOI:  https://doi.org/10.1111/ijn.70015
  11. J Korean Med Sci. 2025 Apr 07. 40(13): e34
       BACKGROUND: Extracorporeal membrane oxygenation (ECMO) is a medical intervention employed to provide life-sustaining support for patients. YouTube is a dynamic and widely utilized platform for distributing health-related information. The aim of this study was to evaluate ECMO-related videos on YouTube and assess how frequently they contain misleading information.
    METHODS: On September 17, 2024, an in-depth search of YouTube was conducted using the phrases "Extracorporeal Membrane Oxygenation" and "ECMO treatment." The study included 55 selected videos. Video parameters and sources were analyzed. Content assessments were conducted utilizing the Global Quality Scale (GQS), the modified DISCERN instrument, the Journal of the American Medical Association (JAMA) Benchmark Criteria, and the Patient Education Materials Assessment Tool for Audio/Visual Materials (PEMAT-A/V). The authors conducted comparisons among quality groups.
    RESULTS: Among the 55 videos analyzed, 30.9% (n = 17) were categorized as low quality, 21.8% (n = 12) as intermediate quality, and 47.3% (n = 26) as high quality. Physicians (75%) provided the most high-quality videos. News outlets (83.3%) provided the most low-quality videos. No statistically significant difference was observed between quality groups in daily views, likes, and comments (P > 0.05). Significant correlations were identified between video duration and GQS (r = 0.585), modified DISCERN questionnaire (r = 0.557), JAMA Benchmark Criteria (r = 0.511), PEMAT-A/V Understandability (r = 0.530), and PEMAT-A/V Actionability scores (r = 0.433) (P < 0.001 for all correlation analyses).
    CONCLUSION: There is a wide variety in the quality of YouTube ECMO videos. Although YouTube content created by physicians is more likely to provide accurate and beneficial information, substandard videos present a significant public health threat by disseminating misinformation. The critical role of quality control methods on social media platforms in ensuring the accurate and high-quality transmission of health-related information is readily evident.
    Keywords:  ECMO Treatment; Extracorporeal Membrane Oxygenation; Information Science; Internet; Social Media
    DOI:  https://doi.org/10.3346/jkms.2025.40.e34
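    The abstract reports correlation coefficients between video duration and the quality scores without naming the statistic; the SciPy sketch below computes both Pearson and Spearman coefficients on hypothetical duration/GQS pairs, since either could underlie the reported r values.

        from scipy import stats

        # Hypothetical video durations (minutes) and GQS ratings (illustrative only).
        duration = [2.0, 3.5, 5.1, 7.4, 8.0, 10.2, 12.5, 15.0]
        gqs = [2, 2, 3, 3, 4, 4, 5, 5]

        r, p_r = stats.pearsonr(duration, gqs)
        rho, p_rho = stats.spearmanr(duration, gqs)
        print(f"Pearson r = {r:.2f} (p = {p_r:.3f}); Spearman rho = {rho:.2f} (p = {p_rho:.3f})")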
  12. Arthroscopy. 2025 Apr 08. pii: S0749-8063(25)00259-2. [Epub ahead of print]
       PURPOSE: To assess the reliability, quality, and completeness of YouTube videos on orthobiologics and evaluate whether the content aligns with current clinical evidence and regulatory guidelines.
    METHODS: One hundred YouTube videos on orthobiologics were analyzed using the Journal of the American Medical Association (JAMA) Benchmark Score, Global Quality Scale (GQS), Modified DISCERN, and a novel Orthobiologics Grading System (OGS). Video views, duration, source, and content type were examined to determine their impact on informational quality.
    RESULTS: Of the 100 videos reviewed, 18 were excluded for reasons such as unrelated content or duplication, leaving 82 for analysis. The average number of views per video was 5,217, with a total of 427,825 views. The largest share of videos (33%) was uploaded by independent users, while only 1% were from government or news agencies. The mean JAMA score was 2.8 (indicating low-moderate transparency and credibility), GQS 3.2 (reflecting moderate overall quality), Modified DISCERN 3.7 (representing moderate reliability in discussion of treatments), and OGS 9.6 (indicating limited comprehensiveness with many videos lacking critical details). There were no significant associations between video source or verification status and any scoring metrics (P > .05). Longer videos were associated with higher JAMA, GQS, DISCERN, and OGS scores (P < .05). Health information websites had higher OGS scores (P = .001).
    CONCLUSION: YouTube videos on orthobiologics demonstrate low to moderate reliability and quality, with limited comprehensiveness. Most content is produced by independent users, with minimal contributions from verified health organizations. Longer videos were associated with higher quality scores, while verification status and video source showed no significant correlation with content quality.
    CLINICAL RELEVANCE: Given YouTube's role as a health information source, this study highlights the need to enhance the quality of educational content on orthobiologics to better support patient understanding and decision-making.
    DOI:  https://doi.org/10.1016/j.arthro.2025.03.062
  13. Urol Int. 2025 Apr 06. 1-17
       INTRODUCTION: The rising prevalence of internet usage and smartphone applications among urology patients underscores the critical role of digital health literacy. This study investigates the acceptability of digital health technologies among urology patients and identifies factors influencing their acceptance.
    MATERIALS AND METHODS: A cross-sectional, anonymous survey consisting of 12 questions was developed based on literature research. It was conducted online using SurveyMonkey and targeted patients in the CUROS network in Germany. The data were analysed descriptively using SPSS.
    RESULTS: Of 1,039 participants, 99.1% reported using the internet, with 84.4% using it several times daily. YouTube emerged as the most popular social media platform. While 90.2% searched for health information online, trust in online resources was low (mean score 4.63). Only 35.2% used medical apps, but 62.8% expressed willingness to use them if prescribed. Furthermore, 74.2% supported the use of electronic patient records (EPRs), although concerns regarding privacy were noted.
    CONCLUSIONS: Urology patients demonstrate a high engagement with digital resources but express concerns about the reliability of online health information. Enhancing education on digital health tools and fostering trust in these resources is essential for improving patient outcomes and encouraging the integration of digital health in urological care.
    DOI:  https://doi.org/10.1159/000544873