bims-librar Biomed News
on Biomedical librarianship
Issue of 2025–08–10
thirty-one papers selected by
Thomas Krichel, Open Library Society



  1. Med Ref Serv Q. 2025 Aug 03. 1-17
       BACKGROUND: Little is known about whether or how librarians and nursing faculty collaborate to create or use open educational resources (OER).
    METHODS: A comprehensive search was performed in 2024 across five databases. Three articles met the inclusion criteria and were analyzed.
    RESULTS: The included articles had unclear reporting regarding the exact nature of the role of the librarian author, but two articles described the work of librarians in the curation of OER materials for a nursing course.
    DISCUSSION: Incorporation of OER offered an opportunity for collaboration as well as descriptions of work that librarians did and suggestions for future opportunities.
    Keywords:  Library and information professionals; nursing; nursing education; open access; review; scoping
    DOI:  https://doi.org/10.1080/02763869.2025.2537070
  2. Front Digit Health. 2025 ;7 1555290
      
    Keywords:  accessibility; credible resources; empowerment; health literacy; internet-based patient education; misinformation
    DOI:  https://doi.org/10.3389/fdgth.2025.1555290
  3. Health Info Libr J. 2025 Aug 09.
       OBJECTIVES: Although the concept of overdiagnosis was first referenced in MEDLINE 100 years ago, consensus on a clear definition has been lacking. In 2021, the MeSH term "Overdiagnosis" was officially introduced, which defined the concept. A key goal of the new term is to improve the reliability of literature searches and enhance the conceptual understanding of overdiagnosis.
    METHODS: We conducted a systematic bibliometric review of all citations indexed under the MeSH term for "Overdiagnosis" in MEDLINE. We compared the citations with citations identified through a text-word search for overdiagnosis not indexed under the MeSH term. Searches were performed on 15 September 2024.
    RESULTS: We found that a higher percentage of citations indexed under the new MeSH term used it according to the definition compared with the text-word search (73.2% vs. 49.5%). The remainder used the term to describe misdiagnosis, false positives, and overtreatment. The citations indexed under the MeSH term were primarily descriptive in nature (68.7%), focusing on oncology (54.2%) and screening practices (31.2%).
    DISCUSSION: Despite advancements, the field of overdiagnosis is still in its early stages, with potential for expansion into studies addressing prevention and mitigation strategies. The introduction of the MeSH term has facilitated some degree of conceptual alignment.
    CONCLUSION: Our review provides insights into the current state of the overdiagnosis literature, emphasising prevalent themes and areas for further research, and improvements in MeSH indexing accuracy. Residual conceptual ambiguity surrounding overdiagnosis terminology and indexing practices may explain discrepancies in MeSH categorisation and definition adherence.
    Keywords:  MeSH; Medline; bibliometrics; indexation; overdiagnosis; review; subheadings
    DOI:  https://doi.org/10.1111/hir.70000
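A minimal sketch of the MeSH-versus-text-word comparison described in the entry above, using the public NCBI E-utilities esearch endpoint; the search strings are illustrative assumptions, not the study's actual strategy:

```python
# Sketch: count PubMed citations indexed under the "Overdiagnosis" MeSH term
# versus text-word hits that are not MeSH-indexed, via NCBI E-utilities.
# The search strings are illustrative, not the study's actual strategy.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term: str) -> int:
    """Return the number of PubMed citations matching a search term."""
    resp = requests.get(ESEARCH, params={"db": "pubmed", "term": term,
                                         "retmode": "json", "retmax": 0},
                        timeout=30)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

mesh_indexed = pubmed_count('"Overdiagnosis"[MeSH Terms]')
text_word_only = pubmed_count('overdiagnosis[Title/Abstract] NOT "Overdiagnosis"[MeSH Terms]')

print("Indexed under the MeSH term:     ", mesh_indexed)
print("Text-word hits not MeSH-indexed: ", text_word_only)
```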
  4. Stud Health Technol Inform. 2025 Aug 07. 329 723-727
      The fundamental process of evidence extraction in evidence-based medicine relies on identifying PICO elements, with Outcomes being the most complex and often overlooked. To address this, we introduce EvidenceOutcomes, a large annotated corpus of clinically meaningful outcomes. A robust annotation guideline was developed in collaboration with clinicians and NLP experts, and three annotators annotated the Results and Conclusions of 500 PubMed abstracts and 140 EBM-NLP abstracts, achieving an inter-rater agreement of 0.76. A fine-tuned PubMedBERT model achieved F1 scores of 0.69 (entity level) and 0.76 (token level). EvidenceOutcomes offers a benchmark for advancing machine learning algorithms in extracting clinically meaningful outcomes.
    Keywords:  Biomedical Literature Research; NLP; PICO Outcomes; RCT
    DOI:  https://doi.org/10.3233/SHTI250935
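The entity-level and token-level F1 scores reported above are standard sequence-labelling metrics; the sketch below shows how they are typically computed from BIO-tagged predictions, using toy labels rather than the EvidenceOutcomes corpus or the fine-tuned PubMedBERT model:

```python
# Sketch: entity-level vs. token-level F1 for a BIO-tagged outcome-extraction
# task, assuming predictions are already available as per-token label lists.
# The toy labels below are illustrative, not from the EvidenceOutcomes corpus.
from seqeval.metrics import f1_score as entity_f1          # entity level
from sklearn.metrics import f1_score as token_f1           # token level

gold = [["O", "B-Outcome", "I-Outcome", "O", "B-Outcome"],
        ["B-Outcome", "I-Outcome", "O", "O"]]
pred = [["O", "B-Outcome", "I-Outcome", "O", "O"],
        ["B-Outcome", "O", "O", "O"]]

# Entity level: a prediction counts only if the whole span and type match.
print("entity-level F1:", entity_f1(gold, pred))

# Token level: each token is scored independently (binary: outcome vs. not).
flat_gold = [lab != "O" for seq in gold for lab in seq]
flat_pred = [lab != "O" for seq in pred for lab in seq]
print("token-level F1:", token_f1(flat_gold, flat_pred))
```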
  5. Stud Health Technol Inform. 2025 Aug 07. 329 2010-2011
      Digital accessibility ensures that websites are usable by people with different abilities, including older adults. Given the large number of Arabic speakers, assessing the digital accessibility of websites offering health information is crucial to promote inclusivity and give Arabic speakers equal access to health information. This study aims to investigate whether websites offering health information for older adults in Arabic are digitally accessible. The WAVE Web Accessibility Evaluation Tool was used to evaluate the accessibility of five websites offering health information for older adults in the Arabic language. All five websites exhibited accessibility errors, although accessibility features were also present on all of them. This study emphasizes the need for digital accessibility in Arabic health websites to ensure equal access to health information.
    Keywords:  Arabic; Web accessibility; health information; older adults; websites
    DOI:  https://doi.org/10.3233/SHTI251324
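A hedged sketch of how an automated WAVE check of a health page can be scripted; the endpoint, parameters, and response fields follow WebAIM's subscription API as the author understands it and would need to be verified, and the API key and example URL are placeholders:

```python
# Sketch: automated WAVE accessibility check of a health-information page.
# Assumes a WAVE API subscription key (placeholder below); the endpoint and
# response fields follow WebAIM's published API but should be verified.
import requests

WAVE_API = "https://wave.webaim.org/api/request"
API_KEY = "YOUR_WAVE_API_KEY"   # placeholder -- requires a WebAIM subscription

def wave_summary(url: str) -> dict:
    """Return per-category counts (errors, alerts, features, ...) for a page."""
    resp = requests.get(WAVE_API, params={"key": API_KEY, "url": url,
                                          "reporttype": 1}, timeout=60)
    resp.raise_for_status()
    categories = resp.json().get("categories", {})
    return {name: cat.get("count", 0) for name, cat in categories.items()}

# Example (hypothetical Arabic-language health site):
# print(wave_summary("https://example.org/ar/senior-health"))
```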
  6. J Otolaryngol Head Neck Surg. 2025 Jan-Dec;54: 19160216251360651
       Importance: Online patient education materials (PEMs) and large language model (LLM) outputs can provide critical health information for patients, yet their readability, quality, and reliability remain unclear for Meniere's disease.
    Objective: To assess the readability, quality, and reliability of online PEMs and LLM-generated outputs on Meniere's disease.
    Design: Cross-sectional study.
    Setting: PEMs were identified from the first 40 Google Search results based on inclusion criteria. LLM outputs were extracted from unique interactions with ChatGPT and Google Gemini.
    Participants: Thirty-one PEMs met inclusion criteria. LLM outputs were obtained from 3 unique interactions each with ChatGPT and Google Gemini.
    Intervention: Readability was assessed using 5 validated formulas [Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning-Fog Index, Coleman-Liau Index, and Simple Measure of Gobbledygook Index]. Quality and reliability were assessed by 2 independent raters using the DISCERN tool.
    Main Outcome Measures: Readability was assessed for adherence to the American Medical Association's (AMA) sixth-grade reading level guideline. Source reliability, as well as the completeness, accuracy, and clarity of treatment-related information, was evaluated using the DISCERN tool.
    Results: The most common PEM source type was academic institutions (32.2%), while the majority of PEMs (61.3%) originated from the United States. The mean FRE score for PEMs corresponded to a 10th- to 12th-grade reading level, whereas ChatGPT and Google Gemini outputs were classified at post-graduate and college reading levels, respectively. Only 16.1% of PEMs met the AMA's sixth-grade readability recommendation using the FKGL readability index, and no LLM outputs achieved this standard. Overall DISCERN scores categorized PEMs and ChatGPT outputs as "poor quality," while Google Gemini outputs were rated "fair quality." No significant differences were found in readability or DISCERN scores across PEM source types. Additionally, no significant correlation was identified between PEM readability, quality, and reliability scores.
    Conclusions: Online PEMs and LLM-generated outputs on Meniere's disease do not meet AMA readability standards and are generally of poor quality and reliability.
    Relevance: Future PEMs should prioritize improved readability while maintaining high-quality, reliable information to better support patient decision-making for patients with Meniere's disease.
    Keywords:  Meniere’s disease; artificial intelligence; medical education; quality of life; vertigo
    DOI:  https://doi.org/10.1177/19160216251360651
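Three of the readability formulas named in the entry above (Flesch Reading Ease, Flesch-Kincaid Grade Level, SMOG) can be computed directly from sentence, word, and syllable counts; the sketch below uses a rough syllable heuristic, so its scores will differ slightly from validated calculators:

```python
# Sketch: Flesch Reading Ease, Flesch-Kincaid Grade Level, and SMOG for a
# patient-education passage. The syllable counter is a rough vowel-group
# heuristic, so results will differ slightly from validated tools.
import math
import re

def syllables(word: str) -> int:
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_syll = sum(syllables(w) for w in words)
    polysyllables = sum(1 for w in words if syllables(w) >= 3)

    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (n_syll / n_words) - 15.59
    # SMOG is validated on samples of 30+ sentences; short texts are approximate.
    smog = 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
    return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1), "SMOG": round(smog, 1)}

sample = ("Meniere's disease is an inner ear disorder that causes episodes of "
          "vertigo, hearing loss, tinnitus, and a feeling of fullness in the ear.")
print(readability(sample))
```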
  7. Stud Health Technol Inform. 2025 Aug 07. 329 961-965
      This study evaluated the readability of ClinicalTrials.gov trial information using traditional readability measures (TRMs) and compared it to summaries generated by large language models (LLMs), specifically ChatGPT and a fine-tuned BART-Large-CNN (FBLC). The study involved: 1) assessing required reading levels (RRL) with TRMs, 2) generating sample LLM-based summaries, and 3) evaluating summary quality based on scores provided by two independent reviewers. The results show that the original ClinicalTrials.gov trial descriptions were scored above the recommended readability level. In contrast, ChatGPT-generated summaries had significantly lower RRLs and higher quality scores. We conclude that ChatGPT shows great promise of creating readable, high-quality summaries. Future research is warranted to assess whether LLMs could be a viable solution to improve the readability of ClinicalTrials.gov to facilitate comprehension by laypersons.
    Keywords:  Readability assessment; clinical trials; large language models
    DOI:  https://doi.org/10.3233/SHTI250982
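A minimal sketch of the summarization pipeline described above, pairing the ClinicalTrials.gov v2 API with the stock facebook/bart-large-cnn checkpoint as a stand-in for the study's fine-tuned BART-Large-CNN; the NCT identifier is a placeholder and the JSON field path is an assumption to check against the API documentation:

```python
# Sketch: fetch a trial's brief summary from the ClinicalTrials.gov v2 API and
# produce a lay summary with an off-the-shelf BART model. The study fine-tuned
# BART-Large-CNN; the stock facebook/bart-large-cnn checkpoint stands in here,
# and the JSON field path is an assumption to verify against the API docs.
import requests
from transformers import pipeline

NCT_ID = "NCT00000000"   # placeholder trial identifier
url = f"https://clinicaltrials.gov/api/v2/studies/{NCT_ID}"
study = requests.get(url, timeout=30).json()
brief = study["protocolSection"]["descriptionModule"]["briefSummary"]

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(brief, max_length=120, min_length=40, do_sample=False)
print(summary[0]["summary_text"])
```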
  8. Public Health. 2025 Aug 01. pii: S0033-3506(25)00322-1. [Epub ahead of print]247 105876
       OBJECTIVES: Generative AI interfaces like ChatGPT offer a new way to access health information, but it is unclear whether the information they present is as credible as that from traditional search engines. This study aimed to compare the credibility of vaccination information across generative AI interfaces and traditional search engines.
    STUDY DESIGN: Cross-sectional content analysis and comparison.
    METHODS: Questions were drawn from existing literature on common questions about vaccines and vaccination. Responses were retrieved in December 2023 by querying Google, Bing, Bard, ChatGPT 3.5, ChatGPT 4.0, and Claude AI. Credibility was measured using DISCERN and grade reading score was measured using standard measures via the SHeLL Editor.
    RESULTS: Across 12 questions, traditional search engines scored higher than generative AI in specific aspects of DISCERN, namely clarity of information sources (P < 0.0001), clarity of information recency (P < 0.0001) and provision of additional sources (P < 0.001). Generative AI interfaces performed better in relevance of information (P < 0.0001) and overall quality (P < 0.05).
    CONCLUSION: Overall credibility of generative AI interfaces and traditional search engines is similar, but generative AI interfaces rarely provide sources and external links to high-quality information. In their current forms, generative AI interfaces may make information easy to read and appear credible, without providing typical credibility cues.
    Keywords:  Artificial intelligence; ChatGPT; Misinformation; Vaccination
    DOI:  https://doi.org/10.1016/j.puhe.2025.105876
  9. Front Public Health. 2025 ;13 1605908
       Background: With the rapid advancement and widespread adoption of artificial intelligence (AI), patients increasingly turn to AI for initial medical guidance. Therefore, a comprehensive evaluation of AI-generated responses is warranted. This study aimed to compare the performance of DeepSeek and ChatGPT in answering urinary incontinence-related questions and to delineate their respective strengths and limitations.
    Methods: Based on the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and European Association of Urology (EAU) guidelines, we designed 25 urinary incontinence-related questions. Responses from DeepSeek and ChatGPT-4.0 were evaluated for reliability, quality, and readability. Fleiss' kappa was employed to calculate inter-rater reliability. For clinical case scenarios, we additionally assessed the appropriateness of responses. A comprehensive comparative analysis was performed.
    Results: The modified DISCERN (mDISCERN) scores for DeepSeek and ChatGPT-4.0 were 28.24 ± 0.88 and 28.76 ± 1.56, respectively, showing no practically meaningful difference [P = 0.188, Cohen's d = 0.41 (95% CI: -0.15, 0.97)]. Both AI chatbots rarely provided source references. In terms of quality, DeepSeek achieved a higher mean Global Quality Scale (GQS) score than ChatGPT-4.0 (4.76 ± 0.52 vs. 4.32 ± 0.69, P = 0.001). DeepSeek also demonstrated superior readability, as indicated by a higher Flesch Reading Ease (FRE) score (76.43 ± 10.90 vs. 70.95 ± 11.16, P = 0.039) and a lower Simple Measure of Gobbledygook (SMOG) index (12.26 ± 1.39 vs. 14.21 ± 1.88, P < 0.001), suggesting easier comprehension. Regarding guideline adherence, DeepSeek had 11 (73.33%) fully compliant responses, while ChatGPT-4.0 had 13 (86.67%), with no significant difference [P = 0.651, Cohen's w = 0.083 (95% CI: 0.021, 0.232)].
    Conclusion: DeepSeek and ChatGPT-4.0 might exhibit comparable reliability in answering urinary incontinence-related questions, though both lacked sufficient references. However, DeepSeek outperformed ChatGPT-4.0 in response quality and readability. While both AI chatbots largely adhered to clinical guidelines, occasional deviations were observed. Further refinements are necessary before the widespread clinical implementation of AI chatbots in urology.
    Keywords:  ChatGPT; DeepSeek; artificial intelligence; comparative analysis; urinary incontinence
    DOI:  https://doi.org/10.3389/fpubh.2025.1605908
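The entry above reports Cohen's d with a confidence interval and uses Fleiss' kappa for inter-rater reliability; a sketch of how these two statistics are commonly computed is shown below, with toy data standing in for the actual ratings:

```python
# Sketch: Cohen's d (with a normal-approximation 95% CI) for comparing mDISCERN
# scores, and Fleiss' kappa for agreement among more than two raters.
# All numbers below are toy data, not the study's.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    d = (a.mean() - b.mean()) / pooled_sd
    se = np.sqrt((na + nb) / (na * nb) + d**2 / (2 * (na + nb)))  # approx. SE
    return d, (d - 1.96 * se, d + 1.96 * se)

deepseek = np.random.default_rng(0).normal(28.2, 0.9, 25)
chatgpt = np.random.default_rng(1).normal(28.8, 1.6, 25)
print("Cohen's d (95% CI):", cohens_d(deepseek, chatgpt))

# Fleiss' kappa: rows = items, columns = raters, values = assigned category.
ratings = np.array([[1, 1, 1], [2, 2, 1], [3, 3, 3], [1, 2, 1], [2, 2, 2]])
table, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))
```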
  10. BMC Oral Health. 2025 Aug 02. 25(1): 1293
       BACKGROUND: The field of artificial intelligence (AI) has experienced considerable growth in recent years, with the advent of technologies that are transforming a range of industries, including healthcare and dentistry. Large language models (LLMs) and natural language processing (NLP) are pivotal to this transformation. This study aimed to assess the efficacy of AI-supported chatbots in responding to questions frequently asked by patients to their doctors regarding oral health.
    METHODS: Frequently asked questions from the oral health section of the World Dental Federation (FDI) website were posed to the Google Gemini and ChatGPT-4 chatbots on July 9, 2024. Responses from ChatGPT and Gemini, as well as those from the FDI webpage, were recorded. The accuracy of the responses given by ChatGPT-4 and Gemini to the four specified questions, the detection of similarities and differences, and the comprehensive examination of ChatGPT-4 and Gemini's capabilities were analyzed and reported by the researchers. Furthermore, the content of the texts was evaluated in terms of their similarity with respect to the following criteria: "Main Idea," "Quality Analysis," "Common Ideas," and "Inconsistent Ideas."
    RESULTS: It was observed that both ChatGPT-4 and Gemini exhibited performance comparable to that of the FDI responses in terms of completeness and clarity. Compared with Gemini, ChatGPT-4 provided responses that were more similar to the FDI responses in terms of relevance. Furthermore, ChatGPT-4 provided responses that were more accurate than those of Gemini in terms of the "Accuracy" criterion.
    CONCLUSIONS: This study demonstrated that, when assessed against the FDI responses, the ChatGPT-4 and Gemini applications contain contemporary and comprehensible information in response to general inquiries concerning oral health. These applications are regarded as a prevalent and dependable source of information for individuals seeking to access such data.
    Keywords:  Artificial intelligence; Consumer health information; Informatics applications
    DOI:  https://doi.org/10.1186/s12903-025-06624-9
  11. Clin Exp Dent Res. 2025 Aug;11(4): e70195
       OBJECTIVE: To investigate the quality of online information provided by dental-related websites regarding periodontal surgery.
     METHODS: The term "Gum Surgery" was entered into three search engines (Google, Yahoo, and Bing). The content of websites satisfying the selection criteria was assessed with five validated quality-of-information tools (DISCERN, the Patient Education Materials Assessment Tool [PEMAT], Journal of the American Medical Association [JAMA] benchmarks, HONcode certification, and @TRUST certification). The Simple Measure of Gobbledygook (SMOG) was used to evaluate the readability of content.
    RESULTS: A total of 55 websites satisfied selection criteria. The mean (SD) DISCERN score for all website categories was 2.89 (0.57). The quality of information related to the risks of each treatment scored poorly in most websites. The healthcare portals obtained the highest mean PEMAT score of 71.74%, a statistically significant outcome. Healthcare portal websites also recorded the highest mean (SD) JAMA score of 3.72 (0.75) out of 4. The mean (SD) SMOG score was 9.56 (1.07). Cohen's κ inter-rater reliability for DISCERN and PEMAT scores were 0.75 and 0.79, respectively.
    CONCLUSION: The information available online about periodontal surgery was variable and difficult to read, falling short of established standards for accuracy, reliability, and credibility. Vital information was often omitted.
    Keywords:  health literacy; online information; periodontal surgery; quality of information; website readability
    DOI:  https://doi.org/10.1002/cre2.70195
  12. J Med Imaging Radiat Oncol. 2025 Aug 09.
       INTRODUCTION: The emergence of search engines powered by artificial intelligence and large language models (LLMs), such as ChatGPT, provides easy access to seemingly accurate health information. However, the accuracy of the information produced is uncertain. The purpose of this research is to assess the quality of information produced by ChatGPT about the treatment of health conditions commonly managed by Interventional Radiologists (IRs).
    METHODS: ChatGPT was asked "what is the best treatment" in relation to six conditions commonly managed by IRs. The output statements were assessed using the DISCERN instrument and compared against the current evidence base for the management of those conditions.
    RESULTS: Six conditions were assessed. The mean overall score for the ChatGPT output statements was 1.3 compared to 3.8 for the reference articles. This poor performance by ChatGPT is largely attributable to the lack of transparency regarding sources. Although ChatGPT performed well in some areas such as presenting information in an unbiased manner, it showed significant weaknesses regarding source materials, the risks and benefits of each treatment, and the treatment's mechanism of action.
    CONCLUSION: LLMs signify a considerable shift in how patients obtain and consume medical information. Understanding the strengths and weaknesses of ChatGPT's outputs regarding conditions commonly treated by IRs will enable tailored messaging and constructive discussions with patients in consultation with their IR.
    Keywords:  ChatGPT; artificial intelligence; education; interventional radiology
    DOI:  https://doi.org/10.1111/1754-9485.13881
  13. Am J Otolaryngol. 2025 Jul 29. pii: S0196-0709(25)00113-9. [Epub ahead of print]46(5): 104710
       OBJECTIVE: Head and neck cancers (HNCs) are a significant global health concern, contributing to substantial morbidity and mortality. AI-powered chatbots such as ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence are increasingly used by patients seeking health information. While these tools provide immediate access to medical content, concerns remain regarding their reliability, readability, and potential impact on patient outcomes.
    METHODS: Responses to 25 patient-like HNC symptom queries were generated using four leading AI platforms: ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence. Responses were evaluated using modified DISCERN criteria for quality and SMOG scoring for readability, followed by ANOVA and post hoc analysis.
    RESULTS: Microsoft Copilot achieved the highest mean DISCERN score of 41.40 (95% CI: 40.31 to 42.49) and the lowest mean SMOG reading level of 12.56 (95% CI: 11.82 to 13.31), outperforming ChatGPT, Google Gemini, and Open Evidence in overall quality and accessibility (p < .001). Open Evidence scored lowest in both quality, averaging 30.52 (95% CI: 27.52 to 33.52), and readability, at 17.49 (95% CI: 16.66 to 18.31), reflecting a graduate reading level.
    CONCLUSION: Significant variability exists in the readability and quality of AI-generated responses to HNC-related queries, highlighting the need for platform-specific validation and oversight to ensure accurate, patient-centered communication.
    LEVEL OF EVIDENCE: Our study is a cross-sectional analysis that evaluates chatbot responses using established grading tools. This aligns best with level 4 evidence.
    Keywords:  Artificial Intelligence; Head and neck cancer patient education
    DOI:  https://doi.org/10.1016/j.amjoto.2025.104710
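A sketch of the ANOVA-plus-post-hoc comparison described above, applied to toy modified DISCERN scores for the four platforms; the group means and sample sizes are invented for illustration:

```python
# Sketch: one-way ANOVA with Tukey HSD post hoc comparison of modified DISCERN
# scores across four chatbot platforms. Scores are simulated toy data.
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)
groups = {
    "ChatGPT":        rng.normal(38, 3, 25),
    "Google Gemini":  rng.normal(37, 3, 25),
    "Copilot":        rng.normal(41, 3, 25),
    "Open Evidence":  rng.normal(30, 3, 25),
}

f_stat, p_value = f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

long = pd.DataFrame([(name, score) for name, arr in groups.items() for score in arr],
                    columns=["platform", "discern"])
print(pairwise_tukeyhsd(long["discern"], long["platform"]))
```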
  14. J Cancer Educ. 2025 Aug 05.
      The Internet has become a major source of health-related information for patients, yet the quality of online content is unregulated, leading to the potential for misinformation and patient distress. This study aims to assess the quality of online patient information on lymphedema. A search for the term "lymphedema" was conducted on the three most popular search engines (Google, Yahoo, and Bing), and the first 100 relevant websites from each were identified, yielding 300 websites in total. These websites were evaluated using the expanded Ensuring Quality Information for Patients (EQIP) instrument, which consists of 36 items. Of the 300 websites identified, 105 (35%) met the criteria for final analysis after excluding duplicates, irrelevant sites, non-English content, sites requiring user accounts, and scientific publications targeting professionals. The median EQIP score of these sites was 22, indicating suboptimal quality based on the EQIP criteria. A significant number of sites (62.9%) did not mention surgical treatment as an option for lymphedema, and 75.2% failed to describe treatment risks. Additionally, 98.1% did not report quantitative risks. Our analysis using the EQIP tool revealed that the quality of online information about lymphedema is generally poor, largely due to the omission of key content elements such as treatment options, associated risks, and quantitative data.
    Keywords:  Internet; Lymphedema; Lymphedema surgery; Patient information; Quality; Surgery
    DOI:  https://doi.org/10.1007/s13187-025-02691-2
  15. Aesthetic Plast Surg. 2025 Aug 08.
       BACKGROUND: The use of artificial intelligence (AI) chatbots has demonstrated considerable promise in assisting medical consultations. However, their potential for application in online hair transplantation consultations remains largely unexplored.
    OBJECTIVES: This study aims to assess the effectiveness of AI chatbots in responding to patient inquiries during online hair transplantation consultations.
    METHODS: We evaluated responses to 10 common patient questions collected from online hair transplantation clinics, comparing answers generated by three AI chatbots (ChatGPT-4o mini, Claude 3.5 Sonnet, and Gemini Advanced) with those from senior surgeons. Each response was scored based on medical accuracy, empathy, understandability, actionability, and readability, with a focus on determining how well AI can match or exceed human expert performance.
    RESULTS: All three AI chatbots matched or outperformed the response capabilities of senior surgeons in medical accuracy, empathy, understandability, actionability, and readability. Among them, Gemini Advanced showed the most comprehensive advantages, including significantly higher scores in medical accuracy (4.5 vs. 3.9, P < .001), empathy (4.9 vs. 2.5, P < .001), and understandability (82.7% vs. 63.9%, P < .001). Additionally, Gemini Advanced demonstrated a lower Flesch-Kincaid Grade Level (10.5 vs. 18.7, P < .001) and higher Flesch Reading Ease Score (40.3 vs. 16.2, P < .001), suggesting better readability.
    CONCLUSION: AI chatbots show strong potential for use in online hair transplantation consultations, providing accurate, empathetic, and easily understandable responses. Nevertheless, challenges such as privacy concerns, ethical considerations, and potential biases need to be addressed before their adoption in clinical practice.
    LEVEL OF EVIDENCE: IV.
    Keywords:  Artificial intelligence (AI); Chatbot; Hair transplantation; Online consultation; Telemedicine
    DOI:  https://doi.org/10.1007/s00266-025-05103-4
  16. Stud Health Technol Inform. 2025 Aug 07. 329 1586-1587
      Whether ChatGPT's answers to medical questions are accurate, reliable, and trustworthy, and whether members of the public without a health background know how to evaluate those answers, remains unclear. This study assessed ChatGPT's performance in answering medical questions posed by the public. An existing dataset of consumer clinical questions from the NIH Genetic and Rare Diseases Information Center (GARD) was used. API calls produced 1467 question-answer pairs for GPT-4-0613 (ChatGPT-4.0), from which 100 pairs were randomly selected as the study sample. The pairs were evaluated on two criteria, Scientific Accuracy and Comprehensiveness, each on a scale from 0 to 5. ChatGPT-4.0 performed above average (scale points 4 and 5) on about 90% of answers for Scientific Accuracy and 84% for Comprehensiveness, with approximately 7% and 14%, respectively, rated at scale point 3. No statistical differences were found between the quality of answers to questions that followed a question framework and those that did not. These results indicate that healthcare consumers must consult healthcare providers and/or other reliable information resources to verify the answers and gauge their applicability to individual situations. Further studies could investigate the impact of how medical questions are phrased on the quality of ChatGPT's answers and compare healthcare consumers' evaluations with those of healthcare and information professionals.
    Keywords:  ChatGPT; Consumer Informatics; Evaluation; Information Quality
    DOI:  https://doi.org/10.3233/SHTI251114
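A minimal sketch of the kind of batch API querying described above, using the OpenAI Python SDK; it assumes an API key in the environment and that the pinned gpt-4-0613 snapshot is still served, and the GARD-style questions shown are placeholders:

```python
# Sketch: batch-querying an OpenAI chat model with consumer health questions to
# build question-answer pairs, as described in the entry above. Assumes the
# OpenAI Python SDK (>=1.0), OPENAI_API_KEY in the environment, and continued
# availability of the pinned "gpt-4-0613" snapshot; questions are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What are the symptoms of Marfan syndrome?",
    "Is there a treatment for Alport syndrome?",
]

qa_pairs = []
for q in questions:
    resp = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": q}],
        temperature=0,
    )
    qa_pairs.append({"question": q, "answer": resp.choices[0].message.content})

for pair in qa_pairs:
    print(pair["question"], "->", pair["answer"][:80], "...")
```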
  17. A A Pract. 2025 Aug 01. 19(8): e02025
      Epidural analgesia is a common labor pain management method, yet patients often require clear education on its risks and benefits. This study assessed 2 large language models (LLMs), ChatGPT (OpenAI) and MediSearch (Create.ai), as potential tools for educating patients about epidural analgesia. Using 100 patient-focused questions categorized by the Rothwell System, responses were evaluated with DISCERN reliability scores and readability metrics (Flesch-Kincaid Grade Level, Coleman-Liau Index). MediSearch achieved higher reliability (P < .0001), while ChatGPT excelled in readability (P = .0013). Findings highlight a trade-off between reliability and readability across the 2 LLMs.
    DOI:  https://doi.org/10.1213/XAA.0000000000002025
  18. Breast Cancer Res Treat. 2025 Aug 06.
       PURPOSE: Breast cancer remains a global public health burden. This study aimed to evaluate the readability of breast cancer articles shared on X (formerly Twitter) during Breast Cancer Awareness Month (October 2024) and to explore the possibility of using artificial intelligence (AI) to improve readability.
    METHODS: We identified the top articles (n = 377) from posts containing #breastcancer on X during October 2024. Each article was analyzed using 9 established readability tests: Automated Readability Index (ARI), Coleman-Liau, Flesch-Kincaid, Flesch Reading Ease, FORCAST Readability Formula, Fry Graph, Gunning Fog Index, Raygor Readability Estimate, and Simple Measure of Gobbledygook (SMOG) Readability Formula. The study categorized sharing entities into five groups: academic medical centers, healthcare providers, government institutions, scientific journals, and all others. This comprehensive approach aimed to evaluate the readability of breast cancer articles across various sources during a critical awareness period of peak public engagement. A pilot study was simultaneously conducted using AI to improve readability. Statistical analysis was performed using SPSS.
    RESULTS: A total of 377 articles shared by the following entities were analyzed: academic medical centers (35, 9.3%), healthcare providers (57, 15.2%), government institutions (21, 5.6%), scientific journals (63, 16.8%), and all others (199, 53.1%). Government institutions shared articles with the lowest average readability grade level (12.71 ± 0.79). Scientific journals (16.57 ± 0.09), healthcare providers (15.49 ± 0.32), all others (14.89 ± 0.17), and academic medical centers (13.56 ± 0.39) had higher average readability grade levels. Article types were also split into different categories: patient education (222, 58.9%), open-access journal (119, 31.5%), and full journal (37, 9.6%). Patient education articles (15.19 ± 0.17) had the lowest average readability grade level. Open-access and full journals had average readability grade levels of 16.65 ± 0.06 and 16.53 ± 0.10, respectively. The mean Flesch Reading Ease scores were 38.14 ± 1.2 for patient education articles, 16.14 ± 0.89 for open-access journals, and 17.69 ± 2.14 for full journals. Of note, lower readability grade levels indicate easier-to-read text, while higher Flesch Reading Ease scores indicate greater ease of reading. In a demonstration using AI to improve the readability grade level of 5 sample articles, AI successfully lowered the average readability grade level from 12.58 ± 0.83 to 6.56 ± 0.28 (p < 0.001).
    CONCLUSIONS: Our findings highlight a critical gap between the recommended and actual readability levels of breast cancer information shared on a popular social media platform. While some institutions are producing more accessible content, there is a pressing need for standardization and improvement across all sources. To address this issue, sources may consider leveraging AI technology as a potential tool for creating patient resources with appropriate readability levels.
    Keywords:  Breast cancer; Health literacy; Online health information; Patient education; Readability
    DOI:  https://doi.org/10.1007/s10549-025-07799-z
  19. J Hum Nutr Diet. 2025 Aug;38(4): e70107
       BACKGROUND: Paediatric tube feeding is a crucial intervention for children unable to meet nutritional needs orally, yet information available to families is often insufficient. This study explores the availability and quality of online patient education materials (OPEMs) on paediatric tube feeding and discusses their applicability to Aotearoa New Zealand.
    METHODS: A naturalistic search strategy mimicking how parents would use Google search was employed using pre-defined search terms regarding support for paediatric tube feeding. Webpages were included if they provided educational content for caregivers. The first page of each search was screened, and webpages within relevant Aotearoa sites were also reviewed. Data on accessibility, readability, understandability, actionability, content analysis and completeness were collected and analysed.
    RESULTS: Fifty-nine webpages were included and analysed. Readability consistently exceeded the maximum recommended eighth-grade level. Official sources targeting parents scored high in understandability and actionability. Official webpages also demonstrated the highest content coverage. The content analysis identified 34 individual topic codes. Rarely addressed topics included emotional aspects of the child and social management of tube feeding. Some content provided general advice that did not account for variations in children's medical conditions, developmental stages or family contexts.
    CONCLUSION: These findings are clinically relevant, guiding professionals on the effective use of existing OPEMs. Despite high understandability and actionability scores from some sources, significant gaps remain. OPEMs have the potential to improve health equity by improving website content, centralising information and enhancing access to education that empowers family-centred care and wellbeing.
    Keywords:  health information; paediatrics; parent; tube feeding; tube weaning
    DOI:  https://doi.org/10.1111/jhn.70107
  20. Health Informatics J. 2025 Jul-Sep;31(3): 14604582251363538
      The aim was to evaluate the content of videos titled "How to administer subcutaneous immunoglobulin in immunodeficiency" on YouTube. The search term 'How to administer subcutaneous immunoglobulin in immunodeficiency?' was entered into YouTube™ (https://www.youtube.com), and the first 200 videos were reviewed on December 16, 2023. The majority of the 40 videos included in the study were uploaded by patients (62.5%). The proportion of understandable videos was significantly lower for patients' uploads (4.0%) than for other uploads (46.7%) (p = .000). The numbers of likes and comments per 1000 views were higher in the patient group (p = .000 and p = .006, respectively), but the GQS and mDISCERN scores were significantly lower (p = .040 and p = .000, respectively). Healthcare professionals and organizations have not shared enough videos on the use of subcutaneous immunoglobulin, and studies on this subject appear insufficient. In addition, a control mechanism is needed for health-related video content on the internet.
    Keywords:  YouTube; actionability; immunodeficiency; quality; reliability; subcutaneous immunoglobulin; understandability
    DOI:  https://doi.org/10.1177/14604582251363538
  21. PLoS One. 2025 ;20(8): e0329291
       OBJECTIVE: This study aimed to assess the scientific accuracy, content quality, and educational value of YouTube™ videos related to computer-controlled local anesthesia (CCLA) techniques in dentistry.
    MATERIALS AND METHODS: A total of 100 videos were screened using predefined keywords, and 48 met the inclusion criteria. Videos were assessed using the Global Quality Scale (GQS), DISCERN tool, JAMA benchmark criteria, and the Video Information and Quality Index (VIQI). Scientific content was scored using a structured rubric across six domains. Interobserver reliability was evaluated using Weighted Kappa and Intraclass Correlation Coefficient (ICC). Confidence intervals were calculated for key metrics. Group comparisons were performed using the Mann-Whitney U test, and correlations were analyzed using Spearman's rho (p < 0.05).
    RESULTS: Videos from academic sources had significantly higher scores across all quality and reliability indicators. The mean GQS was 2.6 (95% CI: 2.3-2.9), DISCERN 11.7 (95% CI: 10.8-12.6), JAMA 1.8 (95% CI: 1.7-1.9), and VIQI 12.5 (95% CI: 11.7-13.3). Strong positive correlations were found between DISCERN and VIQI (r = 0.809), and between total content score and both DISCERN (r = 0.803) and VIQI (r = 0.655).
    CONCLUSION: Although YouTube™ provides accessible information on CCLA, many videos lack scientific rigor and educational depth. Content produced by academic institutions is significantly more reliable. Dental educators are encouraged to integrate high-quality video content into curricula to improve media literacy and student learning outcomes.
    DOI:  https://doi.org/10.1371/journal.pone.0329291
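The interobserver-reliability statistics named above (weighted kappa, ICC) and the Spearman correlations can be computed as sketched below; the ratings are toy values, and the pingouin package is assumed to be installed alongside scikit-learn and SciPy:

```python
# Sketch: interobserver reliability via quadratic weighted kappa (scikit-learn)
# and a two-way ICC (pingouin, assumed installed), plus a Spearman correlation
# between two quality scores. All ratings below are toy values.
import pandas as pd
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr
import pingouin as pg

rater1 = [3, 2, 4, 5, 2, 3, 4, 1, 3, 4]
rater2 = [3, 3, 4, 5, 2, 2, 4, 2, 3, 5]

print("weighted kappa:", cohen_kappa_score(rater1, rater2, weights="quadratic"))

long = pd.DataFrame({
    "video": list(range(10)) * 2,
    "rater": ["R1"] * 10 + ["R2"] * 10,
    "score": rater1 + rater2,
})
icc = pg.intraclass_corr(data=long, targets="video", raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

discern = [11, 9, 13, 15, 8, 10, 14, 7, 12, 13]
viqi = [12, 10, 13, 15, 9, 11, 14, 8, 12, 14]
print("Spearman rho:", spearmanr(discern, viqi))
```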
  22. J Craniofac Surg. 2025 Aug 07.
       BACKGROUND: Digital platforms have become significant sources for patients to obtain health information. As a widely used platform, YouTube contains numerous health-related videos; however, the accuracy and reliability of this content are often questioned.
    OBJECTIVE: This study aims to assess the quality and reliability of YouTube videos related to Eustachian tube dysfunction (ETD).
    METHODS: In this study, 41 YouTube videos on ETD were evaluated using the DISCERN scale by 2 ENT specialists. In addition, the relationship between video parameters and the type of publishing organization (healthcare professional versus non-healthcare), as well as the presence of surgical content, was analyzed.
    RESULTS: Of the evaluated videos, 53.7% were published by healthcare professionals. The average DISCERN score was 3.96±0.75 (Researcher 1) and 3.97±0.76 (Researcher 2). Videos published by healthcare organizations had significantly higher DISCERN scores than those published by other sources (P=0.002).
    CONCLUSIONS: YouTube videos on ETD are generally of moderate quality, with those published by healthcare professionals being more reliable. Improving the quality of health-related videos necessitates professional input and guidance.
    Keywords:  Eustachian tube dysfunction; YouTube videos; online health information; online patient education videos
    DOI:  https://doi.org/10.1097/SCS.0000000000011736
  23. Medicine (Baltimore). 2025 Aug 01. 104(31): e43628
      Epidural blood patch (EBP) is widely recognized as the definitive treatment for postdural puncture headache, a rare but debilitating complication of spinal anesthesia or unintentional dural puncture. However, the lack of sufficient research on available online content about EBP, especially on social media platforms such as YouTube and TikTok, raises concerns about its reliability. In this context, this study was conducted to evaluate the quality and reliability of YouTube and TikTok videos about EBP and to identify sources that provide high-quality and reliable information. The material of this cross-sectional, observational study consisted of videos about EBP uploaded to YouTube and TikTok. Two independent reviewers evaluated these videos for their reliability and quality using the DISCERN instrument, the Journal of the American Medical Association (JAMA) benchmark, and the global quality scale. The median duration of the 72 videos included in the study material was 233 seconds. Of these videos, 77.8% were uploaded by patients. Only 13.9% of the videos were rated as high-quality based on DISCERN scores, and 81.9% were deemed insufficient by JAMA standards. The reliability and quality scores of the videos uploaded by doctors and healthcare channels were significantly higher than those uploaded by patients (P < .05). TikTok videos were significantly shorter and received significantly more likes than YouTube videos (P = .028). On the other hand, the overall quality of YouTube videos was significantly higher than TikTok videos (P < .05). Most YouTube and TikTok videos on EBP, especially those uploaded by patients, were of low quality and reliability. Videos uploaded by doctors and healthcare channels were typically more reliable than those uploaded by patients, highlighting the need for increased professional contributions to improve the quality of online health content.
    Keywords:  TikTok; YouTube; consumer health information; digital technology; epidural blood patch; postdural puncture headache; social media
    DOI:  https://doi.org/10.1097/MD.0000000000043628
  24. Digit Health. 2025 Jan-Dec;11: 20552076251365086
       Objective: Short videos are increasingly being used to disseminate health information. However, the quality of videos on common ophthalmic conditions such as cataract has not been systematically evaluated.
     Methods: This study employed a cross-sectional design. The TikTok platform was searched using the term "cataract" from 20:00 to 24:00 on 8 November 2024, without any restrictions. The top 100 retrieved videos were included in the study. They were rated using the Journal of the American Medical Association (JAMA) benchmark criteria, the Global Quality Score (GQS) scale, the modified DISCERN score, and the Patient Education Materials Assessment Tool for Audiovisual Content (PEMAT-A/V). Videos from different publisher groups were compared for quality, and the factors underlying quality were examined.
     Results: The top 100 videos had an average of 2009.1 likes, 795.65 comments, 2628.91 shares, and 554.08 saves. Their JAMA benchmark, GQS, modified DISCERN, and PEMAT-A/V ratings differed (p < .05) by account ownership, doctor rank, and video content. More videos were uploaded by institutions and physicians than by nonphysicians (p < .05). The numbers of likes, comments, favorites, and shares were not correlated with quality (Spearman correlation; p > .05). Further regression analysis confirmed that video quality can be predicted from account ownership.
    Conclusion: The quality of cataract-related short videos on platforms has room for improvement. Users may estimate video quality based on the identity of the content creator.
    Keywords:  Cataract; GQS scale; JAMA benchmark criteria; PEMAT-A/V; TikTok; modified DISCERN score
    DOI:  https://doi.org/10.1177/20552076251365086
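A sketch of the two analyses described above: Spearman correlations between engagement metrics and a quality score, and a regression predicting quality from account ownership; the data are simulated, and the study's actual model specification may differ (for example, an ordinal model may have been used):

```python
# Sketch: Spearman correlations between engagement metrics and a quality score,
# and an OLS regression predicting quality from account ownership.
# Data are simulated toy values; the study's model may differ.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "likes":     rng.integers(0, 20000, 100),
    "comments":  rng.integers(0, 3000, 100),
    "gqs":       rng.integers(1, 6, 100),
    "ownership": rng.choice(["institution", "physician", "nonphysician"], 100),
})

for metric in ["likes", "comments"]:
    rho, p = spearmanr(df[metric], df["gqs"])
    print(f"{metric}: rho = {rho:.2f}, p = {p:.3f}")

model = smf.ols("gqs ~ C(ownership)", data=df).fit()
print(model.summary().tables[1])
```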
  25. J Pediatr Soc North Am. 2025 Aug;12 100207
       Background: The lack of presence by spine deformity surgeons on TikTok, in addition to the platform's unregulated nature, raises concerns about the potential spread of misinformation regarding pediatric orthopaedic conditions. The purpose of this study is to assess the prevalence of scoliosis misinformation on TikTok with a specific focus on what types of scoliosis content contain the most misinformation and which content creators produce the most videos containing this misinformation.
    Methods: A comprehensive search was conducted on TikTok using the following hashtags: #scoliosis, #scoliosischiropractor, #scoliosisbrace, #scoliosissurgery. A total of 239 videos were reviewed over a 7-day period by three reviewers. Videos were categorized based on tone, content type, and healthcare provider involvement. Three pediatric spine surgeons reviewed flagged videos for misinformation related to scoliosis. Quality assessment was performed using the Global Quality Scale (GQS) and the DISCERN scoring system, with a score of 5 denoting the highest quality.
    Results: TikTok videos related to scoliosis received on average 2.4 million views. Most TikTok scoliosis videos, 72.8% (n = 174), were created by patients sharing their experiences. When measuring video content quality, videos by physicians scored significantly higher with mean DISCERN and GQS scores of 3.3 ± 0.5 and 3.7 ± 0.4, respectively, compared to chiropractors with mean DISCERN and GQS scores of 2.3 ± 0.6 and 2.5 ± 0.5 (P < .0001). Forty-four percent (n = 24) of videos offering scoliosis advice were found to contain misinformation. The majority of these misinformation videos were produced by chiropractors (46%, n = 11) compared to physicians (12.5%, n = 3), although this was not statistically significant. Videos containing misinformation related to scoliosis garnered 2.2 ± 5.2 million views versus videos that did not contain misinformation, which received 1.6 ± 5.1 million views (P = .7).
    Conclusions: Chiropractors are the most frequent healthcare providers offering scoliosis advice on TikTok. The quality of information presented by chiropractors was found to be significantly lower than that of physicians. Spine deformity surgeons should be aware of TikTok's market dominance and provide high-quality information to counter the misinformation currently present on the platform related to scoliosis.
    Key Concepts: (1) The limited presence of spine deformity surgeons on TikTok contributes to the spread of scoliosis misinformation. (2) Patient-generated TikTok videos dominate scoliosis content but frequently lack evidence-based guidance. (3) Chiropractors are the most common healthcare providers posting scoliosis advice, although their content often scores lower in reliability. (4) Physician-led videos generally demonstrate higher DISCERN and GQS scores, emphasizing the value of expert-produced content. (5) Greater involvement of spine deformity surgeons on TikTok could reduce misinformation and improve patient education.
    Level of Evidence: Level IV study.
    Keywords:  Misinformation; Scoliosis; Social media
    DOI:  https://doi.org/10.1016/j.jposna.2025.100207
  26. Sci Rep. 2025 Aug 07. 15(1): 28967
      Orthognathic surgery is commonly used to correct dentofacial deformities for both functional and aesthetic reasons. As social media increasingly becomes a source of health information, concerns have arisen regarding the quality and reliability of such content. This study aimed to evaluate and compare the quality and reliability of orthognathic surgery-related videos on BiliBili and TikTok, and to identify factors influencing video quality. A search using the term "orthognathic surgery" was conducted on both platforms in February 2025. Videos were categorized by content and source, and assessed using the Global Quality Scale (GQS) and modified DISCERN (mDISCERN) tools. The results showed that videos on BiliBili had significantly higher quality and reliability than those on TikTok (P < 0.001), but the overall quality across both platforms was poor (mean GQS = 2.20; mDISCERN = 1.77). Videos created by medical professionals, particularly those focused on disease knowledge, were significantly more reliable (P < 0.05). Orthodontists' videos scored higher than those from orthognathic surgeons (P < 0.05). Positive correlations were observed between video quality and reliability, number of saves, shares, and duration. These findings highlight the need for better regulation of online medical content and encourage greater involvement of healthcare professionals in producing accurate, high-quality health information.
    Keywords:  Health education; Orthognathic surgery; Social media; Video quality
    DOI:  https://doi.org/10.1038/s41598-025-13941-0
  27. Zhonghua Nei Ke Za Zhi. 2025 Aug 01. 64(8): 759-765
      Objective: To evaluate the quality of information on autoimmune liver disease in videos on the TikTok short video platform.
    Methods: The keyword "autoimmune liver disease" was used to search the top 200 videos on TikTok in the default sorting order. Using the DISCERN video quality assessment tool and a structured content completeness evaluation tool, we assessed the quality of the information in each video against the pertinent disease guidelines. We also investigated relationships between video quality and video characteristics (likes, comments, retweets, days since upload, and duration).
    Results: A total of 140 videos were included, 96.4% of which were provided by medical professionals. The content completeness scores for each dimension were as follows: definition, 1.0 (0.0, 1.0); symptoms, 0.0 (0.0, 1.0); risk factors, 0.0 (0.0, 0.5); assessment, 0.5 (0.0, 1.5); management, 0.5 (0.0, 1.0); and outcome, 0.0 (0.0, 1.0). Furthermore, 91.4% of videos had DISCERN scores of ≤50, corresponding to "fair" quality or below. The difference in DISCERN scores between videos from different publishers was not statistically significant (P>0.05). The numbers of likes, comments, favorites, and retweets and the video duration were positively correlated with the overall DISCERN score (r=0.17, 0.18, 0.25, 0.26, 0.44, all P<0.05).
    Conclusions: The overall quality of videos related to autoimmune liver disease on the TikTok platform is low. Publishers should therefore focus on the comprehensiveness and accuracy of the information, and the TikTok platform should optimize its video review mechanism to provide the public with more accurate and reliable health information.
    DOI:  https://doi.org/10.3760/cma.j.cn112138-20241101-00722
  28. Front Vet Sci. 2025 ;12 1628421
       Background: Many people seek health-related information online, not only for themselves but also on behalf of others who cannot articulate their symptoms. This proxy information-seeking behavior is particularly relevant for animal owners, who must interpret their animals' symptoms without direct verbal feedback. While online health information-seeking in the context of one's own health is well-studied, the specific challenges of searching by proxy, especially for animal health information, remain largely unexplored.
    Objective: This study aimed to determine the specific information needs and search behavior of animal owners. As a case study, horse owners were selected, representing a group regularly searching the web for health-related advice concerning their animals.
    Methods: A mixed-methods approach was used with 17 horse owners in Germany. Participants first described a recent search for equine health information. They were then shown a video of a horse experiencing an asthma attack and asked to conduct a search on how to proceed with the horse's condition. Afterwards, they were questioned about their respective search behavior.
    Results: The participants' main initial questions revolved around the cause of the horse's condition, the urgency of veterinary treatment, and the cost of treatment. All participants chose the Google search engine as the starting point for their search and formulated an average of 3.71 (SD: 2.02) queries. Each of these queries contained an average of 3.81 words (SD: 1.57). Most searches (52%) were evidence-directed, with 29% using multiple descriptors of the horse's situation. An average of 0.97 results (SD: 1.38) were clicked per query, with titles containing all search terms in 13% of cases. Participants reported experiencing several barriers to their search, including difficulties in formulating precise queries and the need for additional guidance during the search process.
    Conclusion: The findings highlight the need for improved online information systems, offering better guidance, context-aware search support, and trustworthy sources. The insights could inform veterinarians on how to better address their clients' communication and information needs, provide them with the skills and knowledge they need to conduct online research and therefore build a better animal health partnership with them.
    Keywords:  animal health information seeking; animal owners; horse health information seeking; human computer interaction; online health information seeking; proxy seeking
    DOI:  https://doi.org/10.3389/fvets.2025.1628421
  29. J Adolesc Health. 2025 Aug 06. pii: S1054-139X(25)00240-X. [Epub ahead of print]
      Given the importance of adolescents' well-being and the growing accessibility of health resources, it is imperative to examine adolescents' health information-seeking behaviors. The aim of this review is to provide a thorough assessment of the health information-seeking behavior of adolescents as evidenced in contemporary literature. A literature search using appropriate keywords was conducted for the period 2013 to 2023 across various academic databases, including Web of Science, Scopus, PubMed, Embase, ProQuest, Cochrane, and Persian databases. Initially, 9,162 publications were identified through database searches. After deduplication, 7,774 unique publications were screened, yielding 62 articles that satisfied the inclusion criteria. Findings revealed that adolescents' health information needs often relate to sexual health, disease conditions, and lifestyle aspects. Adolescents typically obtain health information from both the internet and their parents. Several factors influence adolescents' information-seeking behavior, with gender, age, and information-seeking skills garnering more attention than other factors. Despite a plethora of recent investigations into adolescent health information, the complexity of the topic suggests that future research should explore adolescents' use of social networks and their evaluation of health information, using a mixed-methods approach and encouraging interdisciplinary collaboration.
    Keywords:  Adolescents; Health information; Health information behavior; Information-seeking behavior
    DOI:  https://doi.org/10.1016/j.jadohealth.2025.05.030