bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-10-12
25 papers selected by
Thomas Krichel, Open Library Society



  1. Health Info Libr J. 2025 Oct 07.
      Managing and applying evidence from research and learning from experience to better effect are part of the solution to the challenges faced by healthcare systems. Health library and information professionals often struggle to convey what is meant by 'knowledge mobilisation'. This editorial examines definitions of 'knowledge' and 'knowledge mobilisation' in the context of information overload. Drawing on prior experience and existing Knowledge Management models and related frameworks, it offers a synthesis of these to identify key dimensions of knowledge mobilisation in the practice of information professionals, and examines the information functions required to mobilise knowledge. Aiming to support more effective communication, 'knowledge mobilisation' is expressed using three approaches: a mnemonic, a diagram and a table. The ambition is to stimulate dialogue and build consensus, potentially by conducting a modified e-Delphi study, in order to assist health librarians and knowledge managers to better position themselves to engage in knowledge mobilisation.
    Keywords:  evidence‐based medicine (EBM); knowledge management; knowledge mobilisation; knowledge transfer; knowledge translation
    DOI:  https://doi.org/10.1111/hir.70004
  2. Biomed Hub. 2025 Jan-Dec;10(1): 162-170
       Introduction: The internet is a major source of medical information for patients, yet the quality of online health content remains highly variable. Existing assessment tools are often labor-intensive, unvalidated, or limited in scope. We developed and validated MedReadr, an in-browser, rule-based natural language processing (NLP) algorithm that automatically estimates the reliability of consumer health articles for patients and providers.
    Methods: Thirty-five consumer medical articles were independently assessed by two reviewers using validated manual scoring systems (QUEST and Sandvik). Interrater reliability was evaluated with Cohen's κ, and metrics with κ > 0.6 were selected for model fitting. MedReadr extracted key features from article text and metadata using predefined NLP rules. A multivariable linear regression model was trained to predict manual reliability scores, with internal validation performed on an independent set of 20 articles.
    Results: High interrater reliability was achieved across all QUEST and most Sandvik domains (Cohen's κ > 0.6). The MedReadr model demonstrated strong performance, achieving R² = 0.90 and RMSE = 0.05 on the development set and R² = 0.83 and RMSE = 0.07 on the validation set. All model coefficients were statistically significant (p < 0.05). Key predictive features included currency and reference scores, sentiment polarity, engagement content, and the frequency of provider contact, intervention endorsement, intervention mechanism, and intervention uncertainty phrases.
    Conclusion: MedReadr demonstrates that structural reliability scoring of online health articles can be automated using a transparent, rule-based NLP approach. Applied to English-language articles from mainstream search results on common medical conditions, the tool showed strong agreement with validated manual scoring systems. However, it has only been validated on a narrow scope of content and is not designed to analyze search results for specific questions or detect misinformation. Future research should assess its performance across a broader range of web content and evaluate whether its integration improves patient comprehension, digital health literacy, and clinician-patient communication.
    Keywords:  Algorithms; Consumer health information; Information literacy; Internet use
    DOI:  https://doi.org/10.1159/000548163
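    The general recipe in this entry (rule-based feature extraction followed by a multivariable linear regression fitted to manual reliability scores) can be prototyped in a few lines of Python. The sketch below is illustrative only: the extraction rules, feature names, and data are placeholders, not the MedReadr implementation.

      # Minimal sketch: rule-based text features + linear regression that
      # approximates manual reliability scores. Rules and data are invented.
      import re
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import r2_score

      def extract_features(text):
          words = text.split()
          n_refs = len(re.findall(r"\[\d+\]|doi\.org", text, re.I))                # citation-like strings
          n_contact = len(re.findall(r"ask your (doctor|provider)", text, re.I))   # provider-contact phrases
          n_hedges = len(re.findall(r"\bmay\b|\bmight\b|\bunclear\b", text, re.I)) # uncertainty phrases
          return [len(words), n_refs, n_contact, n_hedges]

      articles = ["Example consumer health article text ...",
                  "Another article ...", "A third article ...",
                  "A fourth article ...", "A fifth article ..."]
      manual_scores = np.array([0.72, 0.55, 0.61, 0.80, 0.47])  # e.g., rescaled QUEST totals

      X = np.array([extract_features(a) for a in articles])
      model = LinearRegression().fit(X, manual_scores)
      pred = model.predict(X)
      print("R^2:", round(r2_score(manual_scores, pred), 2))
      print("RMSE:", round(float(np.sqrt(np.mean((manual_scores - pred) ** 2))), 3))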
  3. Rehabil Nurs. 2025 Oct 10.
      A person who experiences a spinal cord injury (SCI) is often faced with significant alterations to nearly every aspect of their day-to-day life. Providing patients with SCI and their families all the key education during an inpatient rehabilitation stay is challenging for nurses due to short inpatient stays. Rehabilitation nurses and other clinicians must prioritize teaching the information and skills most essential for a safe discharge. After transitioning back into the community, people with SCI will have significant ongoing needs for trustworthy health information that will help them restore, maintain, and promote optimal health and function. This article provides background information and selected trustworthy resources, developed by health professionals and persons with SCI, that nurses can share with their clients. We also provide an online resource that can be shared with persons with SCI and their families that covers educational sources, health information, individual support, and online peer groups and mentoring.
    Keywords:  Consumer health information; health promotion; patient education as topic; rehabilitation nursing; spinal cord injuries
    DOI:  https://doi.org/10.1097/RNJ.0000000000000518
  4. J Clin Exp Dent. 2025 Sep;17(9): e1099-e1107
       Background: Large Language Models (LLMs) are transforming clinical decision-making by offering rapid, context-aware access to evidence-based knowledge. However, their efficacy in pediatric dentistry remains underexplored, especially across multiple LLM platforms.
    Objective: To comparatively evaluate the clinical quality, readability, and originality of responses generated by nine contemporary LLMs for pediatric dental queries.
    Material and Methods: A cross-sectional study assessed the performance of ChatGPT-3.5, ChatGPT-4o, Gemini 2.0, Gemini 2.5, Claude 3.5 Haiku, Claude 3.7 Sonnet, Grok-3, Grok-3 Mini, and DeepSeek-V3. Twenty pediatric dental questions were posed in one-shot queries to each LLM. Responses were evaluated by ten pediatric dental experts using the Modified Global Quality Scale (MGQS), Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), and Turnitin Similarity Index. ANOVA and Cohen's Kappa were used for statistical analysis.
    Results: ChatGPT-4o demonstrated the highest overall MGQS (4.28 ± 0.24), followed by ChatGPT-3.5 (3.45 ± 0.27). DeepSeek-V3 scored lowest (2.18 ± 0.19). Topic-wise, ChatGPT-4o consistently outperformed others across all subdomains. FRES and FKGL scores indicated moderate readability, with Claude models exhibiting the highest linguistic complexity. Turnitin analysis revealed low-to-moderate similarity across models. Inter-rater agreement was substantial (κ = 0.78).
    Conclusions: Among evaluated LLMs, ChatGPT-4o exhibited superior performance in clinical relevance, coherence, and originality, suggesting its potential utility as an adjunct in pediatric dental decision-making. Nonetheless, variability across models underscores the need for critical appraisal and cautious integration into clinical workflows.
    Keywords:  Artificial intelligence; Clinical decision support; Health communication; Large language models; Natural language processing
    DOI:  https://doi.org/10.4317/jced.63136
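    Inter-rater agreement and between-model comparisons of the kind reported in this entry (Cohen's kappa, ANOVA) can be reproduced with standard Python libraries. The ratings below are fabricated for illustration; they are not the study's data.

      # Illustrative only: Cohen's kappa for two raters on ordinal quality
      # scores, plus a one-way ANOVA comparing three hypothetical models.
      from sklearn.metrics import cohen_kappa_score
      from scipy import stats

      rater_a = [4, 5, 3, 4, 2, 5, 4, 3]      # quality scores from rater A
      rater_b = [4, 5, 3, 3, 2, 5, 4, 4]      # same items scored by rater B
      print("kappa:", round(cohen_kappa_score(rater_a, rater_b), 2))
      # quadratic weighting is common for ordinal scales such as the MGQS
      print("weighted kappa:", round(cohen_kappa_score(rater_a, rater_b, weights="quadratic"), 2))

      model_1 = [4.2, 4.5, 4.1, 4.4]          # per-question scores, model 1
      model_2 = [3.4, 3.6, 3.2, 3.5]
      model_3 = [2.1, 2.3, 2.0, 2.2]
      f_stat, p_value = stats.f_oneway(model_1, model_2, model_3)
      print(f"ANOVA: F = {f_stat:.1f}, p = {p_value:.4f}")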
  5. Beyoglu Eye J. 2025;10(3): 168-174
       Objectives: This study compared the performance of ChatGPT, Google Gemini, and Microsoft Copilot in answering 25 questions about dry eye disease and evaluated comprehensiveness, accuracy, and readability metrics.
    Methods: The artificial intelligence (AI) platforms answered 25 questions derived from the American Academy of Ophthalmology's Eye Health webpage. Three reviewers assigned comprehensiveness (0-5) and accuracy (-2 to 2) scores. Readability metrics included Flesch-Kincaid Grade Level, Flesch Reading Ease Score, sentence/word statistics, and total content measures. Responses were rated by three independent reviewers. Readability metrics were also calculated, and platforms were compared using Kruskal-Wallis and Friedman tests with post hoc analysis. Reviewer consistency was assessed using the intraclass correlation coefficient (ICC).
    Results: Google Gemini demonstrated the highest comprehensiveness and accuracy scores, significantly outperforming Microsoft Copilot (p<0.001). ChatGPT produced the most sentences and words (p<0.001), while readability metrics showed no significant differences among models (p>0.05). Inter-observer reliability was highest for Google Gemini (ICC=0.701), followed by ChatGPT (ICC=0.578), with Microsoft Copilot showing the lowest agreement (ICC=0.495). These results indicate Google Gemini's superior performance and consistency, whereas Microsoft Copilot had the weakest overall performance.
    Conclusion: Google Gemini excelled in content volume while maintaining high comprehensiveness and accuracy, outperforming ChatGPT and Microsoft Copilot in content generation. The platforms displayed comparable readability and linguistic complexity. These findings inform AI tool selection in health-related contexts, emphasizing Google Gemini's strengths in detailed responses. Its superior performance suggests potential utility in clinical and patient-facing applications requiring accurate and comprehensive content.
    Keywords:  Artificial intelligence; ChatGPT; Google Gemini; Microsoft Copilot; dry eye disease
    DOI:  https://doi.org/10.14744/bej.2025.76743
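    Reviewer consistency statistics such as the ICC values reported above can be computed with the pingouin package; the long-format ratings table below is a fabricated example, not the study's data.

      # Illustrative ICC calculation with pingouin (three hypothetical
      # reviewers each rating the same four AI responses).
      import pandas as pd
      import pingouin as pg

      ratings = pd.DataFrame({
          "response": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
          "reviewer": ["A", "B", "C"] * 4,
          "score":    [5, 4, 5, 3, 3, 4, 2, 2, 2, 4, 5, 4],
      })
      icc = pg.intraclass_corr(data=ratings, targets="response",
                               raters="reviewer", ratings="score")
      print(icc[["Type", "ICC", "CI95%"]].round(3))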
  6. Shoulder Elbow. 2025 Oct 06. 17585732251365178
       Hypothesis: Large language models (LLMs) like ChatGPT have increasingly been used as online resources for patients with orthopedic conditions. Yet there is a paucity of information assessing the ability of LLMs to accurately and completely answer patient questions. The present study comparatively assessed both ChatGPT 3.5 and GPT-4 responses to frequently asked questions on common elbow pathologies, scoring for accuracy and completeness. It was hypothesized that ChatGPT 3.5 and GPT-4 would demonstrate high levels of accuracy for the specific query asked, but some responses would lack completeness, and GPT-4 would yield more accurate and complete responses than ChatGPT 3.5.
    Methods: ChatGPT was queried to identify the five most common elbow pathologies (lateral epicondylitis, medial epicondylitis, cubital tunnel syndrome, distal biceps rupture, elbow arthritis). ChatGPT was then queried on the five most frequently asked questions for each elbow pathology. These 25 total questions were then individually asked of ChatGPT 3.5 and GPT-4. Responses were recorded and scored on a 6-point Likert scale for accuracy and a 3-point Likert scale for completeness by three fellowship-trained upper extremity orthopedic surgeons. ChatGPT 3.5 and GPT-4 responses were compared for each pathology using two-tailed t-tests.
    Results: Average accuracy scores for ChatGPT 3.5 ranged from 4.80 to 4.87. Average GPT-4 accuracy scores ranged from 4.80 to 5.13. Average completeness scores for ChatGPT 3.5 ranged from 2.13 to 2.47, and average completeness scores for GPT-4 ranged from 2.47 to 2.80. Total average accuracy for ChatGPT 3.5 was 4.83, and total average accuracy for GPT-4 was 5.0 (p = 0.05). Total average completeness for ChatGPT 3.5 was 2.35, and total average completeness for GPT-4 was 2.66 (p = 0.01).
    Conclusion: ChatGPT 3.5 and GPT-4 are capable of providing accurate and complete responses to frequently asked patient questions, with GPT-4 providing superior responses. Large language models like ChatGPT have potential to serve as a reliable online resource for patients with elbow conditions.
    Keywords:  chatGPT; cubital tunnel; distal biceps rupture; elbow arthritis; epicondylitis; large language model
    DOI:  https://doi.org/10.1177/17585732251365178
  7. Front Artif Intell. 2025;8: 1618378
       Background: Information on Idiopathic Pulmonary Fibrosis (IPF) from AI-powered large language models (LLMs) like ChatGPT-4 and Gemini 1.5 Pro has not been evaluated for quality, reliability, readability, or concordance with clinical guidelines.
    Research question: What are the quality, reliability, readability, and concordance with clinical guidelines of LLM-generated medical content on IPF?
    Study design and methods: ChatGPT-4 and Gemini 1.5 Pro responses to 23 ATS/ERS/JRS/ALAT IPF guidelines questions were compared. Six independent raters evaluated responses for quality (DISCERN), reliability (JAMA Benchmark Criteria), readability (Flesch-Kincaid), and guideline concordance (0-4). Descriptive analysis, Intraclass Correlation Coefficient, Wilcoxon signed-rank test, and effect sizes (r) were calculated. Statistical significance was set at p < 0.05.
    Results: According to the JAMA Benchmark, ChatGPT-4 and Gemini 1.5 Pro provided partially reliable responses; however, readability evaluations showed that both models were difficult to understand. Gemini 1.5 Pro provided significantly better treatment information (DISCERN score: 56 versus 43, p < 0.001). Gemini 1.5 Pro also showed considerably higher concordance with international IPF guidelines than ChatGPT-4 (median 3.0 [3.0-3.5] vs. 3.0 [2.5-3.0], p = 0.0029).
    Interpretation: Both models gave useful medical insights, but their reliability is limited. Gemini 1.5 Pro provided higher-quality information than ChatGPT-4 and was more concordant with international IPF guidelines. Readability analyses found that AI-generated medical information was difficult to understand, underscoring the need to refine it.
    What is already known on this topic: Recent advancements in AI, especially large language models (LLMs) powered by natural language processing (NLP), have revolutionized the way medical information is retrieved and utilized.
    What this study adds: This study highlights the potential and limitations of ChatGPT-4 and Gemini 1.5 Pro in generating medical information on IPF. They provided partially reliable information in their responses; however, Gemini 1.5 Pro demonstrated superior quality in treatment-related content and greater concordance with clinical guidelines. Nevertheless, neither model provided answers in full concordance with established clinical guidelines, and their readability remained a major challenge.
    How this study might affect research practice or policy: These findings highlight the need for AI model refinement as LLMs evolve as healthcare reference tools to help doctors and patients make evidence-based decisions.
    Keywords:  artificial intelligence; clinical decision-making; health information systems; idiopathic pulmonary fibrosis; large language models; machine learning; natural language processing; quality of health care
    DOI:  https://doi.org/10.3389/frai.2025.1618378
  8. Cureus. 2025 Sep;17(9): e92506
      Introduction: Atrial fibrillation (AF) is the most common sustained arrhythmia and is associated with increased risks of stroke, heart failure, and healthcare burden. Access to clear and up-to-date educational content is essential for effective decision-making in complex cases such as AF. Evidence-based resources like UpToDate are often time-consuming to read, and clinicians frequently face time constraints in fast-paced clinical settings. With the growing role of artificial intelligence in healthcare, tools like ChatGPT-3.5 (OpenAI, San Francisco, CA, USA) offer fast and accessible medical summaries. However, their suitability in professional education remains inadequately studied, particularly in comparison with evidence-based resources like UpToDate.
    Methodology: A cross-sectional study was conducted in June 2025. Educational content was generated using ChatGPT-3.5 based on structured prompts and retrieved from UpToDate. Non-textual elements were excluded. Readability was assessed using the Flesch-Kincaid Reading Ease (FRE) score, the Flesch-Kincaid Grade Level (FKGL), the Simple Measure of Gobbledygook (SMOG) Index, word count, sentence count, average words per sentence, and both count and percentage of difficult words. Statistical comparison was done using the Mann-Whitney U test (p < 0.05), analysed with R software (v4.3.2; R Foundation for Statistical Computing, Vienna, Austria).
    Results: ChatGPT content was significantly shorter (median 495 vs. 3381 words; p = 0.029), had shorter sentences (14.3 vs. 19.3 words; p = 0.029), but a higher percentage of difficult words (29.6% vs. 23.3%; p = 0.029). Other differences were not statistically significant.
    Conclusions: ChatGPT provides concise educational content with readability scores comparable to UpToDate but with a higher proportion of complex vocabulary. While promising as a supplementary resource, its integration into clinical decision-making should be guided by expert review and validation.
    Keywords:  artificial intelligence in healthcare; atrial fibrillation; chatgpt; evidence-based medicine; readability analysis; uptodate
    DOI:  https://doi.org/10.7759/cureus.92506
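    Readability indices of the kind compared above are easy to reproduce. The sketch below uses the textstat package and SciPy's Mann-Whitney U test; the package choice and the text snippets are assumptions for illustration, not the study's materials.

      # Illustrative readability profiling and a Mann-Whitney U comparison.
      import textstat
      from scipy.stats import mannwhitneyu

      chatgpt_texts = ["Atrial fibrillation is an irregular heart rhythm ...",
                       "Blood thinners lower the risk of stroke in many patients ..."]
      uptodate_texts = ["Atrial fibrillation (AF) is a supraventricular tachyarrhythmia ...",
                        "Decisions regarding anticoagulation should incorporate ..."]

      def profile(text):
          return {"FRE": textstat.flesch_reading_ease(text),
                  "FKGL": textstat.flesch_kincaid_grade(text),
                  "SMOG": textstat.smog_index(text),
                  "words": textstat.lexicon_count(text)}

      fkgl_chatgpt = [profile(t)["FKGL"] for t in chatgpt_texts]
      fkgl_uptodate = [profile(t)["FKGL"] for t in uptodate_texts]
      u_stat, p = mannwhitneyu(fkgl_chatgpt, fkgl_uptodate, alternative="two-sided")
      print("FKGL (ChatGPT vs. UpToDate):", fkgl_chatgpt, fkgl_uptodate, "p =", round(p, 3))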
  9. J Hand Surg Glob Online. 2025 Nov;7(6): 100831
       Purpose: The rise of artificial intelligence (AI) in health care comes with increasing concerns about the use and integrity of the information it generates. Chat Generative Pre-Trained Transformer (ChatGPT) 3.5, Google Gemini, and Bing Copilot are free AI chatbot platforms that may be used for answering medical questions and disseminating medical information. Given that carpal tunnel syndrome accounts for 90% of all neuropathies, it is important to understand the accuracy of the information patients may be receiving. The purpose of this study is to determine the use and accuracy of responses generated by ChatGPT, Google Gemini, and Bing Copilot in answering frequently asked questions about carpal tunnel syndrome.
    Methods: Two independent authors scored responses using the DISCERN tool. DISCERN consists of 15 questions assessing health information on a five-point scale, with total scores ranging from 15 to 75 points. Then, a two-factor analysis of variance was conducted, with scorer and chatbot type as the factors.
    Results: One-way analysis of variance revealed no significant difference in DISCERN scores among the three chatbots. The chatbots each scored in the "fair" range, with means of 45 for ChatGPT, 48 for Bing Copilot, and 46 for Google Gemini. The average Journal of the American Medical Association score for ChatGPT and Google Gemini surpassed that of Bing Copilot, with averages of 2.3, 2.3, and 1.8, respectively.
    Conclusions: ChatGPT, Google Gemini, and Bing Copilot platforms generated relatively reliable answers for potential patient questions about carpal tunnel syndrome. However, users should continue to be aware of the shortcomings of the information provided, given the lack of citations, potential for misconstrued information, and perpetuated biases that inherently come with using such platforms. Future studies should explore the response quality for less common orthopedic pathologies and assess patient perceptions of response readability to determine the value of AI as a patient resource across the medical field.
    Type of study/level of evidence: Cross-sectional study V.
    Keywords:  Carpal tunnel syndrome; Generative artificial intelligence; Hand; Orthopedic surgery
    DOI:  https://doi.org/10.1016/j.jhsg.2025.100831
  10. J Med Internet Res. 2025 Oct 07. 27 e78625
       BACKGROUND: Cardiovascular disease (CVD) remains the leading cause of death worldwide, yet many web-based sources on cardiovascular (CV) health are inaccessible. Large language models (LLMs) are increasingly used for health-related inquiries and offer an opportunity to produce accessible and scalable CV health information. However, because these models are trained on heterogeneous data, including unverified user-generated content, the quality and reliability of food and nutrition information on CVD prevention remain uncertain. Recent studies have examined LLM use in various health care applications, but their effectiveness for providing nutrition information remains understudied. Although retrieval-augmented generation (RAG) frameworks have been shown to enhance LLM consistency and accuracy, their use in delivering nutrition information for CVD prevention requires further evaluation.
    OBJECTIVE: To evaluate the effectiveness of off-the-shelf and RAG-enhanced LLMs in delivering guideline-adherent nutrition information for CVD prevention, we assessed 3 off-the-shelf models (ChatGPT-4o, Perplexity, and Llama 3-70B) and a Llama 3-70B+RAG model.
    METHODS: We curated 30 nutrition questions that comprehensively addressed CVD prevention. These were approved by a registered dietitian providing preventive cardiology services at an academic medical center and were posed 3 times to each model. We developed a 15,074-word knowledge bank incorporating the American Heart Association's 2021 dietary guidelines and related website content to enhance Meta's Llama 3-70B model using RAG. The model received this and a few-shot prompt as context, included citations in a Context Source section, and used vector similarity to align responses with guideline content, with the temperature parameter set to 0.5 to enhance consistency. Model responses were evaluated by 3 expert reviewers against benchmark CV guidelines for appropriateness, reliability, readability, harm, and guideline adherence. Mean scores were compared using ANOVA, with statistical significance set at P<.05. Interrater agreement was measured using the Cohen κ coefficient, and readability was estimated using the Flesch-Kincaid readability score.
    RESULTS: The Llama 3+RAG model scored higher than the Perplexity, GPT-4o, and Llama 3 models on reliability, appropriateness, guideline adherence, and readability and showed no harm. The Cohen κ coefficient (κ>70%; P<.001) indicated high reviewer agreement.
    CONCLUSIONS: The Llama 3+RAG model outperformed the off-the-shelf models across all measures with no evidence of harm, although the responses were less readable due to technical language. The off-the-shelf models scored lower on all measures and produced some harmful responses. These findings highlight the limitations of off-the-shelf models and demonstrate that RAG system integration can enhance LLM performance in delivering evidence-based dietary information.
    Keywords:  artificial intelligence; cardiovascular dietary guidelines; large language models; qualitative evaluation; retrieval-augmented generation
    DOI:  https://doi.org/10.2196/78625
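    The retrieval-augmented generation setup described above (a guideline-derived knowledge bank queried by vector similarity, with retrieved passages injected into the prompt) follows a standard pattern. The sketch below is a simplified stand-in: it uses TF-IDF retrieval and a placeholder generate() function rather than the paper's Llama 3-70B pipeline, and the guideline snippets are invented.

      # Minimal retrieval-augmented generation sketch (illustrative only).
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      knowledge_bank = [
          "Emphasize vegetables, fruits, whole grains, and legumes.",
          "Limit intake of added sugars, sodium, and processed meat.",
          "Choose liquid plant oils rather than tropical oils and animal fats.",
      ]  # pre-chunked guideline text (hypothetical)

      def retrieve(question, k=2):
          vec = TfidfVectorizer().fit(knowledge_bank + [question])
          scores = cosine_similarity(vec.transform([question]),
                                     vec.transform(knowledge_bank)).ravel()
          return [knowledge_bank[i] for i in scores.argsort()[::-1][:k]]

      def generate(prompt):
          # placeholder for a call to a locally hosted or API-based LLM
          return "[answer grounded in, and citing, the retrieved context]"

      question = "Which fats should I favor to lower cardiovascular risk?"
      context = "\n".join(retrieve(question))
      prompt = f"Answer using only this context and cite it:\n{context}\n\nQ: {question}"
      print(generate(prompt))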
  11. Sci Rep. 2025 Oct 10. 15(1): 35454
      The complexity of scoliosis-related terminology and treatment options often hinders patients and caregivers from understanding their choices, making it difficult to make informed decisions. As a result, many patients seek guidance from artificial intelligence (AI) tools. However, AI-generated health content may suffer from low readability, inconsistency, and questionable quality, posing risks of misinformation. This study evaluates the readability and informational quality of scoliosis-related content produced by AI. We evaluated five AI models (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3 mini-high, DeepSeek-V3, and DeepSeek-R1) by querying each on three types of scoliosis: congenital, adolescent idiopathic, and neuromuscular. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch-Kincaid Reading Ease (FKRE), while content quality was evaluated using the DISCERN score. Statistical analyses were performed in RStudio. Inter-rater reliability was calculated using the Intraclass Correlation Coefficient (ICC). DeepSeek-R1 achieved the lowest FKGL (6.2) and the highest FKRE (64.5), indicating superior readability. In contrast, ChatGPT-o1 and ChatGPT-o3 mini-high scored above FKGL 12.0, requiring college-level reading skills. Despite readability differences, DISCERN scores remained stable across models (~50.5/80) with high inter-rater agreement (ICC = 0.85-0.87), suggesting a fair level of quality. However, all responses lacked citations, limiting reliability. AI-generated scoliosis education materials vary significantly in readability, with DeepSeek-R1 being the most accessible. Future AI models should enhance readability without compromising information accuracy and integrate real-time citation mechanisms for improved trustworthiness.
    Keywords:  AI-generated health information; DISCERN score; Patient health literacy; Readability assessment; Scoliosis
    DOI:  https://doi.org/10.1038/s41598-025-19370-3
  12. J Plast Reconstr Aesthet Surg. 2025 Sep 19. pii: S1748-6815(25)00562-5. [Epub ahead of print] 110: 239-252
       BACKGROUND: Online patient education materials (OPEMs) play a critical role in shaping patient decision-making for breast augmentation and reduction surgery. However, concerns persist regarding their readability, quality, and inclusivity. We present the first comprehensive systematic review and meta-analysis to evaluate OPEMs across traditional, artificial intelligence (AI)-generated, and social media platforms.
    METHODS: We systematically reviewed 23 studies evaluating OPEMs related to breast augmentation and reduction. Outcomes included the Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) scores, DISCERN and Ensuring Quality Information for Patients (EQIP) quality assessments, diversity in visual representation, and AI performance. A random-effects meta-analysis was conducted on FKGL scores, and a binomial test was used to assess the proportion of studies exceeding the recommended readability thresholds.
    RESULTS: Meta-analysis of 3 studies revealed a pooled FKGL of 12.28 (95% CI: 11.16-13.41), with significant heterogeneity (I² = 96.4%, p < 0.0001). A binomial test confirmed that 100% of the studies evaluating readability concluded that OPEMs exceeded the sixth-grade level (p = 0.0005). Nine studies reported suboptimal content quality, with common deficiencies in risk disclosure, source attribution, and citation. Three studies found representation and linguistic disparities in educational visuals and content accessibility. AI-generated materials showed promise but often lacked surgical nuance and detail.
    CONCLUSION: Our findings suggest that OPEMs for breast surgery are consistently written above the recommended readability levels, frequently omit essential content, and exhibit inequities in representation. These findings demonstrate the need for standardizing and improving digital patient education content to meet the informational and cultural needs of all surgical candidates.
    Keywords:  Breast augmentation; Breast reduction; Breast surgery; Digital information; Health literacy; Online patient education materials
    DOI:  https://doi.org/10.1016/j.bjps.2025.09.013
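    The pooled FKGL estimate and binomial test reported above can be reproduced in a few lines of Python; the sketch below implements standard DerSimonian-Laird random-effects pooling on fabricated study-level inputs, so the numbers are illustrative rather than the review's data.

      # Illustrative DerSimonian-Laird random-effects pooling of mean FKGL
      # values, plus a one-sided binomial test against a 50% null.
      import numpy as np
      from scipy.stats import binomtest

      means = np.array([11.9, 12.4, 12.6])    # per-study mean FKGL (invented)
      ses   = np.array([0.30, 0.45, 0.35])    # per-study standard errors (invented)
      vi = ses ** 2

      w = 1 / vi                                       # fixed-effect weights
      mu_fixed = np.sum(w * means) / np.sum(w)
      Q = np.sum(w * (means - mu_fixed) ** 2)          # Cochran's Q
      df = len(means) - 1
      C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
      tau2 = max(0.0, (Q - df) / C)                    # between-study variance
      i2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

      w_re = 1 / (vi + tau2)                           # random-effects weights
      mu = np.sum(w_re * means) / np.sum(w_re)
      se = np.sqrt(1 / np.sum(w_re))
      print(f"Pooled FKGL {mu:.2f} (95% CI {mu - 1.96*se:.2f} to {mu + 1.96*se:.2f}), I2 = {i2:.1f}%")

      # e.g., 12 of 12 readability studies exceeded the 6th-grade threshold
      print("binomial p =", round(binomtest(12, 12, 0.5, alternative="greater").pvalue, 4))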
  13. World J Mens Health. 2025 Sep 09.
       PURPOSE: Artificial intelligence (AI) tools have demonstrated considerable potential for the dissemination of medical information. However, variability may exist in the quality and readability of prostate-cancer-related content generated by different AI platforms. This study aimed to evaluate the quality, accuracy, and readability of prostate-cancer-related medical information produced by ChatGPT and DeepSeek.
    MATERIALS AND METHODS: Frequently asked questions related to prostate cancer were collected from the American Cancer Society website, ChatGPT, and DeepSeek. Three urologists with over 10 years of clinical experience reviewed and confirmed the relevance of the selected questions. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of AI-generated content. The DISCERN instrument was used to evaluate the quality of the treatment-related information. Additionally, readability was assessed using four established indices: Automated Readability Index (ARI), Flesch Reading Ease Score, Gunning Fog Index, and Flesch-Kincaid Grade Level.
    RESULTS: No statistically significant differences were observed between ChatGPT and DeepSeek in PEMAT-P scores (70.66±8.13 vs. 69.35±8.83) or DISCERN scores (59.07±3.39 vs. 58.88±3.66) (p>0.05). However, the ARI for DeepSeek was higher than that for ChatGPT (12.63±1.42 vs. 10.85±1.93, p<0.001), indicating greater textual complexity and reading difficulty.
    CONCLUSIONS: AI tools, such as ChatGPT and DeepSeek, hold significant potential for enhancing patient education and disseminating medical information on prostate cancer. Nevertheless, further refinement of content quality and language clarity is needed to prevent potential misunderstandings, decisional uncertainty, and anxiety among patients due to difficulty in comprehension.
    Keywords:  Artificial intelligence; Comprehension; Large language models; Prostate
    DOI:  https://doi.org/10.5534/wjmh.250144
  14. J Med Internet Res. 2025 Oct 08. 27 e73185
       BACKGROUND: New media have become vital sources of cancer-related health information. However, concerns about the quality of that information persist.
    OBJECTIVE: This study aims to identify characteristics of studies considering cancer-related information on new media (including social media and artificial intelligence chatbots); analyze patterns in information quality across different platforms, cancer types, and evaluation tools; and synthesize the quality levels of the information.
    METHODS: We systematically searched PubMed, Web of Science, Scopus, and Medline databases for peer-reviewed studies published in English between 2014 and 2023. The validity of the included studies was assessed based on risk of bias, reporting quality, and ethical approval, using the Joanna Briggs Institute Critical Appraisal and the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklists. Features of platforms, cancer types, evaluation tools, and trends were summarized. Ordinal logistic regression was used to estimate the associations between the conclusion of quality assessments and study features. A random-effects meta-analysis of proportions was conducted to synthesize the overall levels of information quality and corresponding 95% CIs for each assessment indicator.
    RESULTS: A total of 75 studies were included, encompassing 297,519 posts related to 17 cancer types across 15 media platforms. Studies focusing on video-based media (odds ratio [OR] 0.02, 95% CI 0.01-0.12), rare cancers (OR 0.32, 95% CI 0.16-0.65), and combined cancer types (OR 0.04, 95% CI 0.01-0.14) were statistically less likely to yield higher quality conclusions compared to those on text-based media and common cancers. The pooled estimates reported moderate overall quality (DISCERN 43.58, 95% CI 37.80-49.35; Global Quality Score 49.91, 95% CI 43.31-56.50), moderate technical quality (Journal of American Medical Association Benchmark Criteria 46.13, 95% CI 38.87-53.39; Health on the Net Foundation Code of Conduct 49.68, 95% CI 19.68-79.68), moderate-high understandability (Patient Education Material Assessment Tool for Understandability 66.92, 95% CI 59.86-73.99), moderate-low actionability (Patient Education Materials Assessment Tool for Actionability 37.24, 95% CI 18.08-58.68; usefulness 48.86, 95% CI 26.24-71.48), and moderate-low completeness (34.22, 95% CI 27.96-40.48). Furthermore, 27.15% (95% CI 21.36-33.35) of posts contained misinformation, 21.15% (95% CI 8.96-36.50) contained harmful information, and 12.46% (95% CI 7.52-17.39) contained commercial bias. Publication bias was detected only in misinformation studies (Egger test: bias -5.67, 95% CI -9.63 to -1.71; P=.006), with high heterogeneity across most outcomes (I²>75%).
    CONCLUSIONS: Meta-analysis results revealed that the overall quality of cancer-related information on social media and artificial intelligence chatbots was moderate, with relatively higher scores for understandability but lower scores for actionability and completeness. A notable proportion of content contained misleading, harmful, or commercially biased information, posing potential risks to users. To support informed decision-making in cancer care, it is essential to improve the quality of information delivered through these media platforms.
    TRIAL REGISTRATION: PROSPERO CRD420251058032; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251058032.
    Keywords:  cancer; consumer health information; health literacy; misinformation; social communication; social media; systematic review
    DOI:  https://doi.org/10.2196/73185
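    The ordinal logistic regression used above (relating study features to an ordered quality conclusion) can be sketched with statsmodels' OrderedModel; the predictors and outcomes below are simulated for illustration and do not reproduce the review's dataset.

      # Illustrative ordinal logistic regression with a three-level ordered
      # outcome (low / moderate / high quality conclusion); data are simulated.
      import numpy as np
      import pandas as pd
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      df = pd.DataFrame({
          "quality": pd.Categorical(
              ["low", "high", "moderate", "moderate", "low", "high",
               "low", "moderate", "high", "moderate", "moderate", "high"],
              categories=["low", "moderate", "high"], ordered=True),
          "video_based": [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1],   # 1 = video platform
          "rare_cancer": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0],
      })

      res = OrderedModel(df["quality"], df[["video_based", "rare_cancer"]],
                         distr="logit").fit(method="bfgs", disp=False)
      print(res.summary())
      print("odds ratios:", np.exp(res.params[["video_based", "rare_cancer"]]).round(2))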
  15. Front Med (Lausanne). 2025;12: 1613526
       Background: Pulmonary nodules (PNs) are often overlooked, potentially leading to health risks. Social media platforms are increasingly used for health information dissemination. This study evaluates the quality and engagement of PN-related videos on YouTube, Bilibili, and TikTok.
    Methods: On March 1, 2025, we searched each platform using "pulmonary nodule" or its Chinese equivalent. After screening, 271 videos were analyzed. Video characteristics were documented, and quality was assessed using PEMAT, VIQI, GQS, and mDISCERN tools. Inter-rater reliability was high (κ = 0.81).
    Results: The final sample included 98 (YouTube), 74 (Bilibili), and 99 (TikTok) videos. TikTok videos were the shortest (median 114 s) yet had the highest engagement. Nonprofit organizations dominated YouTube uploads; physicians were most common on Bilibili and TikTok. Treatment was the most covered topic. YouTube scored highest in comprehensibility and actionability (PEMAT-T/A), while Bilibili and TikTok scored higher in production quality (VIQI, GQS). Video quality did not differ significantly between professional and non-professional uploaders. Most quality metrics showed weak correlation with audience engagement.
    Conclusion: Long-form platforms (YouTube and Bilibili) offer higher-quality PN information but lower engagement, whereas short-form platforms (TikTok) show high interaction but lower informational depth. Social media can play a supportive role in public PN education. We provide recommendations for creators, platforms, and viewers to improve the quality and reliability of medical content.
    Keywords:  information quality; online video; public health; pulmonary nodule; social media
    DOI:  https://doi.org/10.3389/fmed.2025.1613526
  16. Pain Manag. 2025 Oct 07. 1-11
       BACKGROUND: This study aimed to identify the most frequently searched keywords and questions related to the Achilles, knee, shoulder, and elbow tendons in Italy. It further aimed to evaluate the credibility, readability, and content of the most visited web pages.
    METHODS: Semrush Inc. (2008) machine learning models were used for data mining in December 2024. Credibility and readability of the most visited web pages were assessed through the QUality Evaluation Scoring Tool (QUEST) and Gulpease index, respectively. A content analysis of web pages was used to determine alignment with evidence-based literature.
    RESULTS: The most searched question was "How to treat foot tendonitis?" (2,750 searches). Only two web pages (2.2%) were rated as providing unbiased information using the QUEST, with credibility values ranging from 4.0 (±1.6) to 11.4 (±4.0) across all searches. Gulpease indices ranged from 34.0 (±2.1) to 42.8 (±2.9) across all web pages. Notably, content analysis revealed only a small percentage of web pages that aligned to best available evidence.
    CONCLUSION: Credibility, readability, and overall quality of online content on tendons were poor. Healthcare professionals may play a role in promoting accurate terminology and supporting the production of high-quality, evidence-based web page content to improve public health literacy.
    Keywords:  E-health; health literacy; tendinopathy; tendon; web data
    DOI:  https://doi.org/10.1080/17581869.2025.2571389
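    The Gulpease index used above is a readability formula designed for Italian text; it is commonly given as 89 + (300 × sentences - 10 × letters) / words, with higher values indicating easier text. The sketch below implements that formula on an invented Italian passage, purely for illustration.

      # Gulpease readability index (Italian), clamped to the 0-100 range.
      import re

      def gulpease(text):
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          words = max(1, len(re.findall(r"\w+", text)))
          letters = len(re.findall(r"[^\W\d_]", text))
          score = 89 + (300 * sentences - 10 * letters) / words
          return max(0.0, min(100.0, score))

      testo = ("La tendinopatia achillea provoca dolore e rigidità del tendine. "
               "Il trattamento di prima scelta è un programma di esercizio progressivo.")
      print(round(gulpease(testo), 1))  # lower scores indicate harder text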
  17. Cureus. 2025 Sep;17(9): e91585
       AIM:  This study aimed to evaluate the quality and readability of online information on thumb-sucking habits among children, using the DISCERN instrument, Health on the Net (HON) seal, and Journal of American Medical Association (JAMA) benchmarks.
    METHODS:  A systematic search was conducted on Google, Yahoo, and Bing for "thumb-sucking habit". A total of 450 websites were screened; irrelevant content, duplicates, and heavily commercial or video-only sites were excluded, resulting in 143 sites. The DISCERN tool, with 16 questions rated from 1 to 5, assessed information quality, while the HON seal verification checked for HON compliance. JAMA benchmarks evaluated authorship, attribution, disclosure, and currency, and readability was assessed using Flesch Reading Ease, Flesch-Kincaid Grade Level, and SMOG (Simple Measure of Gobbledygook).
    RESULTS:  The DISCERN mean score value was 2.73 (±0.57) out of 5, indicating moderate quality. Most sites were non-profit (86.01%), followed by commercial (7.69%), and university/medical centres (6.29%). DISCERN highlighted strengths in relevance and balance but weaknesses in source citations and references. Only 12 sites displayed a HON seal. Readability varied: university/medical centres scored the highest, whereas commercial websites, despite showing relatively higher DISCERN scores, had lower readability, making the information less accessible to the general public.
    CONCLUSION:  This study revealed that online information on thumb sucking is of moderate quality, with notable differences across website types. Clinicians should guide patients toward non-profit and HON-certified sites for more reliable resources. Enhancing transparency, citation practices, and readability remains essential to support informed health decisions regarding thumb-sucking habits.
    Keywords:  discern; hon seal; internet health education; jama; online health information; readability; thumb-sucking habit
    DOI:  https://doi.org/10.7759/cureus.91585
  18. Ocul Immunol Inflamm. 2025 Oct 11. 1-4
       PURPOSE: TikTok has emerged as one of the most popular video-based social media platforms with over 1 billion active users. It has also become a popular source for medical information, posing safety concerns about potentially misleading or inaccurate content. This study explored scleritis-related content on TikTok to evaluate content quality, engagement metrics, and misinformation.
    METHODS: Using TikTok's search function, videos tagged with 'scleritis,' 'anterior scleritis,' and 'posterior scleritis' were analyzed. Videos were categorized by creator type, engagement (views, likes, comments, shares), and content type (informative, misinformation, personal experience, diagnosis, miscellaneous). The Patient Education Materials Assessment Tool for Audiovisual Materials (PEMAT-AV) assessed understandability and actionability, focusing on word choice, organization, visual aids, and actionable advice.
    RESULTS: A total of 69 videos were analyzed; most were created by patients (88.4%, n = 61). Average engagement per video was 29,001 views, 403 likes, 24 comments, and 6 shares. While 84.1% (n = 58) were informative, 10.1% (n = 7) contained misinformation. The mean PEMAT understandability score was 37.0%, and actionability was 6.0%, indicating poor educational quality.
    CONCLUSION: Most scleritis-related TikToks are from non-medical professionals with 10.1% containing misinformation. Content demonstrated low understandability and actionability. Physician-created videos are needed to improve scleritis-related medical information on TikTok and ensure more accurate, accessible, and actionable content for users seeking reliable health information.
    Keywords:  Misinformation; TikTok; ocular inflammation; scleritis; social media
    DOI:  https://doi.org/10.1080/09273948.2025.2570059
  19. JMIR Form Res. 2025 Oct 06. 9 e76723
       Background: YouTube has become a major source of health information, with 2.5 billion monthly users. Despite efforts taken to promote reliable sources, misinformation remains prevalent, particularly regarding medical cannabis.
    Objective: This study aims to evaluate the quality and reliability of medical cannabis information on YouTube and to examine the relationship between video popularity and content quality.
    Methods: A systematic review of YouTube videos on medical cannabis was conducted. Search terms were selected based on Google Trends, and 800 videos were retrieved on July 8, 2024. After applying exclusion criteria, 516 videos were analyzed. Videos were categorized by content creators: (1) nonmedical educational channels, (2) medical education channels, and (3) independent users. Two independent reviewers (SK and SE) assessed content quality using the DISCERN grade and the Health on the Net (HON) code. Statistical analysis included one-way ANOVA and Pearson correlation coefficient.
    Results: Of the 516 videos analyzed, 48.5% (n=251) were from the United States, and 17.2% (n=89) from the United Kingdom. Only 12.2% (n=63) were produced by medical education channels, while 84.3% (n=435) were by independent users. The total views reached 119 million, with nonmedical educational channels having the highest median views at 274,957 (IQR 2161-546,887) and medical education channels having the lowest median views at 5721 (IQR 2263-20,792.50). The mean DISCERN and HON code scores for all videos were 34.63 (SD 9.49) and 3.93 (SD 1.20), respectively. Nonmedical educational creators had the highest DISCERN score (mean 47.78, SD 10.40) and independent users had the lowest score (mean 33.5, SD 8.50; P<.001). Similarly, nonmedical educational creators had the highest HON code score (mean 5.33, SD 1.22), while independent users had the lowest (mean 3.78, SD 1.10; P=.007). Weak positive correlations were found between video views and DISCERN scores (r=0.34, P<.001) and between likes and DISCERN scores (r=0.30, P<.001).
    Conclusions: YouTube is a key source of information on medical cannabis, but the credibility of videos varies widely. Independent users attract the most viewers but have reduced reliability according to the DISCERN and HON scores. Educational channels, despite increased reliability, received the least engagement. The weak correlation between views and content quality emphasizes the need for content moderation to ensure that the most reliable and accurate information on health issues is widely disseminated. Future research should identify strategies to promote verified sources of information and limit misinformation.
    Keywords:  cannabis; consumer health information; health education; health literacy; health promotion
    DOI:  https://doi.org/10.2196/76723
  20. Medicine (Baltimore). 2025 Oct 03. 104(40): e45006
      In recent years, short videos have shown significant promise in spreading health-related content. Yet, to the best of our knowledge, no research has evaluated the content and quality of atherosclerosis-related videos on short-video platforms; this study aimed to do so. We searched 4 platforms using predefined keywords (atherosclerosis, atherosclerotic disease, arteriosclerosis, arterial occlusion or vascular occlusion). We collected the top 50 videos per term on each platform. Data were collected on TikTok, Kwai, Rednote, and Bilibili from December 31, 2024 to January 8, 2025. Two independent researchers evaluated the content and quality of these videos using the Journal of the American Medical Association (JAMA) score, the Global Quality Scale, the modified DISCERN, and the Patient Education Materials Assessment Tool (PEMAT). The data analysis was performed using SPSS and GraphPad Prism. Descriptive statistics were produced, and comparisons were made between different groups. The relationship between quantitative variables was examined using Spearman correlation analysis. A total of 764 suitable videos were selected for in-depth analysis, with the majority focusing on disease-related information (n = 670, 87.7%). The primary contributors were medical professionals (n = 546, 75.1%). The videos attained a mean JAMA score of 1.8 (standard deviation [SD] 0.6), a Global Quality Scale rating of 3.1 (SD 0.8), and a modified DISCERN score of 2.7 (SD 0.6). They also had a PEMAT-Understandability score of 84.2% (SD 11.6%) and a PEMAT-Actionability percentage of 70% (SD 39.6%). The content shared by medical professionals and the videos that included information about illnesses were typically of superior quality and attracted significantly more engagement. Content related to treatment received more likes, comments, saves, and shares than other topics (P < .01). A significant positive relationship was found between the number of likes, comments, saves, and shares. Furthermore, the duration of the videos, the time elapsed since they were uploaded, and the follower count were all positively linked to both the popularity and perceived quality of the videos (P < .001). While short-video platforms host substantial content, the overall quality is suboptimal and requires systematic improvement and professional oversight.
    Keywords:  atherosclerosis; digital health; health communication; health information quality; short video; social media
    DOI:  https://doi.org/10.1097/MD.0000000000045006
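    A minimal sketch of the Spearman correlation analysis referred to above, with fabricated engagement and quality numbers (Spearman is a natural choice here because view and like counts are heavily skewed):

      # Illustrative Spearman correlation between an engagement metric and a
      # quality rating; the values are invented, not the study's data.
      from scipy.stats import spearmanr

      likes = [1200, 85, 430, 22, 9800, 310, 57, 2400]
      gqs   = [3.5, 2.0, 3.0, 2.5, 4.0, 3.0, 2.0, 3.5]   # Global Quality Scale ratings
      rho, p = spearmanr(likes, gqs)
      print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")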
  21. World Neurosurg. 2025 Oct 07. pii: S1878-8750(25)00885-X. [Epub ahead of print] 124527
       BACKGROUND: Medulloblastoma is the most common malignant cerebellar tumor in children. With increasing health information-seeking behavior, YouTube has emerged as a popular source for patient education. However, the unregulated nature of its medical content raises concerns regarding accuracy and reliability.
    OBJECTIVE: To evaluate the quality, reliability, and popularity of YouTube videos related to medulloblastoma using validated assessment tools.
    METHODS: This retrospective cross-sectional study analyzed the first 100 YouTube videos retrieved using the keyword "medulloblastoma" (June 26, 2025). After applying inclusion and exclusion criteria, 96 videos were evaluated. Data collected included video source, views, likes, comments, and age. Quality was assessed using the DISCERN instrument, JAMA benchmarks, and Global Quality Score (GQS). Non-parametric statistical tests and Spearman correlation were applied.
    RESULTS: Videos originated primarily from the United States (45.8%), the United Kingdom (18.8%), and India (15.6%). Sources included private institutions (47.9%), public institutions (20.8%), physicians (12.5%), patient experiences (10.4%), and health channels (8.3%). Mean DISCERN, JAMA, and GQS scores were 56.39 ± 14.18, 2.68 ± 0.83, and 3.74 ± 0.89, respectively. Physician-uploaded videos had the highest quality scores (DISCERN: 62.88, GQS: 4.13; p < 0.001), whereas patient-experience videos scored lowest. Popularity metrics showed no significant correlation with quality scores.
    CONCLUSION: Medulloblastoma-related YouTube videos generally exhibit moderate-to-high quality, with physician and public institution uploads providing the most reliable information. Given the weak association between popularity and quality, healthcare professionals and institutions should actively contribute accurate, evidence-based content to improve online health literacy.
    Keywords:  Information Quality; Medulloblastoma; Patient Education; YouTube
    DOI:  https://doi.org/10.1016/j.wneu.2025.124527
  22. Front Public Health. 2025;13: 1640105
       Background: Short videos that popularize health science have become essential for disseminating health information and enhancing public health literacy. However, previous research has primarily focused on health information content, with a significant gap in assessing the quality of health science popularization in short videos.
    Methods: This study developed a quality assessment scale for the popularization of health science short videos based on multimodal theory, utilizing literature analysis and the creation of custom measurement items. Data were collected from scales completed by 796 residents through online surveys conducted on mobile devices. Both exploratory and confirmatory factor analyses were employed to evaluate the quality of mobile health science popularized short videos.
    Results: The results revealed that the quality scale for health science popularization in short videos could be divided into seven dimensions and 22 indicators, each a significant determinant of video quality.
    Conclusion: This research provides a more intuitive, reliable, and standardized tool for assessing the quality of health science popularization in short videos. Also, it offers essential guidance for the future design, development, and promotion of short health science popularization videos.
    Keywords:  confirmatory factor analysis; exploratory factor analysis; health science popularization short videos; multimodal theory; scale development; scale validation; short video quality
    DOI:  https://doi.org/10.3389/fpubh.2025.1640105
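    The scale-development workflow above (exploratory factor analysis followed by confirmatory factor analysis) can be prototyped in Python; the brief sketch below covers only the exploratory step, using the factor_analyzer package on simulated survey responses rather than the study's 796-respondent data.

      # Illustrative exploratory factor analysis on simulated survey data.
      import numpy as np
      import pandas as pd
      from factor_analyzer import FactorAnalyzer

      rng = np.random.default_rng(0)
      n = 300
      latent = rng.normal(size=(n, 2))                        # two toy latent dimensions
      items = np.hstack([latent[:, [0]] * [0.8, 0.7, 0.6],    # items loading on dimension 1
                         latent[:, [1]] * [0.9, 0.8, 0.7]])   # items loading on dimension 2
      items += rng.normal(scale=0.5, size=items.shape)
      df = pd.DataFrame(items, columns=[f"item{i+1}" for i in range(6)])

      fa = FactorAnalyzer(n_factors=2, rotation="varimax")
      fa.fit(df)
      loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=["F1", "F2"])
      print(loadings.round(2))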
  23. J Med Internet Res. 2025 Oct 07. 27 e79961
       BACKGROUND: Online health information seeking is undergoing a major shift with the advent of artificial intelligence (AI)-powered technologies such as voice assistants and large language models (LLMs). While existing health information-seeking behavior models have long explained how people find and evaluate health information, less is known about how users engage with these newer tools, particularly tools that provide "one" answer rather than the resources to investigate a number of different sources.
    OBJECTIVE: This study aimed to explore how people use and perceive AI- and voice-assisted technologies when searching for health information and to evaluate whether these tools are reshaping traditional patterns of health information seeking and credibility assessment.
    METHODS: We conducted in-depth qualitative research with 27 participants (ages 19-80 years) using a think-aloud protocol. Participants searched for health information across 3 platforms-Google, ChatGPT, and Alexa-while verbalizing their thought processes. Prompts included both a standardized hypothetical scenario and a personally relevant health query. Sessions were transcribed and analyzed using reflexive thematic analysis to identify patterns in search behavior, perceptions of trust and utility, and differences across platforms and user demographics.
    RESULTS: Participants integrated AI tools into their broader search routines rather than using them in isolation. ChatGPT was valued for its clarity, speed, and ability to generate keywords or summarize complex topics, even by users skeptical of its accuracy. Trust and utility did not always align; participants often used ChatGPT despite concerns about sourcing and bias. Google's AI Overviews were met with caution-participants frequently skipped them to review traditional search results. Alexa was viewed as convenient but limited, particularly for in-depth health queries. Platform choice was influenced by the seriousness of the health issue, context of use, and prior experience. One-third of participants were multilingual, and they identified challenges with voice recognition, cultural relevance, and data provenance. Overall, users exhibited sophisticated "mix-and-match" behaviors, drawing on multiple tools depending on context, urgency, and familiarity.
    CONCLUSIONS: The findings suggest the need for additional research into the ways in which search behavior in the era of AI- and voice-assisted technologies is becoming more dynamic and context-driven. While the sample size is small, participants in this study selectively engaged with AI- and voice-assisted tools based on perceived usefulness, not just trustworthiness, challenging assumptions that credibility is the primary driver of technology adoption. Findings highlight the need for digital health literacy efforts that help users evaluate both the capabilities and limitations of emerging tools. Given the rapid evolution of search technologies, longitudinal studies and real-time observation methods are essential for understanding how AI continues to reshape health information seeking.
    Keywords:  Alexa; ChatGPT; Google; artificial intelligence; health information–seeking behavior; large language models; search engine; trust
    DOI:  https://doi.org/10.2196/79961
  24. Psychooncology. 2025 Oct;34(10): e70298
       AIMS: To undertake a comprehensive systematic review of currently available instruments designed to assess health information-seeking behaviors among cancer patients, appraising their psychometric properties and methodological rigor to identify the most robust instrument for clinical application.
    DESIGN: A systematic review based on COSMIN methodology.
    DATA SOURCES: Nine electronic databases (CNKI, Wanfang, VIP, SinoMed, PubMed, Embase, Web of Science, CINAHL, and APA PsycINFO) were systematically searched from inception until February 2025.
    REVIEW METHODS: Employing a rigorously validated search methodology developed by Terwee, we systematically searched nine multinational databases spanning Chinese and English publications from inception through February 20, 2025, for studies targeting cancer patient populations. Following independent dual screening by researchers, the psychometric characteristics of the identified instruments were systematically assessed using the COSMIN quality criteria for measurement tool evaluation.
    RESULTS: From the initial 6545 studies, 16 met the eligibility criteria, involving 11 instruments for evaluating health information-seeking behaviors in cancer populations. High-quality evidence revealed insufficient content validity for the BIAS and structural validity deficiencies in the HIOS, PSM, and MBSS, all assigned a class C recommendation, while the remaining seven instruments received class B ratings.
    CONCLUSION: Compared with the other 10 instruments, the MHISBQ has relatively comprehensive measurement properties and can be provisionally recommended for use. However, large-scale studies involving diverse cancer populations are still needed to directly or indirectly compare psychometric properties, such as the stability and responsiveness of the MHISBQ. Additionally, these studies should track changes in HISB and explore its impact on patients, so that they can help guide the delivery of accurate health information and support for cancer patients.
    REGISTRATION: PROSPERO (CRD42024606469).
    Keywords:  COSMIN checklist; Psycho‐oncology; assessment tool; health information; information seeking behavior; measurement properties; systematic review
    DOI:  https://doi.org/10.1002/pon.70298
  25. J Health Psychol. 2025 Oct 05. 13591053251378224
      Black young adults go online frequently and comprise a large proportion of digital consumers, with implications for their health. Yet their engagement with online health information and advertising is not well understood. In this descriptive mixed-methods study, Black young adults from the United States (N = 179; mean age = 23.27) completed an online survey (February to March 2022). They reported searching for health information and seeing health-related advertisements online via closed- and open-ended questions. Participants searched using multiple sources, including video-based social media platforms. Over 85% of participants saw health-related advertisements weekly, and one-third daily. Over half reported that these advertisements were targeted, yet over 60% did not frequently purchase or share them. Black young adult social media users use video-based social media platforms, where content may not be accurate or regulated. They see advertisements frequently but rarely engage, perhaps due to their digital skills or the abundance of competing content.
    Keywords:  Black; advertising; digital; health information; online; search; young adult
    DOI:  https://doi.org/10.1177/13591053251378224