bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-04-12
twenty-six papers selected by
Thomas Krichel, Open Library Society



  1. Med Ref Serv Q. 2026 Apr 06. 1-18
      As health sciences libraries look to the future, one area of focus should be onboarding new hires. This project investigates the onboarding experiences of newly hired health sciences librarians. The study used a mixed-methods design to quantitatively and qualitatively analyze the attitudes and perceptions of new health sciences librarians toward their most recent onboarding experiences. A survey was designed focusing on the onboarding experiences of health sciences librarians and how these experiences have affected their careers. Individual interviews were arranged with 14 survey respondents. Libraries need to explore more connections for new hires both inside and outside of the library.
    Keywords:  Faculty; health sciences librarianship; new hires; onboarding; orientation
    DOI:  https://doi.org/10.1080/02763869.2026.2641622
  2. Front Psychol. 2026;17:1730597
      This meta-analysis examines the effectiveness of university library reading promotion activities on college students' cultural development and identifies the underlying psychological mechanisms and moderating factors. Synthesizing 15 studies with 22,321 participants, the analysis revealed a medium-to-large effect size [d = 0.52, 95% CI (0.41, 0.63)] with significant heterogeneity (I² = 79.7%), indicating positive impacts on cultural knowledge, cultural identity, cultural literacy, and intercultural competence. Two-stage structural equation modeling identified five psychological mediators (reading motivation, cognitive engagement, emotional experience, self-efficacy, and cultural identity) explaining 68.4% of the total effect, with reading motivation as the predominant pathway (34.6%). Mixed-effects subgroup analyses demonstrated that blended delivery formats outperformed single-mode interventions, longer program durations yielded stronger effects, and East Asian samples exhibited larger effect sizes than Western samples. These findings provide an evidence-based framework for optimizing library-led cultural education programs in higher education settings worldwide.
    Keywords:  cultural development; higher education; meta-analysis; psychological mechanisms; reading promotion; university libraries
    DOI:  https://doi.org/10.3389/fpsyg.2026.1730597
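The heterogeneity statistic reported above (I² = 79.7%) follows Higgins' standard definition, computed from Cochran's Q and the degrees of freedom. A minimal sketch of that computation (an illustration, not the authors' code; the example Q value is back-calculated from the reported I² purely for demonstration):

```python
def i_squared(q: float, k: int) -> float:
    """Higgins' I-squared: percentage of total variation across studies
    attributable to heterogeneity rather than chance.

    q: Cochran's Q statistic; k: number of studies (df = k - 1).
    """
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0

# Illustration: a Q chosen to reproduce the reported I-squared of ~79.7%
# over k = 15 studies (Q = df / (1 - I2/100) = 14 / 0.203 ~ 68.97)
print(round(i_squared(68.97, 15), 1))  # ~79.7
```

Note that I² is floored at 0 when Q falls below its degrees of freedom, which is why small meta-analyses often report I² = 0%.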
  3. Res Synth Methods. 2026 Apr 06. 1-19
      Evidence synthesis findings hinge upon well-designed, effective search strategies. When developing these strategies, evidence synthesis teams make multiple decisions (e.g., selecting information sources, developing search string architecture, and picking supplementary search methods) that directly affect the breadth of discovered evidence and thus evidence synthesis outcomes. Despite the number of decisions required when developing search strategies, limited guidance exists to inform these decisions using a data-driven approach. To help address this gap, we developed CiteSource, an R package and accompanying Shiny application, that supports data-driven search strategy development and reporting. CiteSource allows users to assign and retain metadata across three custom fields: source, label, and string to indicate where the records were found, what method or string was used to find them, and whether they were included after screening. CiteSource allows users to visually map the overlap between sets of records, create data summaries of citation records, and export citation records with the newly assigned metadata. CiteSource's analysis and visualization outputs can be harnessed for a variety of use cases, such as optimizing literature source selection, honing and understanding the effectiveness of search strings, and evaluating the impacts of literature sources and supplementary search methods. Overall, CiteSource provides a tool for evidence synthesizers to make informed data-driven decisions that boost the efficiency, rigor, and transparency of search strategies and associated reporting.
    Keywords:  evidence synthesis; information retrieval; reproducibility; search strategy; systematic searching
    DOI:  https://doi.org/10.1017/rsm.2026.10084
  4. Clin Rheumatol. 2026 Apr 06.
       BACKGROUND: Gout and hyperuricemia, linked to abnormalities of purine metabolism or impaired uric acid excretion, are rising with lifestyle changes. Effective self-management and health literacy are crucial for gout management. While large language models (LLMs) show promise in enhancing health management, their potential in gout patient education remains underexplored. This study aimed to evaluate the accuracy and readability of responses generated by three LLMs (DeepSeek-V3, DeepSeek-R1, and GPT-5) to questions based on the gout and hyperuricemia guidelines published by the American College of Rheumatology (ACR).
    METHODS: Based on the ACR gout guidelines, a set of 42 questions was curated and submitted to the three LLMs. Their responses were independently rated against the guidelines on a 5-point Likert scale by three expert gout and hyperuricemia specialists. Accuracy was defined as an average score of ≥ 4 (low threshold) or 5 (high threshold). The readability of the responses was assessed in Microsoft Word, which provided the word count, character count, Flesch Reading Ease (FRE) score, Flesch-Kincaid Grade Level (FKGL), and Automated Readability Index (ARI).
    RESULTS: Our findings reveal that response accuracy was significantly higher for GPT-5 compared to DeepSeek-V3 (P < 0.001), with no significant difference between GPT-5 and DeepSeek-R1 (P > 0.05). In terms of readability, GPT-5 produced the most complex responses (FKGL: 12.89 ± 2.22, ARI: 14.87 ± 2.40), while DeepSeek-R1 generated the longest outputs.
    CONCLUSION: LLMs show potential in generating responses that are consistent with clinical guidelines for gout management. The deployment of LLMs in gout patient education and clinical decision support necessitates the simultaneous optimization of both accuracy and readability.
    Key Points:
    • This study is the first to systematically evaluate the agreement with ACR gout guidelines and readability of three state-of-the-art LLMs (DeepSeek-V3, DeepSeek-R1, GPT-5) in gout management, filling the gap of LLM benchmarking for gout patient education.
    • Response accuracy was significantly higher for GPT-5 compared to DeepSeek-V3, with no significant difference between GPT-5 and DeepSeek-R1. However, GPT-5 produced texts with the lowest readability.
    • The study innovatively combined expert-rated accuracy (5-point Likert scale by gout and hyperuricemia specialists) and objective readability metrics (FRE, FKGL, ARI), providing a comprehensive framework for assessing LLM utility in chronic disease self-management.
    • Findings confirm LLMs' potential for gout patient education but emphasize the need for simultaneous optimization of medical accuracy and health literacy, guiding future LLM refinement for clinical application.
    Keywords:  DeepSeek; GPT-5; Gout management; Large language models
    DOI:  https://doi.org/10.1007/s10067-026-08073-3
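The three readability indices named in the methods above have standard published formulas; given raw word, sentence, syllable, and character counts, they can be sketched as follows (an illustration with a hypothetical passage, not the Microsoft Word implementation):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # FRE: higher scores mean easier text (90-100 ~ 5th grade, 0-10 ~ professional)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # FKGL: approximate US school grade needed to understand the text
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    # ARI: like FKGL but uses characters per word instead of syllables
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43

# Hypothetical passage: 100 words, 5 sentences, 150 syllables, 450 letters
print(flesch_reading_ease(100, 5, 150))         # ~59.6, "fairly difficult"
print(flesch_kincaid_grade(100, 5, 150))        # ~9.9, about 10th grade
print(automated_readability_index(450, 100, 5)) # ~9.8
```

Scores like the FKGL of 12.89 reported for GPT-5 thus correspond to college-level sentence length and vocabulary, well above typical patient-education targets.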
  5. World J Otorhinolaryngol Head Neck Surg. 2026 Apr;12(2): 202-210
       Objective: This study aimed to explore the accuracy and comprehensiveness of 10 frequently asked questions posed to ChatGPT, an online chatbot, regarding obstructive sleep apnea (OSA) and hypoglossal nerve stimulation (HNS) therapy.
    Methods: Ten questions were formulated after extensive literature review alongside the guidance of the senior author (R.N.), a board-certified otolaryngologist specializing in HNS therapy. We employed ChatGPT version 3.5 to answer each question, limiting responses to one paragraph. No follow-up queries were posed. The precision and thoroughness of responses were graded via the grading scale proposed by Mika et al. (within the context of hip arthroplasty) and the DISCERN criteria. Each response was graded by three independent, blinded reviewers (L.L., D.M., and R.N.).
    Results: On average, the responses produced by ChatGPT were satisfactory but typically required moderate clarification (inter-rater reliability [IRR] 0.533). Of the 10 responses, only one was deemed excellent, requiring no clarification. Two were deemed satisfactory, requiring minimal clarification, and the remaining seven were deemed satisfactory, requiring moderate clarification. No response was deemed unsatisfactory, requiring substantial clarification. The average DISCERN score was 41.0 (IRR = 0.513), suggesting that the written health information was "average."
    Conclusions: The responses produced by ChatGPT were fairly accurate, but often benefited from further clarification to ensure a comprehensive understanding. Ultimately, it seems that ChatGPT currently functions as a powerful resource to aid patients in further understanding topics related to OSA and HNS therapy, but should not replace direct consultation with knowledgeable healthcare providers.
    Keywords:  ChatGPT; Inspire; artificial intelligence; hypoglossal nerve stimulation; obstructive sleep apnea
    DOI:  https://doi.org/10.1002/wjo2.70009
  6. Zoonoses Public Health. 2026 Apr 10.
       INTRODUCTION: AI-based chatbots are increasingly used to access health information. However, there are significant differences in the accuracy, source transparency, readability, and reliability of the information these systems provide. In infectious diseases such as brucellosis, which have heterogeneous clinical courses, require long-term follow-up, and depend critically on patient information, the quality of digital information sources is particularly important. This study aims to compare the multidimensional performance of different AI-based chatbots in delivering health information related to brucellosis.
    METHODS: Eight chatbots (ChatGPT-4o, Gemini 2.5 Pro, Claude 3.5 Sonnet, Microsoft Copilot, Perplexity AI, Grok-1.5, Mistral Le Chat and DeepSeek) were evaluated using standardized clinical questions. Clinical accuracy, source transparency, readability/patient-friendliness, ethical safety and perceived trust level were analyzed using the QUEST, DISCERN, JAMA criteria, PEMAT-P, Ateşman Readability Index and Trust/Confidence scales. Scores were normalized to create a heat map. Subgroup analyses were also conducted.
    RESULTS: Significant performance differences were found among the chatbots. DeepSeek achieved the highest scores in clinical accuracy, structured information presentation and source transparency. Claude stood out with higher perceived trust in addition to accuracy and transparency. It was observed that readability and perceived trust do not always coincide, and response length alone is not an indicator of quality.
    CONCLUSION: This study demonstrates that AI-based chatbots should not be evaluated in a clinical context using a single 'best practice' approach. For public health-critical diseases such as brucellosis, selective and controlled chatbot integration tailored to the intended use may offer a safer and more effective approach.
    Keywords:  artificial intelligence; brucellosis; health literacy; large language models; public health communication
    DOI:  https://doi.org/10.1111/zph.70059
  7. Int J Med Inform. 2026 Apr 03. 214:106421. pii: S1386-5056(26)00161-9. [Epub ahead of print]
       BACKGROUND: Patients with chronic obstructive pulmonary disease (COPD) increasingly turn to online platforms for health information. However, the quality of information shared in social media forums remains uncertain. This study compared the accuracy of user-generated Facebook responses to AI-generated replies from ChatGPT.
    METHODS: Posts and comments were extracted from two COPD-related Facebook groups in October 2024. A total of 48 posts from each group were selected, yielding 2,761 comments. Each post was also submitted to ChatGPT. Responses were categorized as 'useful', 'misleading', or 'neither' and rated on a 5-point Likert scale by three independent reviewers. Interrater reliability was assessed using Fleiss' kappa. Differences in quality scores were analyzed using the Wilcoxon signed-rank test and Hodges-Lehmann 95% CI.
    RESULTS: In the Danish group, 47% (667/1,433) of comments were relevant, with 34% deemed useful and 9% misleading. Results were similar in the English group (734/1,315 relevant; 38% useful, 9% misleading). Critically, 16% and 15% of comments encouraging behavioral changes contained misleading information in the Danish and English group, respectively. Likert ratings showed significantly higher accuracy for ChatGPT compared to Facebook. For the Danish group, the median score for Facebook was 4.0 (IQR 3.0-4.5) versus 5.0 (IQR 4.0-5.0) for ChatGPT (p < 0.001; Hodges-Lehmann difference: -0.75, 95% CI -1.0 to -0.5). For the English group, Facebook scored a median of 4.0 (IQR 3.0-4.0) versus 5.0 (IQR 5.0-5.0) for ChatGPT (p < 0.001; Hodges-Lehmann difference: -1.0, 95% CI -1.25 to -0.75). No significant performance differences were found between the Danish and English groups for either platform.
    CONCLUSION: ChatGPT delivered more accurate responses than Facebook users, highlighting its potential as a reliable educational tool. However, Facebook remains valuable for peer support, suggesting that combining AI and social platforms may enhance digital care strategies for COPD.
    Keywords:  Artificial intelligence; COPD; Patient education; Public health; Social media
    DOI:  https://doi.org/10.1016/j.ijmedinf.2026.106421
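The Hodges-Lehmann differences quoted alongside the Wilcoxon signed-rank results above are, for paired data, the median of the Walsh averages of the per-pair differences. A minimal pure-Python sketch of the estimator (an illustration with toy data, not the study's analysis code):

```python
from statistics import median

def hodges_lehmann_paired(diffs: list[float]) -> float:
    """Paired Hodges-Lehmann estimate: the median of all Walsh averages
    (pairwise means, self-pairs included) of the per-subject differences.
    A robust location estimate commonly reported with the Wilcoxon
    signed-rank test."""
    n = len(diffs)
    walsh = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

# Toy per-subject rating differences (hypothetical values)
print(hodges_lehmann_paired([-1.0, -0.5, -1.0, -0.5, -1.5]))  # -1.0
```

Because it aggregates pairwise means rather than raw values, the estimate is resistant to the skew and ties common in Likert-scale data.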
  8. Can Urol Assoc J. 2026 Mar 30.
       INTRODUCTION: Patients rely on online searches for patient education materials (PEMs). PEMs are recommended to be written at or below a sixth-grade reading level but are regularly written at a college reading level. Using prompt engineering, we assess the information, misinformation, and readability of ChatGPT responses to urologic oncology questions.
    METHODS: Forty-five questions relating to prostate, bladder, and kidney cancer were presented to ChatGPT (version 4o, OpenAI). Quality of health information was assessed using DISCERN (1 [low] to 5 [high]). Understandability and actionability were assessed using PEMAT-P (0 [low] - 100% [high]). Misinformation was scored from 1 [no misinformation] to 5 [high misinformation]. Grade and reading level were calculated using the Flesch-Kincaid scale [5 (easy) to 16 (difficult), and 100-90 (5th grade level) to 10-0 (professional level), respectively]. Prompt engineering was then applied to responses and evaluated.
    RESULTS: ChatGPT answers were highly accurate but written at too advanced a reading level, and they lacked explanations of benefits and risks, visual aids, actionability, and citations. With prompt engineering, DISCERN (3.42 to 4.47, p<0.0001), PEMAT-P understandability (88.4% to 95.5%, p<0.0001) and actionability (25.6% to 84.2%, p<0.0001), grade reading level (10.5 to 5.3, p<0.0001), and reading ease (42 [college level] to 71.7 [7th grade], p<0.0001) all improved significantly. Misinformation did not change significantly.
    CONCLUSIONS: Using prompt engineering, ChatGPT provides highly accurate and understandable PEMs at a patient-appropriate reading level and provides concrete resources for patient action. Urologists should understand prompt engineering and be involved in the development of AI chatbots to optimize results.
    DOI:  https://doi.org/10.5489/cuaj.9578
  9. Arch Esp Urol. 2026 Mar;79(2): 247-254
       BACKGROUND: Pediatric urolithiasis is an increasingly important health concern, and affected children and their families require information that is both accurate and easily understandable. Artificial intelligence (AI)-powered chatbots have become widely used sources of health information; however, the readability, quality, and reliability of their outputs remain insufficiently evaluated. This study aimed to assess the effectiveness and reliability of AI chatbots in providing patient-oriented information on pediatric kidney stone disease and to identify factors influencing the quality and readability of their responses.
    METHODS: Four AI chatbots (ChatGPT-5, Google Gemini, Claude 3 Opus, and DeepSEEK) were queried with 30 standardized questions related to pediatric kidney stones. Readability was evaluated using the Average Reading Level Consensus (ARLC), Automated Readability Index (ARI), and Simple Measure of Gobbledygook (SMOG). Response quality and reliability were assessed using the Ensuring Quality Information for Patients (EQIP) tool and Modified DISCERN score. Statistical analyses included one-way analysis of variance (ANOVA), Kruskal-Wallis tests, and appropriate post hoc comparisons.
    RESULTS: Readability differed significantly among the chatbots. Google Gemini demonstrated the highest reading levels across all metrics (ARLC: 14.93, ARI: 16.2, and SMOG: 13.32), whereas ChatGPT, Claude, and DeepSEEK produced less complex text (p < 0.001; large effect sizes, η2 = 0.195-0.512). EQIP scores did not differ significantly between models (p = 0.491, ε2 = 0.021, negligible effect), indicating comparable informational quality. In contrast, reliability varied significantly: ChatGPT and Google Gemini achieved higher Modified DISCERN scores (median 4.00) than Claude and DeepSEEK (median 3.00; p = 0.001, ε2 = 0.318, large effect). Subgroup analyses by question category revealed notable differences in performance, highlighting model-specific strengths and limitations.
    CONCLUSIONS: Substantial variability exists in the readability and reliability of AI-generated health information on pediatric urolithiasis. Although ChatGPT and Google Gemini provided more reliable information, Google Gemini's responses were consistently more complex and less accessible. These findings emphasize the need for careful validation and language simplification of AI-generated content before its use in patient and caregiver education.
    Keywords:  artificial intelligence; chatbots; information quality; patient education; pediatric urolithiasis; readability
    DOI:  https://doi.org/10.56434/j.arch.esp.urol.20267902.30
  10. Front Public Health. 2026;14:1805848
       Background: Autoimmune hepatitis (AIH) is a chronic immune-mediated liver disease that requires long-term management, in which effective patient education plays a critical role. With the rapid development of large language models (LLMs), AI-generated health information is increasingly accessed by patients; however, the readability, quality, and educational suitability of LLM-generated AIH-related content remain insufficiently evaluated.
    Methods: Five widely used LLMs (ChatGPT, Doubao, DeepSeek, Wenxin Yiyan, and Tongyi Qianwen) were assessed based on their responses to 20 frequently asked AIH patient education questions covering five thematic categories. Text readability was evaluated using multiple indices, including the Automated Readability Index, Flesch Reading Ease Score, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, SMOG, and Linsear Write formula. Information quality and educational suitability were assessed using the Global Quality Score (GQS) and the Chinese version of the Patient Education Materials Assessment Tool (C-PEMAT). Clinical Intent Alignment (CIA) was used to evaluate the coverage of guideline-defined medical key points based on the 2025 EASL Clinical Practice Guidelines. Inter-rater reliability was analyzed using Cohen's kappa, and comparative and correlation analyses were performed.
    Results: Significant differences were observed among the LLMs in readability, information quality, and educational suitability (all p < 0.05). ChatGPT achieved the highest GQS and C-PEMAT scores, followed by Doubao and DeepSeek, whereas Wenxin Yiyan and Tongyi Qianwen showed lower performance and greater variability. CIA analysis indicated comparable coverage of guideline-defined clinical intent across models. Readability varied significantly across content themes, with texts related to disease mechanisms and diagnostic processes exhibiting higher linguistic complexity. Correlation analysis demonstrated moderate associations between GQS and grade-level readability indices, whereas C-PEMAT and CIA showed weak correlations with traditional readability metrics.
    Conclusion: Substantial variability exists among LLMs in generating AIH patient education materials. Model selection critically influences information quality and educational suitability, whereas content theme primarily affects linguistic complexity. Although most models produced moderate-to-good quality information, relatively high readability levels suggest that further simplification may be needed for general patient populations. A multidimensional evaluation framework integrating readability, quality, educational suitability, and clinical intent alignment is essential for the responsible use of LLMs in AIH patient education.
    Keywords:  autoimmune hepatitis; health information quality; large language models; patient education; readability
    DOI:  https://doi.org/10.3389/fpubh.2026.1805848
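Cohen's kappa, used above for inter-rater reliability, corrects observed agreement for the agreement expected by chance given each rater's marginal label frequencies. A minimal two-rater sketch (an illustration with toy ratings, not the authors' code):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters over the same items:
    (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from the raters' marginal frequencies."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Toy binary ratings on 4 items: 3/4 observed agreement, 0.5 chance agreement
print(cohens_kappa([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.5
```

Kappa near 0 means agreement no better than chance, so it is a stricter reliability measure than raw percent agreement.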
  11. Front Public Health. 2026;14:1822049
      [This corrects the article DOI: 10.3389/fpubh.2026.1760872.].
    Keywords:  generative artificial intelligence; health information quality; large language models; perinatal depression; postpartum depression; readability
    DOI:  https://doi.org/10.3389/fpubh.2026.1822049
  12. Cephalalgia. 2026 Apr;46(4): 3331024261435594
       Aim: This study aimed to assess the impact of online health information and AI-based tools on treatment decisions, trust, and care-seeking behaviors among migraine patients in Arabic-speaking countries of the MENA region.
    Methods: A multinational, cross-sectional online survey was conducted among 4276 adults with migraine across 13 MENA countries. Data collected included sociodemographic characteristics, migraine history, digital health literacy (eHEALS), AI tool usage, and trust in health information sources.
    Results: The mean eHealth literacy score was 29.9 ± 6.2. Overall, 75.6% demonstrated adequate digital health literacy. Neurologists and physicians were the most trusted sources, whereas social media influencers were the least trusted. Approximately one-third of participants reported modifying migraine treatment or delaying medical consultation based on online information. In multivariable analyses, higher trust in online information was strongly associated with delayed medical consultation (aOR 6.48, 95% CI 5.53-7.58, p < 0.001). In contrast, use of AI tools was associated with lower odds of reporting treatment modification based on online advice (aOR 0.29, 95% CI 0.17-0.49, p < 0.001). Higher trust in online information was consistently associated with both delayed care and treatment changes. Younger age, male sex, and active online information-seeking independently predicted AI use.
    Conclusion: Digital health engagement, including trust in online sources and AI tool use, was significantly associated with migraine-related decision behaviors in this multinational MENA cohort. While AI use was linked to more cautious treatment behaviors, higher trust in online information was associated with delayed medical consultation and treatment modification. These findings highlight the importance of strengthening digital health literacy and promoting reliable online resources.
    Keywords:  MENA region; artificial intelligence; health information; internet; migraine; multinational
    DOI:  https://doi.org/10.1177/03331024261435594
  13. Front Oral Health. 2026;7:1754009
       Aim: To identify the primary social media and digital platforms used by parents and/or caregivers of children, analysing search habits regarding interests in OH, the frequency of usage, and the level of reliability attributed to the consulted platforms.
    Methods: A cross-sectional analysis with an anonymous survey was conducted. Following non-probabilistic sampling, parents or caregivers were invited to complete a 14-question survey. Descriptive and analytical statistical analyses were conducted at a 95% confidence level.
    Results: A total of 112 surveys were obtained, mainly completed by females (70.5%) between 31 and 40 years of age (57.1%). Most respondents (61.3%) stated that they searched for information about their children's OH, with interest in the topic as the main reason for searching and the mobile phone as the most used device. Maternity websites were the first search choice, followed by Instagram® and scientific databases. Half of the studied sample (52.7%) considered the information not very reliable or not reliable at all, and most respondents stated that they also consulted family and friends.
    Conclusion: Most parents search for information about their children's OH on online platforms. Only 21.74% of respondents verified the information with a paediatric dentist, rating the information as not very reliable or moderately reliable.
    Keywords:  digital platforms; oral health; parenting; pediatric dentistry; social media
    DOI:  https://doi.org/10.3389/froh.2026.1754009
  14. Clin Orthop Surg. 2026 Apr;18(2): 292-302
       Background: Elbow epicondylitis (EEC) is a painful condition that affects the common flexor or extensor tendons of the elbow. This study aimed to evaluate the quality and readability of online information regarding EEC using several established methods.
    Methods: Websites were examined using the Google search engine on September 13, 2024, to identify the top 100 ranked websites using the following search terms: "lateral epicondylitis," "medial epicondylitis," "golfer's elbow," "tennis elbow," and "elbow pain." The inclusion criteria were accessible English-language websites containing health information related to the search terms. Among the retrieved websites, those that were inaccessible, duplicate, non-English, irrelevant, and registration- or subscription-based websites, as well as those limited to scientific articles or video clips, were excluded. The websites were categorized into 5 groups. The quality of each website was evaluated using Health on the Net Foundation (HON) grade scale, instrument for judging the quality of written consumer health information on treatment choice (DISCERN instrument), and Ensuring Quality Information for Patients (EQIP) score. Additionally, website readability was assessed using Flesch-Kincaid reading ease (FRE) score, Flesch-Kincaid grade (FKG) level, Gunning-Fog index, Simple Measure of Gobbledygook (SMOG) grade level, and Coleman-Liau index.
    Results: Of 500 websites, 201 were selected based on exclusion criteria. News portals and non-profit websites generally exhibited higher quality than other website types. Although 17 websites were considered high quality, none were rated as such according to the DISCERN instrument. On average, each type of website had an FKG level above 7.7, and the average FRE score was below 54 points, indicating a challenging reading level for the general audience.
    Conclusions: The quality of online information regarding EEC should be improved and made more accessible to the general public. Although the readability of this information remains inadequate, the results may indicate some improvement compared with those of previous studies. Given the high prevalence of EEC and the increasing use of web-based health information, further efforts could be made to enhance the readability and quality of online resources on EEC.
    Keywords:  Epicondylitis; Internet; Online information; Quality; Readability
    DOI:  https://doi.org/10.4055/cios25176
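The Gunning-Fog, SMOG, and Coleman-Liau indices used in the study above likewise have standard formulas; given basic text counts they can be sketched as follows (an illustration with hypothetical counts, not the study's tooling):

```python
import math

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # Fog index: grade level from sentence length and share of 3+-syllable words
    return 0.4 * (words / sentences + 100 * complex_words / words)

def smog_grade(polysyllables: int, sentences: int) -> float:
    # SMOG: grade level from the polysyllable count, normalized to 30 sentences
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def coleman_liau(letters: int, words: int, sentences: int) -> float:
    # CLI: grade level from letters and sentences per 100 words (no syllables)
    l = letters / words * 100    # letters per 100 words
    s = sentences / words * 100  # sentences per 100 words
    return 0.0588 * l - 0.296 * s - 15.8

# Hypothetical passage: 100 words, 5 sentences, 15 complex words, 450 letters
print(gunning_fog(100, 5, 15))    # ~14.0
print(smog_grade(15, 5))          # ~13.0
print(coleman_liau(450, 100, 5))  # ~9.2
```

The indices weight sentence length and word complexity differently, which is why studies like this one report several of them side by side.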
  15. Ann Plast Surg. 2026 May 01. 96(5): 463-469
       BACKGROUND: Peripheral nerve (PN) disorders are common, yet the patient journey from diagnosis to surgery and recovery can be lengthy and difficult. Many patients rely on online resources during this process, and their understanding of the disease process and treatments may influence management and outcomes. This study aimed to evaluate the readability of online patient resources across different PN disorders, sources, and website sections to identify opportunities to improve clarity and accessibility.
    METHODS: Three search engines were queried using terms for various peripheral nerve disorders. After excluding duplicates, scientific publications, and physician-oriented and restricted-access content, 200 patient-directed articles were included. Articles were scored based on full-text and individual sections (eg, Symptoms, Causes, Diagnosis, Preop, Intraop, Recovery, Rehab). Data were analyzed using ANOVA and Tukey's HSD.
    RESULTS: Two hundred websites were included from 4 source types: hospitals/academic centers (n=111), private practices (n=58), patient advocacy organizations (n=24), and other nonprofit organizations (n=8). Across sources, the average Flesch-Kincaid Reading Ease (FKRE) score ranged from 37.6 (SD ±11.5) to 42.6 (SD ±13.1), and both college-level reading scores and grade-level scores ranged from 11.8 (SD ±1.93) to 15.6 (SD ±2.9). ANOVA demonstrated significant differences (P=0.001) between sources in the number of complex words; the Tukey test showed patient advocacy organizations used significantly more complex words than hospitals/academic centers and private practices (P<0.05). Article sections on "Symptoms" and "Recovery" were significantly more difficult to read than all other sections across a variety of metrics (ie, FKRE, GFS, CLI, SMOG, number and percentage of complex words; P<0.05).
    CONCLUSIONS: Online patient resources for peripheral nerve disorders were written well above the recommended fifth-grade level, with grade levels ranging from late high school to college graduate. "Symptoms" and "Recovery" sections were written at the highest reading levels, and patient advocacy organizations used more complex words than other sources. To improve readability, future resources should adopt clearer, simpler language and integrate supportive media such as visuals, infographics, or structured text-based aids to enhance patient comprehension without increasing textual complexity.
    Keywords:  health literacy; online patient resources; patient education; peripheral nerve surgery; peripheral neuropathy; readability
    DOI:  https://doi.org/10.1097/SAP.0000000000004737
  16. J Sport Rehabil. 2026 Apr 09. 1-6
       CONTEXT: Although ankle sprains are among the most common musculoskeletal injuries, prior studies have not systematically reviewed the educational quality of related YouTube videos. We aimed to evaluate the characteristics, sources, and educational quality of YouTube videos addressing ankle sprains.
    DESIGN: Cross-sectional analysis of online material.
    METHODS: We performed 4 independent YouTube searches on ankle sprains and recorded the top 25 results for each search. After screening 100 videos, we included 73 total videos for final analysis. We evaluated video content using 3 validated tools: Global Quality Scale (GQS), the Patient-Friendliness Scale, and the Journal of the American Medical Association criteria.
    RESULTS: A total of 73 YouTube videos accumulated over 20.7 million views (mean: 283,546). Videos received 14.8 likes and 1.04 comments per 1000 views. Among included videos, the mean GQS score measured 3.75, Patient-Friendliness Scale score measured 3.48, and Journal of the American Medical Association score measured 1.19, indicating good accessibility, moderate educational quality, and poor transparency. Physicians produced the highest overall quality and transparency (GQS: 3.89, Journal of the American Medical Association: 1.50). Rehabilitation-focused content emerged as the most common theme (36%), maintaining good quality and accessibility (GQS: 3.77, Patient-Friendliness Scale: 4.23).
    CONCLUSION: YouTube offers wide access to ankle sprain information, yet video quality, accessibility, and transparency remain low. Physician-created and rehabilitation-focused videos demonstrated superior educational value, although highly viewed videos did not correspond with higher quality. These findings highlight the need for more reliable, high-quality ankle sprain content on online platforms.
    Keywords:  health literacy; online health information; patient education; sports injury
    DOI:  https://doi.org/10.1123/jsr.2025-0504
  17. Front Public Health. 2026 ;14 1749849
       Background: Obesity, a chronic condition affecting multiple physiological systems, poses major public health challenges. Dietary interventions are widely recognized as effective strategies for weight management. With over 4 billion Internet users worldwide, an increasing number of individuals rely on online platforms for health information. In China, TikTok, Bilibili, and Kwai are major channels for disseminating health-related content. However, the quality of dietary weight loss information on these platforms remains unclear.
    Objective: This study aims to assess the reliability and quality of information in Chinese-language videos on dietary weight loss shared on three video-sharing platforms: Bilibili, TikTok, and Kwai.
    Methods: We identified the top 100 dietary weight-loss videos on each platform in February 2024, resulting in a total of 300 videos. Video information quality and reliability were assessed using the Global Quality Score (GQS) and modified DISCERN (mDISCERN). Correlations between video quality and video characteristics were also analyzed.
    Results: The average GQS scores for Bilibili, TikTok, and Kwai were 2.04, 1.81, and 1.70, respectively, while the average mDISCERN scores were 2.01, 1.81, and 1.73. Median scores for both tools across all platforms were 2. Bilibili showed significantly higher GQS scores than TikTok and Kwai (p < 0.01 and p < 0.05, respectively). Regarding mDISCERN, Bilibili scored significantly higher than TikTok (p < 0.05), while the difference with Kwai was not statistically significant (p = 0.08). Nevertheless, none of the platforms achieved scores above 3, indicating generally low information quality and reliability. Significant positive correlations were found between video duration and both GQS (r = 0.41, p < 0.01) and mDISCERN (r = 0.32, p < 0.01). Additionally, strong correlations were observed between likes and saves (r = 0.90, p < 0.01), likes and comments (r = 0.92, p < 0.01), and saves and comments (r = 0.86, p < 0.01).
    Conclusion: While acknowledging limitations regarding cross-sectional design and specific sampling of single keyword and Chinese platforms, our findings highlight that the prevalence of low-quality videos on Chinese social media exposes viewers to significant risks of misinformation and inappropriate dieting. These findings underscore the need to promote digital health literacy and improve strategies for digital health communication.
    Keywords:  dietary intervention; internet videos; quality assessment; reliability assessment; weight loss
    DOI:  https://doi.org/10.3389/fpubh.2026.1749849
  18. BMC Oral Health. 2026 Apr 07.
      
    Keywords:  DISCERN; Dental implantology; Global Quality Scale; JAMA benchmark; Patient education; Subperiosteal implant; TikTok; VIQI; Video analysis; YouTube
    DOI:  https://doi.org/10.1186/s12903-026-08290-x
  19. J Int Med Res. 2026 Apr;54(4): 3000605261436168
      Background: Parkinson's disease is a progressive neurodegenerative disorder that contributes to the growing global health burden. YouTube has emerged as a ubiquitous source of health information among patients and their caregivers. Despite the increasing use of video-sharing platforms, the educational quality and reliability of Parkinson's disease-related videos remain unknown.
    Methods: This cross-sectional study evaluated 147 Parkinson's disease-related YouTube videos. The research team collected general video information, and three instruments (Global Quality Score, modified DISCERN tool, and Patient Education Materials Assessment Tool) were applied to assess overall quality, reliability, and content understandability and actionability. Descriptive analyses were conducted overall, followed by detailed comparisons across the videos. Finally, using the Spearman correlation coefficient, we explored potential correlations between general video information and video quality and reliability.
    Results: In this study, we observed moderate overall quality and reliability when assessed using the Global Quality Score, modified DISCERN tool, and Patient Education Materials Assessment Tool.
    Conclusions: Our findings demonstrated that YouTube contains substantial publicly available Parkinson disease-related content; however, the quality and reliability of the content vary and are generally inadequate to facilitate patient education. To better serve patients with Parkinson disease and their caregivers, multifaceted actions from healthcare professionals, science communicators, and internet platforms are necessary to elevate the quality and visibility of credible content.
    Keywords:  Parkinson’s disease; YouTube; information quality; patient education; social media
    DOI:  https://doi.org/10.1177/03000605261436168
  20. Sci Rep. 2026 Apr 04.
      Wilson disease (WD) is a rare autosomal recessive hereditary disorder of copper metabolism that requires lifelong management and patient education. Short video platforms have become major channels for health information dissemination in China, but the quality and reliability of content related to Wilson disease on these platforms have not been systematically evaluated. A cross-sectional study design was adopted, and a total of 153 short videos related to Wilson disease were collected from three platforms: Bilibili, Douyin, and Kuaishou. Beginning on January 18, 2026, information quality and reliability were assessed over two days using three validated evaluation tools: the GQS for quality, and the mDISCERN and JAMA benchmarks for reliability. User interaction indicators and video characteristics were also extracted. The chi-square test, Kruskal-Wallis H test, and Spearman correlation analysis were used for statistical analysis. Videos on Bilibili achieved the highest scores in all three evaluations: GQS (3.34 ± 1.34), mDISCERN (2.84 ± 1.31), and JAMA (2.34 ± 1.24), followed by Douyin and Kuaishou. Among video creators, science communicators achieved the highest scores (GQS: 4.40 ± 0.55), surpassing health professionals (GQS: 2.91 ± 1.08). Videos with comprehensive themes covering "etiology, symptoms, and treatment" had higher quality scores (GQS: 3.45 ± 1.17) and user interaction indicators than those with single themes. Video quality was not correlated with interaction indicators but showed some correlation with video duration. The quality of Wilson disease content differs significantly across short video platforms, with Bilibili providing the most reliable information. Uploads by health professionals and comprehensive theme content are associated with higher information quality. Measures tailored to the characteristics of each platform should be formulated to promote credible health information on Wilson disease and thereby support patient education.
    Keywords:  GQS; Health information quality; JAMA; Patient education; Short video platforms; Wilson disease; mDISCERN
    DOI:  https://doi.org/10.1038/s41598-026-47222-1
  21. Front Public Health. 2026 ;14 1748168
       Background: Myocardial infarction (MI) is a leading cause of cardiovascular mortality worldwide and requires timely treatment and accurate public awareness of risk factors, warning signs, and first aid. In China, short-video platforms such as TikTok (Douyin, Chinese mainland version) and Bilibili have become major health information sources, yet the quality and reliability of MI-related content remain inadequately evaluated.
    Objective: This cross-sectional study systematically assessed and compared the quality, reliability, and educational value of MI-related videos on TikTok and Bilibili.
    Methods: Using the Chinese keyword for "myocardial infarction," we retrieved the top 100 videos from each of TikTok and Bilibili on September 1, 2025. After exclusions, 137 videos were included. Uploaders were classified as clinicians, patients, or traditional Chinese medicine practitioners. Quality was evaluated using GQS, mDISCERN, JAMA benchmarks, and PEMAT-U/A. Statistical analyses included Spearman correlation, Mann-Whitney U, Kruskal-Wallis, and chi-square tests.
    Results: Bilibili videos were significantly longer but had much lower engagement than TikTok videos. Only mDISCERN scores differed significantly between platforms: Bilibili contained a higher proportion of high-reliability videos than TikTok, while JAMA, GQS, and PEMAT-U/A scores did not differ significantly. Uploader background significantly influenced quality outcomes. Clinicians and TCM practitioners achieved higher JAMA scores than patients, indicating greater formal credibility, whereas patients had a higher proportion of high mDISCERN scores, reflecting more detailed experiential content. Correlation analysis revealed a bidirectional effect of video length: longer duration was positively associated with mDISCERN and GQS scores but negatively associated with JAMA scores. Interaction metrics showed strong internal synergy but almost no correlation with professional quality scores, demonstrating a clear "quality-popularity paradox." Content analysis revealed an imbalanced pattern: information on emergency measures and medication safety in particular was severely lacking.
    Conclusion: MI-related content on Chinese short-video platforms is of moderate quality but characterized by a significant disconnect between popularity and educational value, as well as critical deficiencies in emergency response information. These findings underscore the urgent need for coordinated interventions, including platform-level quality control, collaborative content creation between professionals and platforms, and enhanced public health literacy to ensure the safe and effective use of these platforms for health education.
    Keywords:  Bilibili; TikTok; health education; myocardial infarction; social media; video quality
    DOI:  https://doi.org/10.3389/fpubh.2026.1748168
  22. Medicine (Baltimore). 2026 Apr 10. 105(15): e48359
      TikTok and Bilibili have gradually become important sources for the public to obtain health information. This study aims to evaluate the content, quality, and reliability of varicocele-related videos. We conducted a search on both platforms using the keyword "varicocele" and collected the top 150 videos based on the default rankings, along with video duration, engagement metrics, uploader identity, and video content. Videos were assessed using the global quality score (GQS) and modified DISCERN (mDISCERN) scales. Mann-Whitney U tests and Kruskal-Wallis tests were used to compare differences between groups. Spearman correlation analysis was employed to evaluate the relationships between video features, engagement, and quality. A total of 255 videos were included. The content of the videos was primarily focused on treatment (71.37%), with limited coverage of prevention (32.16%). The median GQS score was 3.00 (interquartile range (IQR): 3.00-4.00), and the median mDISCERN score was 2.00 (IQR: 2.00-2.00). TikTok demonstrated a higher GQS score than Bilibili (P < .05). Compared to individual users and non-specialists, videos uploaded by specialists scored higher on both GQS and mDISCERN (P < .05). No significant correlation was found between engagement metrics and either GQS or mDISCERN (P > .05). The quality and reliability of varicocele-related videos are suboptimal, with insufficient coverage of prevention content. Videos uploaded by specialists demonstrated higher quality and reliability. Engagement metrics were not correlated with video quality or reliability. Going forward, platforms should strengthen content monitoring and review processes and provide support for professional creators to enhance the credibility and educational value of digital health communication.
    Keywords:  Bilibili; TikTok; digital health; health communication; information quality; short-video platforms; varicocele
    DOI:  https://doi.org/10.1097/MD.0000000000048359
  23. Front Public Health. 2026 ;14 1770727
     Introduction: YouTube, the dominant global video-sharing social media platform, has created new opportunities to provide regulated health content directly to the public, including on broader public health threats such as antimicrobial resistance (AMR). Little is known about the most effective strategy with which to engage the public with online AMR content.
    Method: This study comprehensively evaluated the top 200 viewed YouTube videos on AMR by extracting data on video characteristics, narratives and quality, and explored factors associated with viewer engagement through proxy analytics (views, likes and comments).
    Results: We found that content focused on the mechanisms of AMR and antibiotics was most viewed, yet engaging videos do not necessarily convey high-quality information. Videos from internet media and non-medical channels were more popular.
    Discussion: The study calls for more strategic production of engaging videos on AMR. Global platforms should strive to facilitate audiences' access to reliable health information.
    Keywords:  antimicrobial resistance; content analysis; engagement; online video; social media
    DOI:  https://doi.org/10.3389/fpubh.2026.1770727
  24. JMIR Infodemiology. 2026 Apr 07. 6 e86489
       Nearly 1 in 4 young adults has a chronic condition, yet many feel well despite their diagnosis. Asymptomatic conditions such as prediabetes and hypertension create a unique vulnerability to digital health misinformation, particularly on platforms where inaccurate content is prevalent. Conventional clinical responses, which often just warn patients about online misinformation, fail to address the underlying drivers of this behavior. This viewpoint proposes a novel disease characteristic-based vulnerability framework to understand this challenge, grounded in established behavioral science theories such as the capability, opportunity, and motivation-behavior model; temporal discounting; and the concept of information voids in infodemiology. We identify a critical "information void" for asymptomatic conditions managed primarily through lifestyle modification. This void, created by the absence of symptomatic feedback combined with delayed clinical biomarker feedback, compels patients to seek information online. Instead of viewing this information seeking as a problematic deviation, we reframe it as a "digital phenotype" indicating a patient's readiness for behavior change. Through case studies illustrating how this framework applies to specific conditions (prediabetes, nonalcoholic fatty liver disease, and untreated hypertension), we demonstrate its practical utility for clinicians, health systems, and policymakers. Evidence supports a multipronged approach: integrating digital health literacy into clinical encounters, providing curated evidence-based resources, and pursuing strategic institutional engagement in digital spaces. While acknowledging the framework's deliberate simplification and the need for culturally sensitive adaptation across diverse health care settings, this viewpoint offers a generalizable strategy for engaging with patients' information needs, helping transform a public health challenge into an opportunity for empowerment.
    Keywords:  NAFLD; TikTok; asymptomatic chronic disease; behavior change; digital health; digital health literacy; dyslipidemia; fatty liver disease; health information seeking; health literacy; hypertension; infodemiology; misinformation; nonalcoholic fatty liver disease; patient engagement; prediabetes; social media; young adults
    DOI:  https://doi.org/10.2196/86489