bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-04-26
27 papers selected by
Thomas Krichel, Open Library Society



  1. J Microbiol Biol Educ. 2026 Apr 22. e0029725
      Journal Article Annotations with Zotero (JAAZ) is an activity designed to help undergraduate life science students develop essential skills for critically engaging with research articles while building proficiency in reference management. This structured, multi-part assignment introduces students to Zotero, a free reference manager, and guides them in actively engaging with scientific literature. Students use Zotero to read and annotate journal articles, which provides them with experience in evaluating scientific literature. Articles are first annotated independently with notes that define terminology, explain complex concepts, and identify the main points of the study. Then, in reading groups, students discuss the article and their independent annotations, working together to produce a consensus group-annotated version that is shared with the class. By making the annotation process a collaborative resource, JAAZ makes research articles more accessible, enabling students to engage with a greater number of articles in less time while improving overall comprehension. Students also build skills in critically evaluating whether an article is relevant and should be cited in their own research writing. We find that this activity helps undergraduate life science students enrolled in bioinformatics research courses become more confident in navigating scientific literature and managing references.
    Keywords:  STEM education; active learning; bioinformatics; course-based undergraduate research experiences (CUREs); reference management; research skills development
    DOI:  https://doi.org/10.1128/jmbe.00297-25
  2. Radiol Technol. 2026 May-Jun;97(5): 303-309
       PURPOSE: To evaluate the factual accuracy and citation fidelity of Scopus AI's outputs in response to a single health care-related research question about the importance of human trafficking prevention education for professionals.
    METHODS: This study employed a mixed-methods content verification approach. A single health care-related research question was entered into Scopus AI (Elsevier), which generated a summary, expanded summary, and concept map. Quantitative data were collected by classifying each statement in the Scopus AI output as accurate, misleading, or incorrect. Qualitative analysis provided contextual insights into citation use, source type, and interpretation of content.
    RESULTS: Of the 30 statements analyzed from the Scopus AI output, 27 (90.0%) were rated as accurate, and 3 (10.0%) were categorized as misleading. No incorrect or hallucinated content was detected. Qualitative analysis revealed that Scopus AI consistently cited legitimate, peer-reviewed sources. However, in 2 cases, the tool referenced secondary sources without clarification, raising questions about source hierarchy.
    DISCUSSION: Though Scopus AI produced largely reliable academic content, this study underscores the need for user verification and scholarly judgment, particularly regarding secondary sources and citation transparency. The findings highlight the importance of teaching students to critically evaluate artificial intelligence (AI)-generated material. In response to the findings, a classroom activity titled "Fact-Check the Bot" was developed to promote critical AI literacy. This activity guides learners in assessing AI-generated claims using a verification matrix and original literature and can be adapted for use with other AI tools.
    CONCLUSION: This study demonstrates the potential and the limitations of generative AI in academic research and offers a model for integrating verification practices into educational settings to enhance students' critical engagement with AI tools.
    Keywords:  AI literacy; Scopus AI; critical thinking; digital literacy; generative AI; literature verification
  3. BMC Med Res Methodol. 2026 Apr 25.
      
    Keywords:  Bibliographic databases; Case reports; Evidence-based medicine; Indexing; PubMed; Publication types; Study designs
    DOI:  https://doi.org/10.1186/s12874-026-02861-w
  4. Digit Health. 2026 Jan-Dec;12: 20552076261444223
       Background: Advancements in artificial intelligence (AI) have markedly improved healthcare accessibility, providing patients with immediate medical information via chatbots. Individuals with chronic cough often seek support through online resources; however, unregulated tool use raises concerns regarding misinformation, safety risks, and clinical guideline deviations. Therefore, critically evaluating chatbot-provided information on chronic cough is crucial.
    Objective: To evaluate the performance of six AI chatbots (ChatGPT-4o, ChatGPT-5, DeepSeek V3, Copilot, Gemini 2.5 Flash, and Perplexity) in responding to high-frequency chronic cough queries, with respect to accuracy, reliability, readability, and clinical guideline adherence.
    Methods: Based on an inductive analysis of Google Trends and Chinese online health communities, 25 queries were formulated. Two clinical experts evaluated the responses for accuracy, supplementarity, and incompleteness, following the European Respiratory Society (ERS) chronic cough guidelines. Reliability was assessed using DISCERN, EQIP, JAMA, and GQS, while readability was measured via six standard metrics, including the Flesch-Kincaid Grade Level.
    Results: Perplexity achieved the highest reliability scores of the tested models (DISCERN: 51.00±3.94; EQIP: 69.40±6.34), while Copilot recorded the lowest (DISCERN: 37.60±4.19; EQIP: 52.40±6.94; pairwise P<0.001 vs. Perplexity). Although Copilot demonstrated comparatively better readability, no model achieved the recommended 6th-grade reading level. Pooled accuracy reached 80.39%, but critical clinical details were frequently omitted across all models.
    Conclusion: While AI chatbots offer accessible health advice for chronic cough, their usefulness is constrained by significant deficiencies in readability and reliability. Widely used tools such as Copilot systematically omit guideline-based content, potentially introducing safety risks. Future research should explore whether enhanced chatbots can safely support patient decision-making and evaluate their real-world clinical applicability.
    Keywords:  artificial intelligence; chatbot; chronic cough; health advice; large language models
    DOI:  https://doi.org/10.1177/20552076261444223
  5. Cureus. 2026 Mar;18(3): e105155
      Purpose: Patient education materials (PEMs) enhance healthcare access and inclusivity, particularly for individuals without clinical backgrounds. However, many PEMs exceed recommended readability levels. This study evaluated whether artificial intelligence (AI)-assisted editing using ChatGPT-4o (OpenAI, San Francisco, CA, USA) could improve the readability of endourology PEMs related to prostate cancer, nephrolithiasis, bladder cancer, and kidney cysts.
    Methods: Twenty-one publicly available PEMs from the American Urological Association (AUA) were analyzed. Each document was uploaded into ChatGPT-4o with instructions to rewrite the text to an eighth-grade reading level or lower while preserving content and word count. Readability of original and AI-modified PEMs was assessed using ReadabilityFormulas.com across six validated indices: Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), Gunning-Fog Index (GFI), Coleman-Liau Index (CLI), Automated Readability Index (ARI), and Flesch Reading Ease (FRE). Pre- and post-AI readability scores were compared using paired two-tailed t-tests.
    Results: Across all indices and disease categories, AI modification significantly improved readability. Unmodified PEMs were written at approximately the ninth- to 11th-grade level, whereas AI-modified versions were reduced to the fourth- to sixth-grade range (p < 0.001 for all comparisons). AI-modified PEMs also demonstrated reduced variability across readability scores, indicating improved standardization.
    Conclusions: ChatGPT-4o significantly improved the readability of AUA endourology PEMs, aligning them with established health literacy recommendations. AI-assisted editing represents a scalable and standardized approach to improving patient comprehension and accessibility of urologic education materials.
    Keywords:  artificial intelligence; endourology; health literacy; patient education materials; readability
    DOI:  https://doi.org/10.7759/cureus.105155
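
Several selections in this issue (items 4, 5, 11, and 21) rely on the same battery of readability formulas. For readers who want to reproduce such scoring, the sketch below computes the six indices named in item 5 and runs the paired two-tailed t-test the authors describe. This is a minimal sketch, not the authors' pipeline: the sample texts are invented placeholders, while textstat and scipy are real open-source Python packages whose listed functions exist under these names.

```python
# Score original vs. AI-simplified patient education materials (PEMs)
# on the six readability indices from item 5, then compare each index
# with a paired two-tailed t-test. The texts are invented placeholders;
# in practice, substitute the full documents under study.
import textstat
from scipy.stats import ttest_rel

INDICES = {
    "FKGL": textstat.flesch_kincaid_grade,
    "SMOG": textstat.smog_index,
    "GFI":  textstat.gunning_fog,
    "CLI":  textstat.coleman_liau_index,
    "ARI":  textstat.automated_readability_index,
    "FRE":  textstat.flesch_reading_ease,  # higher = easier, unlike the others
}

original_pems = [
    "Nephrolithiasis denotes the formation of calculi within the renal collecting system.",
    "Percutaneous nephrolithotomy is indicated for calculi exceeding two centimetres.",
]
simplified_pems = [
    "Kidney stones are hard lumps that form inside your kidney.",
    "Very large stones may need surgery through a small cut in your back.",
]

for name, score in INDICES.items():
    before = [score(t) for t in original_pems]
    after = [score(t) for t in simplified_pems]
    t_stat, p_val = ttest_rel(before, after)  # paired two-tailed t-test
    print(f"{name}: {sum(before)/len(before):.1f} -> "
          f"{sum(after)/len(after):.1f} (t={t_stat:.2f}, p={p_val:.3f})")
```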
  6. Rev Assoc Med Bras (1992). 2026;72(2): e20251080. pii: S0104-42302026000202206. [Epub ahead of print]
       OBJECTIVE: The aim of this study was to determine the quality, reliability, and readability of the responses provided by artificial intelligence chatbots about idiopathic pulmonary fibrosis.
    METHODS: The clinically relevant questions about idiopathic pulmonary fibrosis diagnosis, treatment, prognosis, and lifestyle management were submitted to four widely used artificial intelligence chatbots, including ChatGPT, Perplexity, Gemini, and Copilot. Responses were assessed by five clinicians based on readability, understandability, quality, and content reliability using standardized tools.
    RESULTS: The overall readability of chatbot-generated responses was low, corresponding to high educational requirements. Understandability ranged from moderate to good, whereas actionability remained moderate. Gemini produced the most readable and understandable outputs. Journal of the American Medical Association and Likert scores indicated limited source transparency but good guideline concordance. DISCERN analysis for the treatment question showed significant variation (p=0.003), with Perplexity achieving the highest total score.
    CONCLUSION: Although artificial intelligence chatbots offer rapid and accessible information, their readability and source reliability remain limited. These findings highlight the necessity of expert supervision and further model improvement before artificial intelligence chatbots can be safely integrated into patient education.
    DOI:  https://doi.org/10.1590/1806-9282.20251080
  7. Appl Clin Inform. 2026 Mar;17(2): 274-280
     Objectives: To evaluate whether ChatGPT models can reliably apply the DISCERN instrument, a 16-question human-scored rubric developed in 1999 to evaluate consumer health information, and to assess the impact of prompting strategy, model choice, and scoring repeatability on agreement with human-derived DISCERN scores.
    Methods: A PubMed search of "DISCERN" identified English-language studies since 2019 reporting exact webpage URLs with corresponding human-derived DISCERN scores. Archived versions of 42 webpages were retrieved. Three ChatGPT models (GPT-5.2, GPT-4o, and o3) were evaluated using four prompting strategies: "Naïve" zero-shot, item-level "Split" scoring, "Augmented" prompting with DISCERN guidance, and a "Combined" split-plus-augmented approach. Agreement with human scores was assessed using correlations and absolute differences. Repeatability was examined using 10 repeated scoring runs across 9 webpages.
    Results: Agreement between ChatGPT-generated and human DISCERN scores was weak to moderate. All models demonstrated systematic score compression, overestimating low-quality webpages and underestimating high-quality webpages. Combined prompting modestly improved agreement and reduced absolute error, particularly for the o3 model, which consistently outperformed GPT-5.2 and GPT-4o. Substantial run-to-run variability was observed with a mean score range of 17.5 points and ranges up to 43 points for the same webpage. Averaging scores across runs did not improve agreement with human ratings. ChatGPT's DISCERN scoring reflects systematic attenuation consistent with prediction under noisy subjective measurement. Prompt engineering did not correct calibration bias or reproducibility limitations.
    Conclusion: Under the prompting strategies evaluated, ChatGPT models were insufficient for reliable automated DISCERN scoring. Persistent attenuation bias and poor repeatability significantly limit clinical or research applicability.
    DOI:  https://doi.org/10.1055/a-2853-8892
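
The item-level "Split" strategy in item 7 is straightforward to reproduce in outline: score each of the 16 DISCERN questions in its own model call, sum to a total, and compare against human scores. The sketch below is an assumption-laden illustration, not the study's code: ask_llm() is a hypothetical stand-in for whichever chat-model client is used, only the first two DISCERN questions are listed, and the agreement statistics use scipy's real pearsonr.

```python
# Item-level ("Split") DISCERN scoring, in outline: one model call per
# DISCERN question, summed to a 16-80 total, then compared with human
# scores. ask_llm() is a hypothetical stand-in, not a real client.
from scipy.stats import pearsonr

DISCERN_ITEMS = [
    "Are the aims clear?",          # DISCERN question 1
    "Does it achieve its aims?",    # DISCERN question 2
    # ...the remaining 14 DISCERN questions go here...
]

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real chat-model client")

def split_score(page_text: str) -> int:
    """Sum of per-item 1-5 ratings (16-80 when all 16 items are present)."""
    total = 0
    for question in DISCERN_ITEMS:
        reply = ask_llm(
            "Rate this health webpage from 1 to 5 for the DISCERN "
            f"question: {question!r}. Answer with a single digit.\n\n{page_text}"
        )
        total += int(reply.strip()[0])  # naive parse; validate in real use
    return total

def agreement(model_totals, human_totals):
    """Correlation and mean absolute difference against human scoring."""
    r, _ = pearsonr(model_totals, human_totals)
    mad = sum(abs(m - h) for m, h in zip(model_totals, human_totals)) / len(model_totals)
    return r, mad
```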
  8. Eur J Pediatr. 2026 Apr 24;185(5): 297. [Epub ahead of print]
      This study aims to evaluate the reliability, quality, and readability of ChatGPT-4's responses to questions about cerebral palsy (CP) and rehabilitation strategies. The 56 questions most frequently asked by families about CP, its treatment, and rehabilitation strategies were divided into five categories (A1-A5) and submitted to ChatGPT-4. The reliability, quality, and readability of the responses were assessed by two researchers using, respectively, the modified DISCERN (mDISCERN) tool, the Global Quality Scale (GQS), and the Flesch Reading Ease Scale (FRE). Median (IQR) values for mDISCERN ranged from 3 (3-3) in A5 to 4 (3-4) in A1, while GQS values ranged from 3 (3-3.5) in A2 to 3.5 (3-4) in A1. The mean readability values assessed with FRE ranged from 31.47 ± 12.92 to 44.84 ± 17.39. No statistically significant differences were observed between the categories. The ICC values suggested very good agreement for both scales, with 0.851 for the mDISCERN total score and 0.824 for the GQS total score.
    CONCLUSION:  This study suggests that ChatGPT-4 provides moderate reliability and generally acceptable quality of responses regarding CP treatment and rehabilitation. FRE scores indicated that many responses were difficult for families to understand. While ChatGPT-4 may serve as a supportive source of general information, its outputs should be interpreted with caution, particularly in clinical contexts, and supervision and oversight by healthcare professionals remain essential for safe and effective use.
    WHAT'S KNOWN: • Children with cerebral palsy (CP) receive lifelong rehabilitation. Exercise plays a fundamental role in maintaining functional independence and quality of life, as well as physical development. The use of AI-based tools such as ChatGPT-4 for health information is increasingly widespread; however, uncertainties regarding the reliability, quality, and readability of these tools remain.
    WHAT IS NEW: • Our study shows that ChatGPT-4's responses to questions about treatment and rehabilitation strategies for children with CP have moderate to good reliability and quality, but low readability.
    Keywords:  Artificial intelligence; Cerebral palsy; ChatGPT; Exercise; Rehabilitation
    DOI:  https://doi.org/10.1007/s00431-026-06979-3
  9. Hand Ther. 2026 Apr 21. 17589983261444990
       Objective: ChatGPT is a popular artificial intelligence (AI) tool used to answer questions on any subject. Given ChatGPT's popularity, it is prudent to investigate its ability to answer common patient questions in the field of hand therapy to better guide patients as they navigate the resources available to them.
    Methods: This is a cross-sectional, rater-based comparison study. Four common hand therapy questions were entered into ChatGPT version 3.5. The first five answer tabs that appeared with a Google search for the same four questions were downloaded. Three certified hand therapists blindly graded ChatGPT and Google's answers using Likert scales to assess for answer accuracy (0-6), comprehensiveness (0-3), and conciseness (0-3).
    Results: ChatGPT was significantly more accurate, with an estimated marginal mean (EMM) of 5.75 (95% CI: 4.96, 6.54) compared to Google's 3.48 (95% CI: 2.86, 4.10) (p < 0.001). ChatGPT was significantly more complete, with an EMM of 2.50 (95% CI: 2.10, 2.90) compared to Google's 1.48 (95% CI: 1.19, 1.77) (p < 0.001). ChatGPT was significantly more concise, with an EMM of 3.00 (95% CI: 2.66, 3.34) versus 1.60 (95% CI: 1.29, 1.91) for Google (p < 0.001).
    Conclusion: ChatGPT is a concise, comprehensive, and accurate alternative to a Google search for people seeking information on hand therapy. The free version of ChatGPT does not update its sourcing past 2019, and the software is known to occasionally present false information. Frequently updated academic websites should therefore remain the primary online medical resource for patients.
    Keywords:  hand; hand therapy; occupational therapy; orthopaedics; wrist
    DOI:  https://doi.org/10.1177/17589983261444990
  10. Oral Dis. 2026 Apr 20.
       OBJECTIVE: To evaluate the quality and readability of large language models (LLMs) when responding to Frequently Asked Questions (FAQs) about oral lichen planus (OLP).
    METHODS: We evaluated the responses of three LLMs (ChatGPT-4o, Gemini 2.0 Flash Experimental, and Copilot) to 13 patient-centered FAQs about OLP. Questions were identified using query tools, and answers were assessed by 14 oral medicine experts using the Quality Assessment of Medical Artificial Intelligence (QAMAI) tool. Readability was analyzed with the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKG) tools.
    RESULTS: All LLMs provided generally accurate and relevant responses, with median QAMAI scores indicating "good" to "very good" quality. ChatGPT achieved slightly higher completeness, particularly for questions on OLP definition and treatment. Reference provision was inconsistent across all chatbots. Readability analysis revealed that most responses required college-level literacy, with ChatGPT producing the most complex texts, Gemini occasionally achieving more accessible outputs, and Copilot situated between the two.
    CONCLUSIONS: LLMs may have potential as adjunctive tools for patient education in OLP, although they remain limited by incomplete information, inconsistent references, and suboptimal readability. Future research should incorporate longitudinal LLM evaluations and training to develop models that deliver accurate, accessible information tailored to users' literacy levels.
    Keywords:  accuracy; large language models; oral lichen planus; oral potentially malignant disorders; patient education; readability
    DOI:  https://doi.org/10.1111/odi.70334
  11. J Clin Res Pediatr Endocrinol. 2026 Apr 21.
       Objective: To comparatively evaluate the reliability, quality, and readability of responses generated by widely used large language model (LLM)-based chatbots to congenital hypothyroidism (CH)-related patient questions.
    Methods: Forty CH frequently asked questions (FAQs), derived from clinician-reviewed patient education resources, were submitted under standardized conditions (December 2025) to ChatGPT-4, ChatGPT-5.2, Gemini, and Copilot. The modified DISCERN (mDISCERN) instrument was used to assess reliability, whereas the Global Quality Score (GQS) was used to evaluate quality. Readability was evaluated using Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG). Scores were compared using Friedman tests with Bonferroni-corrected post hoc analyses.
    Results: Median mDISCERN scores were 5.0 for ChatGPT-4, ChatGPT-5.2, and Gemini, and 4.0 for Copilot. Median GQS scores were 5.0 for ChatGPT-4, ChatGPT-5.2, and Gemini, and 4.0 for Copilot. Differences among models were significant for both mDISCERN and GQS (p<0.001), with ChatGPT-5.2 outperforming others in key pairwise comparisons. Readability differed significantly across all indices (all p<0.001). ChatGPT-5.2 demonstrated the highest FRE and lowest FKGL, whereas Gemini produced the most complex text. However, all models exceeded the recommended sixth-grade reading level.
    Conclusion: LLM-based chatbots generated generally moderate-to-high quality CH information, but readability remains suboptimal for patient education. ChatGPT-5.2 showed the best overall performance. LLM outputs may support patient information needs but should complement, not replace, clinician-provided counseling.
    Keywords:  Artificial intelligence; ChatGPT; Copilot; Google Gemini; Large language models; congenital hypothyroidism
    DOI:  https://doi.org/10.4274/jcrpe.galenos.2026.2026-1-15
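
The omnibus-plus-post-hoc design in item 11 (Friedman test across four chatbots, Bonferroni-corrected pairwise follow-ups) maps directly onto standard library calls. The sketch below is illustrative only: the score arrays are invented placeholders, while friedmanchisquare and wilcoxon are real scipy functions. The study does not name its post-hoc test; Wilcoxon signed-rank is shown here as one common choice for paired ordinal scores.

```python
# Item 11's analysis pattern: Friedman omnibus test across four chatbots'
# per-question scores, then Bonferroni-corrected pairwise post hocs
# (Wilcoxon signed-rank shown as an assumed choice). Scores are invented.
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

scores = {  # one mDISCERN rating per FAQ (the study used 40 FAQs)
    "ChatGPT-4":   [5, 4, 5, 5, 4, 5],
    "ChatGPT-5.2": [5, 5, 5, 4, 5, 5],
    "Gemini":      [5, 3, 4, 5, 4, 5],
    "Copilot":     [4, 4, 3, 4, 4, 4],
}

stat, p = friedmanchisquare(*scores.values())
print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-corrected threshold
for a, b in pairs:
    _, p_pair = wilcoxon(scores[a], scores[b])
    verdict = "significant" if p_pair < alpha else "ns"
    print(f"{a} vs {b}: p={p_pair:.4f} ({verdict} at alpha={alpha:.4f})")
```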
  12. Chronic Dis Transl Med. 2026 Mar;12(1): 73-74
    Large language models (LLMs) frequently generate patient education materials (PEMs) that exceed recommended reading levels. Prompting LLMs to produce PEMs at a 5th-grade level consistently produced statistically lower readability scores than unprompted outputs. These findings suggest that simple prompt engineering can improve the clarity and accessibility of LLM-generated PEMs.
    DOI:  https://doi.org/10.1002/cdt3.70031
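
The intervention in item 12 (prompting for a 5th-grade version) can be hardened into a check-and-retry loop so the target reading level is verified rather than assumed. A minimal sketch under stated assumptions: rewrite_with_llm() is a hypothetical placeholder, and the FKGL check uses the real textstat package.

```python
# Readability-targeted prompting per item 12: ask for a 5th-grade
# rewrite, verify with FKGL, and retry if the target is missed.
# rewrite_with_llm() is a hypothetical placeholder, not a real API.
import textstat

def rewrite_with_llm(text: str, grade: int) -> str:
    raise NotImplementedError("plug in a real chat-model client")

def simplify(pem: str, target_grade: int = 5, max_tries: int = 3) -> str:
    draft = pem
    for _ in range(max_tries):
        draft = rewrite_with_llm(draft, target_grade)
        if textstat.flesch_kincaid_grade(draft) <= target_grade:
            break  # verified at or below the requested grade level
    return draft
```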
  13. JMIR Form Res. 2026 Apr 24;10: e90139
     Retrieval-augmented generation improved overall quality scores for patient-facing gynecological cancer information, mainly through better source attribution.
    Keywords:  GPT-4o; gynecological neoplasms; health literacy; large language models; patient education as topic; readability; retrieval-augmented generation
    DOI:  https://doi.org/10.2196/90139
  14. Acad Radiol. 2026 Apr 21. pii: S1076-6332(26)00259-X. [Epub ahead of print]
       RATIONALE AND OBJECTIVES: YouTube™ is increasingly used as an educational resource for complex diagnostic skills in dentistry, including cone-beam computed tomography (CBCT) interpretation. However, the absence of scientific regulation raises concerns regarding content reliability. This study aimed to evaluate the educational quality and reliability of YouTube™ videos related to CBCT interpretation.
    MATERIALS AND METHODS: A systematic search was conducted on YouTube™ in January 2026 using keywords related to CBCT interpretation. Videos appearing on the first three pages of results were screened according to predefined inclusion and exclusion criteria. Sixty-six videos were included. Educational quality and reliability were assessed using the JAMA benchmark criteria, DISCERN instrument, Global Quality Score (GQS), and Video Information and Quality Index (VIQI). Interobserver agreement was evaluated using weighted Kappa and intraclass correlation coefficients. Correlation and regression analyses were performed to identify predictors of quality.
    RESULTS: The mean GQS score was 4.35 ± 0.62, with 81.8% of videos classified as high quality (GQS ≥ 4). The mean DISCERN score was 68.42 ± 6.84, and 75.8% of videos were rated good or excellent. Videos uploaded by academic institutions demonstrated significantly higher quality scores than those from individual creators (p < 0.001). Engagement metrics, including views and likes, were not associated with educational quality. Regression analysis identified production quality, structural organization, and institutional origin as significant predictors of composite quality.
    CONCLUSION: YouTube™ contains educational content on CBCT interpretation with variable quality. Institutional origin and pedagogical structure are stronger indicators of reliability than popularity metrics. Careful source selection is essential when using YouTube™ as a supplementary educational resource in dental radiology.
    Keywords:  Cone-beam computed tomography; Radiology education; Video-based learning
    DOI:  https://doi.org/10.1016/j.acra.2026.03.053
  15. Aust Endod J. 2026 Apr 20.
      The aim was to assess the quality and accuracy of vital pulp therapy videos on YouTube. Sixty videos were retrieved using five keywords (pulp capping, pulpotomy, vital pulp therapy, and two layman's terms). Quality was assessed using three validated indices: the modified DISCERN, the Global Quality Score (GQS), and the Video Information and Quality Index (VIQI). Accuracy was evaluated using scoring criteria based on European Society of Endodontology (ESE) and American Association of Endodontists (AAE) position statements. Correlations between the derived metrics were assessed using Spearman's rank-order correlation. The average quality scores were 3.03 (modified DISCERN), 3.1 (GQS), and 11.08 (VIQI). As per the ESE statement, the average accuracy was 44.64%, 25.87%, and 32.8% for 'pulpotomy', 'pulp capping', and 'vital pulp therapy' videos, respectively. Mean accuracy as per the AAE statement was 24.63%. Accuracy correlated positively with all derived metrics. The overall quality and accuracy were subpar.
    Keywords:  Pulpotomy; YouTube; endodontics; pulp capping; vital pulp therapy
    DOI:  https://doi.org/10.1111/aej.70085
  16. Respir Med. 2026 Apr 17;257: 108838. pii: S0954-6111(26)00206-4. [Epub ahead of print]
       BACKGROUND: Chronic Obstructive Pulmonary Disease (COPD) requires both pharmacological and non-pharmacological treatments. Patients increasingly seek health-related information on platforms such as YouTube. However, the usefulness of YouTube videos on the treatment of COPD is unknown.
    OBJECTIVE: This study evaluated the usefulness of YouTube videos on the non-pharmacological treatment of COPD.
    METHODS: Search terms were "COPD", "Chronic Obstructive Pulmonary Disease" and "Chronic Obstructive Lung Disease", and videos with >30,000 views and non-pharmacological content were included. Two independent reviewers assessed quality using standardized tools: Modified DISCERN, Journal of American Medical Association benchmark criteria, and Global Quality Score.
    RESULTS: Among 97 videos, 86% were rated "useful", 13% "misleading", and 1% "neither". Of the misleading videos, 62% contained misinformation on the nutritional management of COPD, and 85% were uploaded by healthcare professionals. Misleading videos received significantly more engagement (all p-values <0.05). Scores on the modified DISCERN (p = 0.015) and JAMA benchmark criteria (p = 0.021) were significantly higher for useful videos than for misleading ones, while the Global Quality Score (p = 0.253) was less effective at distinguishing between the two categories.
    CONCLUSION: YouTube videos on non-pharmacological treatment of COPD are of high quality; however, nutrition-focused misinformation is widespread and tends to attract disproportionately high engagement.
    Keywords:  Chronic obstructive pulmonary disease; Misinformation; Patient education; Social media; YouTube
    DOI:  https://doi.org/10.1016/j.rmed.2026.108838
  17. Front Public Health. 2026;14: 1802156
       Background: Functional Neurological Disorder (FND) is a common and often misunderstood condition characterized by neurological symptoms such as limb weakness, movement disorders, sensory disturbances, and non-epileptic seizures that are not explained by structural neurological disease. Patients increasingly seek information through digital platforms such as YouTube; however, the reliability and educational value of such content remain uncertain.
    Objective: This study aimed to systematically evaluate the quality, reliability, and educational value of English-language YouTube videos on FND using standardized assessment tools.
    Methods: A cross-sectional analysis was conducted on the 50 most viewed videos retrieved with relevant keywords. Video characteristics and engagement metrics were recorded. Quality was assessed using the Global Quality Scale (GQS), reliability with the modified DISCERN (mDISCERN), and health information standards with JAMA benchmark criteria. User interaction was measured via the Video Power Index (VPI). Statistical analyses included Spearman correlation, Kruskal-Wallis and Mann-Whitney U tests, with effect sizes reported. Inter-rater reliability was evaluated using ICC and weighted Cohen's kappa.
    Results: The mean GQS, mDISCERN, and JAMA scores were 3.27, 3.23, and 2.38, respectively, indicating moderate overall quality but suboptimal adherence to health information standards. Producer type did not significantly affect quality scores (p > 0.05), though VPI differed across groups (p = 0.022), with health information channels showing higher engagement. VPI showed strong correlations with both view count and like count. Engagement metrics demonstrated limited association with information quality indicators. Inter-rater reliability was excellent across all instruments (ICC range: 0.882-0.944).
    Conclusion: YouTube hosts a substantial amount of FND-related content; however, overall quality and reliability are inconsistent. Engagement metrics do not reliably reflect informational accuracy. Given the stigma and complexity of FND, reliance on unregulated online content may hinder patient understanding and management. Greater involvement of clinicians and professional organizations in producing evidence-based, patient-centered digital resources is warranted to improve health literacy and outcomes.
    Keywords:  YouTube; digital health literacy; functional neurological disorder; health information quality; patient education
    DOI:  https://doi.org/10.3389/fpubh.2026.1802156
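
Item 17's reliability and correlation checks (weighted Cohen's kappa between raters, Spearman correlations between engagement and quality) are one-liners in standard Python libraries. The sketch below uses invented ratings and view counts; cohen_kappa_score (scikit-learn) and spearmanr (scipy) are real functions with these signatures.

```python
# Item 17-style checks: quadratic-weighted Cohen's kappa between two
# raters' GQS scores, and Spearman correlation between an engagement
# metric and quality. All numbers below are invented placeholders.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

rater1_gqs = [3, 4, 3, 2, 5, 3, 4, 3]
rater2_gqs = [3, 4, 4, 2, 5, 3, 3, 3]
kappa = cohen_kappa_score(rater1_gqs, rater2_gqs, weights="quadratic")

views = [120_000, 45_000, 300_000, 8_000, 52_000, 90_000, 15_000, 220_000]
rho, p = spearmanr(views, rater1_gqs)

print(f"weighted kappa = {kappa:.3f}")
print(f"Spearman rho(views, GQS) = {rho:.3f} (p = {p:.3f})")
```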
  18. J Robot Surg. 2026 Apr 21;20(1): 447. [Epub ahead of print]
      
    Keywords:  DISCERN; Information quality; Robot-assisted total knee arthroplasty; Robotic surgery; Short-video platforms
    DOI:  https://doi.org/10.1007/s11701-026-03411-8
  19. Digit Health. 2026 Jan-Dec;12: 20552076261444451
       Background: Plantar fasciitis is a common condition that impacts patients' motor function and quality of life. As short video platforms such as TikTok and Bilibili become increasingly popular for information seeking, patients are turning to them for health guidance, yet the quality of this content varies significantly. This cross-sectional study was designed to systematically evaluate the quality, reliability, and content completeness of plantar fasciitis videos on TikTok (Chinese TikTok, Douyin) and Bilibili.
    Methods: A total of 158 videos were collected and assessed using the global quality score (GQS), modified DISCERN (mDISCERN), and JAMA benchmarks, while uploader identity and user interaction data were also analyzed.
    Results: Compared with Bilibili videos, higher GQS scores were observed for TikTok videos (p = 0.003), whereas no significant between-platform differences were observed for mDISCERN (p = 0.496) or JAMA (p = 0.103). User engagement was also higher on TikTok. Professionally uploaded content, particularly from medical personnel, significantly outperformed videos from nonprofessional sources in terms of quality and reliability (p < 0.001). In terms of content, a significant gap was identified: 91.8% (n=145) of the videos addressed treatment, whereas only 15.8% (n=25) mentioned prevention. Crucially, correlation analysis revealed no significant associations between user engagement metrics (e.g., likes, shares) and GQS, mDISCERN, or content completeness scores.
    Conclusion: These findings reveal a dual role for short video platforms in plantar fasciitis information dissemination: they not only enhance public access but also risk spreading low-quality content due to inadequate oversight. Enhanced credential verification for health creators, greater involvement of medical institutions in content creation, and improved public education to prioritize verified sources are therefore warranted.
    Keywords:  Bilibili; TikTok; information quality; plantar fasciitis; social media
    DOI:  https://doi.org/10.1177/20552076261444451
  20. J Cardiothorac Surg. 2026 Apr 22.
       BACKGROUND: The proliferation of short-form educational video platforms has facilitated the public's access to health information; however, no research has assessed the characteristics and quality of videos related to atrial septal defect. ASD was selected because it is one of the most common congenital heart defects encountered across the lifespan, often requiring repeated explanations regarding diagnosis, follow-up, timing of intervention, and long-term prognosis for patients and families. In addition, although short-form video studies have been conducted for several other diseases, ASD-related content on major Chinese short-video platforms has not been systematically evaluated. This study aimed to evaluate the quality and reliability of short videos related to atrial septal defect on TikTok and Bilibili.
    METHODS: The Chinese term "atrial septal defect" was used to search for related videos on TikTok and Bilibili, and a predefined sampling strategy was used to screen the first 100 algorithm-ranked videos from each platform on October 21, 2025. This sample size was determined a priori to provide a feasible and standardized cross-sectional sample for manual content evaluation and to maintain comparability between platforms, rather than being based on a formal sample size calculation. Duplicate videos, irrelevant videos, and videos published within the previous seven days were excluded, the last criterion serving to reduce the instability of early engagement indicators. As a result, 155 videos were included for analysis. The overall quality of these videos was assessed using the Global Quality Score (GQS), VIQI, PEMAT, JAMA benchmark, and the DISCERN tool. These instruments were used to evaluate educational quality, reliability, transparency, understandability, and actionability, but they did not constitute a direct assessment of factual accuracy or potentially harmful medical content. Interobserver reliability was assessed for the independently retained duplicate GQS ratings using quadratic weighted kappa. Because social media search results are algorithm-ranked and may be affected by platform personalization, the included sample should be interpreted as a snapshot of highly visible ASD-related videos on the sampling day rather than an exhaustive representation of all available videos.
    RESULTS: Among the 155 videos (Bilibili: n = 70; TikTok: n = 85), significant differences were observed across all engagement and quality metrics. TikTok videos demonstrated significantly higher values for likes (median 415.0 [IQR 186.0 to 961.0] vs. 11.0 [IQR 5.0 to 52.75]), collections (131.0 [IQR 50.0 to 332.0] vs. 26.5 [IQR 8.0 to 106.75]), shares (124.0 [IQR 35.0 to 583.0] vs. 7.5 [IQR 2.0 to 29.0]), and comments (61.0 [IQR 17.0 to 492.0] vs. 1.0 [IQR 0.0 to 9.5]) (all P < 0.05). In contrast, the Bilibili group had longer video durations (in seconds) (639.0 [IQR 327.0 to 1,106.0] vs. 326.0 [IQR 216.9 to 429.0]; P < 0.001) and longer times since upload (in days) (356.0 [IQR 71.25 to 770.25] vs. 60.0 [IQR 37.0 to 102.0]; P < 0.001). Content quality assessments also differed, with TikTok videos having higher median DISCERN (23.788 vs. 22.786; P = 0.001), JAMA benchmark (1.918 vs. 1.714; P = 0.001), and GQS scores (3.635 vs. 3.314; P = 0.006). However, the proportion of professional uploaders differed markedly between platforms (TikTok: 96.47% vs. Bilibili: 44.29%), and uploader-level analyses suggested that part of the observed quality advantage on TikTok may be attributable to uploader composition rather than platform characteristics alone. Importantly, the judgment of moderate quality was based on the absolute position of the observed scores on their respective validated scales rather than on an arbitrary composite cutoff. Specifically, median GQS values of 3.314 and 3.635 on a 1-5 scale indicate moderate rather than high educational quality, whereas median JAMA benchmark values of 1.714 and 1.918 on a 0-4 scale indicate that, on average, fewer than half of the transparency/reliability criteria were met. Because no universally accepted single "target quality score" exists across GQS, JAMA, DISCERN, PEMAT, and VIQI, each instrument was interpreted according to its own published scale direction and anchors.
    CONCLUSIONS: TikTok videos related to atrial septal defect are more engaging and of higher content quality than those on Bilibili. Overall, health-related videos on both platforms showed only moderate educational quality, with notable limitations in transparency, reliability, source attribution, and actionable guidance. Professionally produced content tended to perform better, although between-platform differences should be interpreted cautiously because of differences in uploader composition. These findings suggest that greater participation by health professionals may help improve the quality and reliability of online health information. As this study assessed informational quality rather than factual accuracy and was limited to Chinese-language videos from two platforms at a single time point, the results should be interpreted as a platform- and time-specific snapshot.
    Keywords:  Atrial Septal Defect; Bilibili; Health Information; Social Media; TikTok; Video Quality
    DOI:  https://doi.org/10.1186/s13019-026-04161-2
  21. Indian J Orthop. 2026 Apr;60(4): 1023-1030
       Background: With easy access to the Internet, the majority of patients today often resort to online material for education. The aim of our study is to assess the quality and readability of online information relating to tibial shaft fractures (TSFs).
    Methods: A search of the top 50 results via three search engines was performed. The readability of these webpages was assessed using www.readable.com to generate scores for the Gunning Fog Index (GFI), Flesch Reading Ease (FRE) and Flesch-Kincaid Grade (FKG). The quality was also assessed using the DISCERN tool and the Journal of American Medical Association (JAMA) benchmark criteria.
    Results: A total of 90 unique websites were noted. The mean FKG was 10.67 ± 2.44, with 70 webpages written at a reading level that was too high. The mean GFI was 12.2 ± 3.6, with only six webpages pitched at a sixth-grade reading level. The mean FRE score was 38.13 ± 13.21, with no websites reported to be below a seventh-grade reading level. The mean DISCERN score was 49.84 ± 12.42, corresponding to a rating of "fair". Thirty-one webpages (34.44%) failed to meet any of the JAMA criteria.
    Conclusion: We have highlighted that online information relating to TSFs is written at a reading level that is too high for the average patient and is of low quality. As patient outcomes have been shown to improve with appropriate online education, an effort should be made to resolve these inadequacies.
    Keywords:  Health literacy; Orthopaedic surgery; Patient education; Quality analysis; Reading level; Tibial shaft fractures
    DOI:  https://doi.org/10.1007/s43465-025-01640-x
  22. Digit Health. 2026 Jan-Dec;12: 20552076261443756
     Objective: This study evaluated the quality and reliability of previously unassessed health information on developmental dysplasia of the hip (DDH) aimed at parents on prominent short-video platforms such as Bilibili and TikTok, where accurate information is crucial for early intervention.
    Methods: We conducted a cross-sectional study, analyzing 125 eligible DDH-related short videos from Bilibili and TikTok. Video characteristics, author types, and interaction metrics were extracted. Content quality was assessed by medical professionals using the modified DISCERN (mDISCERN) and Global Quality Score (GQS) tools. Ordinal logistic regression identified factors influencing quality.
    Results: Significant inter-platform differences were found in video characteristics and engagement. While both platforms showed generally high quality, Bilibili slightly outperformed TikTok. Author professionalism (GQS OR=4.025; mDISCERN OR=5.585) and video duration (GQS OR=1.010; mDISCERN OR=1.010) were significant positive predictors of quality. Critically, a negative correlation existed between higher interaction metrics (likes, comments, shares, views) and perceived video quality.
    Conclusion: Short video platforms strongly shape the DDH health information available to parents, but quality varies: author professionalism and video duration predicted higher standards, yet popular videos often lacked scientific rigor. These findings highlight an urgent need for enhanced quality control and expert-driven content.
    Keywords:  Bilibili; DDH; TikTok; cross-sectional study; health information quality; short video platforms
    DOI:  https://doi.org/10.1177/20552076261443756
  23. ANZ J Surg. 2026 Apr 24.
       BACKGROUND: Hernia imposes a growing global and national disease burden, yet public understanding remains limited. Short-video platforms such as TikTok and Bilibili have become major sources of health information in China, but the quality and reliability of hernia-related content have not been systematically evaluated.
    METHODS: Hernia-related videos were retrieved from TikTok and Bilibili between November 15 and 20, 2025. After excluding irrelevant, duplicate, and promotional content, 184 videos were included. Video characteristics, uploader categories, and engagement metrics were collected. Content themes were categorized, and quality was assessed using the Global Quality Score (GQS), modified DISCERN, Patient Education Materials Assessment Tool (PEMAT-U/A), and Journal of the American Medical Association (JAMA) benchmark criteria. Group comparisons were performed using the Mann-Whitney U test, and associations were examined via Spearman correlation.
    RESULTS: TikTok videos showed significantly higher engagement (likes and comments, both p < 0.01) but were shorter in duration compared with Bilibili. Professional physicians produced 98% of TikTok videos, whereas non-professionals contributed 56% of Bilibili content. TikTok demonstrated higher scores in mDISCERN, PEMAT-A, and JAMA benchmarks (all p < 0.05), though overall GQS scores remained low on both platforms. Longer videos were modestly associated with higher GQS (r = 0.35) and PEMAT-U (r = 0.34). Engagement indicators did not correlate with quality metrics.
    CONCLUSION: Hernia-related content on TikTok and Bilibili exhibits a clear trade-off: TikTok achieves greater reach but offers limited educational depth, while Bilibili supports more comprehensive content with lower engagement.
    Keywords:  Bilibili; TikTok; content; hernia; quality
    DOI:  https://doi.org/10.1111/ans.70702
  24. Arthroplast Today. 2026 Jun;39: 102007
       Background: With the rise of internet access and social media use, finding medical information has become easier than ever. Increasingly, medical professionals (MPs) are using video-sharing platforms such as TikTok to disseminate educational material. This study aims to assess the quality of educational content related to total knee arthroplasty (TKA) treatment on TikTok.
    Methods: On July 31, 2025, using the search terms "total knee arthroplasty," "knee replacement surgery," and "TKA," the first 150 videos returned by TikTok were compiled. After applying exclusion criteria, 59 videos were reviewed using the following assessments: Global Quality Score (GQS), Journal of the American Medical Association benchmark, Health on the Net Code, the Patient Education Materials Assessment Tool for Audiovisual Materials, and a modified DISCERN.
    Results: The 59 videos had a cumulative total of 2,738,858 views. Videos created by MPs had higher average scores in GQS (2.29 vs 1.31; P = .00047), Journal of the American Medical Association benchmark (1.76 vs 1.31; P = .00056), Health on the Net Code (3.95 vs 2.40; P < .000001), Patient Education Materials Assessment Tool for Audiovisual Materials understandability (8.29 vs 7.20; P = .0033), and modified DISCERN (2.3 vs 1.1; P < .0008) when compared to videos created by non-MPs.
    Conclusions: TikTok demonstrates potential as a platform for disseminating educational content related to TKA. Although videos created by MPs scored higher across all evaluation metrics compared with those produced by non-MPs, the overall educational quality of available content remains limited. Improving video quality requires citing credible sources, providing supplementary learning materials, and disclosing potential conflicts of interest.
    Keywords:  Patient education; Social media; TKA; TikTok; Total knee arthroplasty
    DOI:  https://doi.org/10.1016/j.artd.2026.102007
  25. Health Informatics J. 2026 Apr-Jun;32(2): 14604582261445744
      Objective: To conduct a multidimensional evaluation of endometrial cancer (EC)-related videos on major Chinese short-video platforms.
    Methods: A cross-sectional study conducted on May 8, 2025 analyzed 226 eligible EC-related videos from TikTok, Rednote, and Bilibili. Video quality was assessed using the Global Quality Scale (GQS) and Video Information and Quality Index (VIQI), reliability using the modified DISCERN (mDISCERN), and understandability and actionability using the Patient Education Materials Assessment Tool (PEMAT).
    Results: Among 226 videos, TikTok emphasized symptoms/risk factors and scored highest in engagement, reliability (mDISCERN=2.0, P=0.002), and understandability (88%). Bilibili led in VIQI (median=17.0) but had the lowest understandability (67%). Professional videos outperformed patient-generated content (all P<0.05). Video length correlated positively with quality but negatively with engagement and understandability (all P<0.01).
    Conclusions: EC-related videos vary widely in quality and lack consistent actionability. Strategic content design and platform-level verification are needed to improve reliability and public health impact.
    Keywords:  content reliability; endometrial carcinoma; information quality; public health policy; short-video platforms
    DOI:  https://doi.org/10.1177/14604582261445744
  26. Digit Health. 2026 Jan-Dec;12: 20552076261444155
       Background: Ovarian cancer is one of the most common and lethal gynecological malignancies, with high mortality rates due to late-stage diagnoses and frequent recurrence. Despite its significant global health impact, public awareness of ovarian cancer remains low, contributing to delayed diagnosis and treatment. The rapid rise of short-video platforms, such as TikTok, Rednote, and WeChat, presents an opportunity to enhance public knowledge and early detection of ovarian cancer.
    Method: This study analyzed ovarian cancer-related videos from TikTok, Rednote, and WeChat using two validated quality assessment tools: the Global Quality Scale (GQS) and the modified DISCERN (mDISCERN) tool. A total of 220 videos were examined for content quality, thematic coverage, and engagement metrics. The videos were categorized into themes such as epidemiology, etiology, symptoms, diagnosis, treatment, complications, and maintenance therapy. Additionally, user comments were analyzed to assess public sentiment toward the videos. The study also examined the correlation between video quality and audience engagement.
    Results: Videos on TikTok demonstrated the highest quality in terms of accuracy and engagement, followed by Rednote, with WeChat videos showing the lowest quality. A significant gap was observed in the coverage of critical topics such as complications and maintenance therapy, with many videos providing only partial information. Specialist-created videos scored higher in quality compared to those created by non-specialists and individual users. Moreover, higher-quality videos were associated with greater audience engagement, including more likes, shares, comments, and collections. Positive sentiment in user comments was most strongly correlated with videos focusing on treatment and maintenance therapy.
    Conclusion: This study reveals significant quality variation in ovarian cancer videos across Chinese platforms. To enhance public health communication, content accuracy must improve, with a focus on professional creators and key topics like complications and maintenance therapy.
    Keywords:  Rednote; TikTok; WeChat; information quality; ovarian cancer; short videos platforms
    DOI:  https://doi.org/10.1177/20552076261444155
  27. Australas J Dermatol. 2026 Apr 21.
       BACKGROUND: Short-form video platforms such as TikTok and YouTube Shorts have become influential sources of health information. Among dermatology topics, topical steroid withdrawal (TSW) and corticosteroid phobia ('corticophobia') are frequently discussed.
    OBJECTIVES: To examine quantitative engagement patterns and qualitatively characterise representations of TSW and corticosteroid phobia on TikTok and YouTube Shorts.
    METHODS: A qualitative reflexive thematic analysis was conducted on publicly available TikTok and YouTube Shorts videos identified using predefined hashtags. Sampling was restricted to a consecutive two-day period to minimise algorithmic drift. Seventy-six publicly accessible videos met inclusion criteria. Videos were coded inductively using NVivo, and thematic saturation was assessed through analytic redundancy and code recurrence.
    RESULTS: TikTok videos were shorter and more interactive than YouTube Shorts. Sentiment was negative or neutral in most (78.9%, 60/76) videos. Five themes emerged: (1) Visually dramatic embodiment of TSW; (2) Protest and mistrust; (3) Alternative healing; (4) Identity and hope; (5) Platform-shaped performance.
    CONCLUSIONS: Short-form platforms amplify emotionally intense and distrust-oriented narratives about TSW. Future research should focus on effective strategies to address misinformation and rebuild trust.
    DOI:  https://doi.org/10.1111/ajd.70129