bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-05-03
39 papers selected by
Thomas Krichel, Open Library Society



  1. Sci Data. 2026 Apr 29. pii: 677. [Epub ahead of print]13(1):
      As Open Access continues to gain importance in science policy, understanding the proportion of Open Access publications relative to the total research output of research-performing organizations, individual countries, or even globally has become increasingly relevant. In response, dashboards are being developed to capture and communicate progress in this area. To provide an overview of these dashboards and their characteristics, an extensive survey was conducted, resulting in the identification of over 60 dashboards. To support a detailed and structured description, a dedicated metadata schema was developed, and the identified dashboards were systematically indexed accordingly. We provide an openly reusable dataset and an interoperable metadata schema that enable comparative and longitudinal analyses of Open Access dashboards across regions and operator types, and we invite the community to reuse, extend, and refine them. The dataset is particularly relevant for researchers in Library and Information Science and Science and Technology Studies, supporting both empirical analyses of Open Access and the methodological refinement of indicators and policy instruments in the context of Open Science.
    DOI:  https://doi.org/10.1038/s41597-026-07217-z
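The abstract does not enumerate the schema's fields, but a record in a dashboard-indexing schema of this kind might plausibly capture operator, scope, and data sources. The sketch below is purely illustrative; every field name and value is a hypothetical assumption, not taken from the published dataset.

```python
# Purely illustrative record for an Open Access dashboard index; all field
# names and values are hypothetical assumptions, not the authors' schema.
dashboard_record = {
    "name": "Example OA Monitor",              # hypothetical dashboard
    "url": "https://example.org/oa-monitor",   # placeholder URL
    "operator_type": "library",                # e.g. funder, government, institution
    "geographic_scope": "national",            # institutional / national / global
    "data_sources": ["OpenAlex", "Crossref"],  # bibliometric sources drawn on
    "oa_categories": ["gold", "green", "hybrid", "diamond"],
    "update_frequency": "annual",
}
```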
  2. J Health Commun. 2026 Apr 27. 1-5
      The internet has become a primary source of health information, with over 50% of European Union citizens and 68-80% of U.S. adults searching for health information online. However, the quality of online health information varies greatly, and inaccurate, outdated, or misleading information is widespread. While existing guidelines like the International Patient Decision Aid Standards (IPDAS) and the German Good Practice Guidelines for Health Information (GPGI) aim to improve the quality of patient decision aids and health information, current efforts are insufficient to address the scale of low-quality information. Recent approaches include consumer education and resilience-building strategies, but these remain difficult to implement widely. This paper proposes criteria for evaluating the credibility of health information providers, focusing on structural and procedural quality, rather than comprehensive content assessment. These criteria, inspired by frameworks from IPDAS, GPGI, and the National Academy of Medicine (NAM), emphasize transparency, evidence-based methodology, and the accountability of information sources. Implementing a certification and accreditation system based on these criteria could incentivize providers to adopt high standards, improve online health information quality, and ensure trustworthy content is prioritized by search engines, AI, and social media platforms.
    Keywords:  Online health information; certification; evidence-based medicine; health communication; quality standards
    DOI:  https://doi.org/10.1080/10810730.2026.2663933
  3. J Lipid Res. 2026 Apr 27. pii: S0022-2275(26)00075-1. [Epub ahead of print] 101049
      There are numerous public resources and guidelines available for lipidomics research, including standard nomenclatures, classification systems, and lipid databases. However, these resources are not always aligned with one another, making it difficult to find and compare information on the same lipid across different databases. To tackle these challenges, we present LipidLibrarian, a lipid search engine that enables a combined search of all major lipid databases by aggregating the available information and presenting it in a unified manner. The three main sources of information that form the foundation of LipidLibrarian as a comprehensive search engine are SwissLipids, LIPID MAPS, and ALEX123. Furthermore, various secondary resources such as LION/web, LINEX, LipidLynxX, and Goslin were incorporated to enhance the results and conduct name and hierarchy conversions. LipidLibrarian is accessible via a user-friendly website, allowing the user to query lipids using their trivial names, shorthand notations, database identifiers, or masses. Alternatively, LipidLibrarian can be accessed as a Python package for integration into high-throughput lipidomics pipelines. The output of a LipidLibrarian query is split into multiple categories, such as nomenclature, database identifiers, masses, adducts, fragments, ontology terms, and reactions. For each of these categories, LipidLibrarian aggregates the results from all databases and provides the source from which each value originates. This enables the user to quickly assess whether the databases contain differing or conflicting information. In summary, LipidLibrarian provides an effortless, comprehensive, and automated search for lipid information, thereby accelerating the research workflow and making it a meaningful tool for the scientific community; lipidlibrarian.ciobio.io.
    Keywords:  Biochemistry; Bioinformatics; Database; Integration; Lipid; Lipidomics; Mass Spectrometry; Metabolism; Multi-omics; Visualization
    DOI:  https://doi.org/10.1016/j.jlr.2026.101049
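The entry above notes that LipidLibrarian can be used as a Python package in high-throughput pipelines. The sketch below only illustrates that idea: the import path, function name, and result attributes are assumptions, not the package's documented API, which should be checked at lipidlibrarian.ciobio.io.

```python
# Illustrative use of a lipid search package in a pipeline. The import, the
# query function, and the result attributes are hypothetical assumptions;
# consult the LipidLibrarian documentation for the real interface.
from lipidlibrarian import query_lipid  # hypothetical import

# The abstract says queries may use trivial names, shorthand notations,
# database identifiers, or masses; a shorthand notation is used here.
result = query_lipid("PC 34:1")  # hypothetical function

# Results are described as grouped into categories (nomenclature, database
# identifiers, masses, adducts, fragments, ontology terms, reactions), with
# the source database recorded for each value.
for entry in result.database_identifiers:  # hypothetical attribute
    print(entry.value, entry.source)  # e.g. SwissLipids, LIPID MAPS, ALEX123
```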
  4. Trials. 2026 Apr 25.
      We have previously described a free, public web-based tool, Trials to Publications, https://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/TrialPubLinking/trial_pub_link_start.cgi, which employs a machine-learning model based on title, abstract, and other metadata features to predict which publications are likely to present clinical outcome results from a given registered trial in ClinicalTrials.gov. We have now updated and expanded the scope of the tool by extracting mentions of ClinicalTrials.gov registry numbers (NCT numbers) from the full text of three online biomedical article collections (open access PubMed Central (PMC), EuroPMC, and OpenAlex), as well as by retrieving biomedical publications that are mentioned within the ClinicalTrials.gov registry itself. These mentions greatly increase the number of linked publications identified by the tool and should assist those carrying out evidence syntheses as well as those studying the metascience of clinical trials.
    Keywords:  Bibliographic databases; Clinical trials; Information retrieval; Linking trials to publications; Systematic reviews
    DOI:  https://doi.org/10.1186/s13063-026-09747-8
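Since NCT numbers follow a fixed pattern ("NCT" followed by eight digits), the full-text mention extraction this entry describes can be approximated with a simple regular expression. The following is a minimal sketch of that general idea, not the authors' code.

```python
import re

# ClinicalTrials.gov registry numbers have the form "NCT" + 8 digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")

def extract_nct_ids(full_text: str) -> set[str]:
    """Return the unique NCT registry numbers mentioned in an article's text."""
    return set(NCT_PATTERN.findall(full_text))

sample = "Outcomes for trial NCT01234567 were compared with NCT07654321."
print(sorted(extract_nct_ids(sample)))  # ['NCT01234567', 'NCT07654321']
```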
  5. Nat Commun. 2026 04 29. pii: 3621. [Epub ahead of print]17(1):
      State-supported research funding agencies are critical to the scientific enterprise. However, it remains unclear how funding agencies cooperate with academic communities to realize common scientific goals. Here, we present a fully digital archive assembled by the National Human Genome Research Institute (NHGRI), focusing on the nascent stages of "genomics" as a scientific field and the everyday workings of the Human Genome Project and subsequent major genomics projects. We identify early events behind the conception of genome-wide association studies, clarify hitherto obscured factors around funding decisions, and show how NHGRI and academics outside NHGRI ensured continuity in technical expertise across projects. The computational models we developed correctly recapitulate how academic experts and NHGRI increased adoption of genomics by jointly deciding which organisms' genomes to sequence. Taken together, these findings reveal how a funding agency contributed to scientific innovation in a nascent field of science by repeatedly cooperating with the broader scientific community.
    DOI:  https://doi.org/10.1038/s41467-026-71700-9
  6. J Pediatr Adolesc Gynecol. 2026 Apr 27. pii: S1083-3188(26)00363-3. [Epub ahead of print]
       STUDY OBJECTIVE: Adolescents often look online for answers to sensitive questions about puberty and sexual health. As chat-based AI tools become more accessible, they may influence how young people interpret symptoms and decide whether to consult a doctor. We examined whether GPT-4o responses to common puberty-related questions from girls aged 8-17 years were judged by clinicians to be acceptable and appropriate for the user's age.
    METHODS: Ten common puberty-related queries from a Polish search context were identified using Google Autocomplete in May 2024 and converted into standardized first-person prompts beginning with "I am X years old and…". GPT-4o generated one response to each prompt in separate new chat sessions without manual editing. Eighteen clinicians (11 pediatric and adolescent gynecologists and 7 pediatricians) rated each response for content quality, adequacy of recommendations, empathy, and age appropriateness on 5-point scales. The main outcome was the proportion of ratings considered acceptable (scores of 4 or 5).
    RESULTS: Overall, 79.2% of ratings (570/720; 95% CI 76.0-82.1) were in the acceptable range, exceeding the predefined threshold. Across domains, acceptability ranged from 72.2% to 84.4%. Internal consistency was high for empathy and age appropriateness. Agreement between individual raters was low but improved to a moderate level when scores were averaged. Greater concern about adolescents placing too much trust in AI was strongly associated with the expectation that its use could reduce contact with physicians (Spearman's ρ = 0.86; p < 0.001; n = 18).
    CONCLUSIONS: Most GPT-4o responses to common puberty-related questions were judged acceptable by clinicians. At the same time, concerns persisted that young users might rely too heavily on AI and delay seeking medical advice. Any use of such tools by adolescents should therefore include clear advice on when in-person medical assessment is needed.
    Keywords:  Adolescent; Artificial Intelligence; Empathy; GPT-4o; Health Information Seeking Behavior; Large Language Models; Pediatric Gynecology; Puberty
    DOI:  https://doi.org/10.1016/j.jpag.2026.04.001
  7. Front Digit Health. 2026;8: 1768843
       Background: Diabetes mellitus is a chronic metabolic disease with rising global prevalence. Adequate patient education is essential to encourage self-management and reduce complications. Artificial intelligence applications such as ChatGPT have emerged as potential supplementary resources for patient education alongside the broader integration of technology in healthcare.
    Methods: A cross-sectional evaluation was conducted using ten frequently asked questions (FAQs) on diabetes, selected from the Diabetic Association of India and the International Diabetes Federation. ChatGPT-4o (accessed via the web interface in March 2025) generated responses to each question in separate, stand-alone chat sessions to simulate typical patient interactions. Five board-certified endocrinologists (diabetologists) with a mean clinical experience of ≥10 years independently evaluated the responses using a 4-point Likert scale across five domains: overall quality, content accuracy, clarity, relevance, and trustworthiness. Final domain scores were computed as the mean of all five raters' scores. Readability was assessed using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). All readability analyses apply exclusively to the English-language outputs generated in this study.
    Results: The mean FRES was 38.19 and the mean FKGL was 16.87, indicating a reading level appropriate for college-educated individuals and substantially above the recommended sixth-grade benchmark for patient health materials. Mean response length was 300 ± 100 words across the ten prompts. Expert ratings were generally high: aggregated mean scores (±SD) were 4.0 (±0.0) for content accuracy and overall quality, 3.98 (±0.10) for relevance, and 3.9 (±0.20) for clarity and trustworthiness. No clinically inaccurate statements were identified by the raters; however, the high scores and narrow score range indicate a potential ceiling effect that limits discrimination between responses. Raters expressed concern about linguistic complexity, which may impede comprehension among patients with limited health literacy.
    Conclusions: ChatGPT-4o generated generally accurate and relevant diabetes education content, suggesting potential as a supplementary tool in diabetes care. However, the high reading-level complexity, small evaluation scope (ten prompts, one model, one session), and English-only assessment limit the generalisability of these findings. AI-generated content should supplement, not replace, clinician-led education. Future work should address language simplification, multilingual evaluation, and longitudinal assessment of patient outcomes.
    Keywords:  AI in healthcare; ChatGPT; artificial intelligence; blood glucose control; diabetes mellitus; patient education
    DOI:  https://doi.org/10.3389/fdgth.2026.1768843
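For context, the FRES and FKGL reported here (and in several entries below) are the standard Flesch formulas, computed from sentence, word, and syllable counts:

```latex
\mathrm{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```

Higher FRES means easier text (60-70 is roughly plain English), while FKGL approximates a US school grade level; a FRES near 38 with an FKGL near 17 is therefore college-level material, consistent with the study's conclusion that the responses exceed the sixth-grade benchmark.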
  8. J Clin Sleep Med. 2026 Apr 28. pii: 69. [Epub ahead of print]22(1):
       PURPOSE: Patients with obstructive sleep apnea (OSA) frequently seek information online, yet the comparative quality of content delivered by web search engines versus generative AI systems is unclear. This study evaluated how different digital information sources perform in answering common patient questions about OSA.
    METHODS: Thirty high-volume, patient-facing OSA questions were identified using Google Trends. Each question was submitted verbatim to four general-purpose large language models (GPT-4, GPT-5, DeepSeek, Mistral), a medically specialized retrieval-augmented model (OpenEvidence), and Google Search. Seven otolaryngologists with clinical experience in OSA independently rated each response for accuracy, clarity, completeness, relevance, and usefulness using a five-point rubric. Composite and domain scores were analyzed using one-way analysis of variance with multiple-comparison correction; inter-rater reliability was assessed with two-way random-effects intraclass correlation coefficients.
    RESULTS: A total of 180 question-system pairs received 6295 domain-level ratings. OpenEvidence achieved the highest mean composite score (4.33), followed by a tightly clustered group of LLMs (means 4.00-4.04). Google Search scored significantly lower (3.15). Differences among systems were statistically significant across all domains (p < 0.001), with large effect sizes for comparisons of OpenEvidence and general LLMs versus Google. Composite average-rater reliability was good (ICC = 0.70).
    CONCLUSION: For common OSA questions, generative AI systems, particularly a retrieval-augmented medical model, produced higher-quality patient-facing information than standard web search. These findings support cautious consideration of GenAI tools to supplement patient education in OSA, while underscoring the need for ongoing evaluation across diseases, disciplines, and patient populations.
    CURRENT KNOWLEDGE/STUDY RATIONALE: Patients with obstructive sleep apnea (OSA) frequently rely on online sources such as Google Search to understand symptoms, testing, and treatment, yet the quality of patient-facing information varies widely. As generative artificial intelligence tools are increasingly used for health questions, their comparative performance for OSA education has not been systematically evaluated using blinded expert review.
    STUDY IMPACT: In this blinded comparative study, generative AI systems, particularly a retrieval-augmented medical model, provided more accurate, clear, complete, and useful answers to common OSA questions than standard web search. These findings highlight that the choice of digital information source can meaningfully influence the quality of patient education in sleep medicine and support further evaluation of AI tools within clinical practice.
    Keywords:  Generative artificial intelligence; Health information quality; Large language models; Obstructive sleep apnea; Patient education; Web search engines
    DOI:  https://doi.org/10.1007/s44470-026-00090-y
  9. J Esthet Restor Dent. 2026 Apr 26.
       OBJECTIVE: This study compared the accuracy and temporal consistency of ChatGPT and Gemini in responding to dental bleaching questions across three weekly sessions.
    MATERIALS AND METHODS: A total of 280 true/false questions were developed comprising 200 textbook-based and 80 patient-oriented frequently asked questions. Both chatbots were queried weekly under controlled conditions. Accuracy was compared using generalized estimating equations, consistency was assessed using Fleiss' kappa, and weekly stability was evaluated using Cochran's Q test. Open-ended responses were scored for quality and misinformation by two evaluators.
    RESULTS: For textbook questions, ChatGPT achieved significantly higher accuracy than Gemini (77.7% versus 70.5%, p = 0.0009). For frequently asked questions, both chatbots performed comparably (92.9% versus 90.8%, p = 0.252). Temporal consistency was only fair for textbook questions but almost perfect for frequently asked questions in both chatbots. Both chatbots showed significant upward trends in textbook accuracy across sessions. Gemini received higher global quality scores for open-ended responses, while misinformation rates were similarly low.
    CONCLUSIONS: Within the limitations of this study, ChatGPT achieved significantly higher accuracy than Gemini for textbook-based dental bleaching questions, while both chatbots performed comparably for patient-oriented questions. Temporal consistency differed markedly, with almost perfect consistency for patient-oriented questions and only fair consistency for textbook-based questions.
    CLINICAL SIGNIFICANCE: Chatbot responses to common patient questions about dental bleaching are generally accurate and consistent, but their reliability drops substantially for specialized academic content, suggesting these tools should complement rather than replace professional clinical judgment.
    Keywords:  artificial intelligence; chatbot accuracy; dental bleaching; patient education; temporal consistency
    DOI:  https://doi.org/10.1111/jerd.70172
  10. Dent J (Basel). 2026 Apr 08. pii: 219. [Epub ahead of print]14(4):
      Objective: The application of artificial intelligence (AI) in orthodontics has evolved rapidly in recent years, encompassing areas such as diagnosis, treatment planning, and patient management; AlimGPT is an AI-based tool that provides treatment options based on data and algorithms. This study aimed to compare AlimGPT with GPT-4o, Gemini, and Llama using standardized instruments to evaluate the quality of the information provided, including a Likert scale, the modified DISCERN (mDISCERN), and the modified Global Quality Score (mGQS). Methods: Fourteen different orthodontic questions were asked of each model, and the answers were analyzed. Results: Significant differences were detected for reliability (χ2 = 15.267, p = 0.0016) and usefulness (χ2 = 20.557, p = 0.0001). Post hoc tests showed AlimGPT > Gemini and Llama for reliability and AlimGPT > GPT-4o, Gemini, and Llama for usefulness. mDISCERN was significant overall (χ2 = 11.047, p = 0.0115), but no pairwise contrast met adjusted significance; mGQS showed no significant differences (χ2 = 7.071, p = 0.0697). Inter-rater agreement was moderate-to-good for reliability (ICC = 0.710, 95% CI 0.60-0.80) and usefulness (ICC = 0.729, 95% CI 0.63-0.82), moderate for mGQS (ICC = 0.596, 95% CI 0.47-0.71), and poor-to-moderate for mDISCERN (ICC = 0.435, 95% CI 0.30-0.58). Conclusions: In this blinded, within-subjects experiment, the domain-specific model (AlimGPT) received higher clinician ratings for usefulness and, for reliability, exceeded two general baselines. Differences in mGQS were not detected. Expanding the number of raters, increasing item diversity, or integrating updated baselines would be beneficial.
    Keywords:  artificial intelligence; language model; orthodontic planning; reliability
    DOI:  https://doi.org/10.3390/dj14040219
  11. JMIR Med Inform. 2026 Apr 27. 14 e79416
       Background: MedlinePlus, developed by the National Library of Medicine (NLM) in the United States, is one of the most widely used, authoritative, consumer-grade health information resources on the web. Although extensively used and discussed in scholarly work for health literacy and patient education, it is unclear how MedlinePlus has been integrated into clinical care or embedded within health informatics applications.
    Objective: This study aimed to understand how MedlinePlus has supported patients and caregivers by increasing access to health information for clinical care and illness management. The insights on this topic will inform the design and development of patient-facing digital health intervention tools for improved health communication, decision engagement, informed decision-making, and health outcomes.
    Methods: We conducted a systematic literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. First, we developed a comprehensive literature search strategy, searched 9 citation databases, and aggregated and deduplicated search results before importing them into Covidence for manual screening using predefined inclusion and exclusion criteria. Second, reviewers independently assessed all studies at the title-abstract and full-text levels, resolving discrepancies through ongoing discussions. Third, we applied the PICO (problem/population, intervention, comparison, and outcome) and the Collaborative Chronic Care Model as guiding frameworks for data extraction and analysis. All included studies underwent quality assessment using the Mixed Methods Appraisal Tool.
    Results: In total, 28 studies reported in 27 sources met our inclusion criteria. We categorized the extracted data into 4 areas. First, regarding bibliometrics, the studies were reported between 2004 and 2024, with 2010 having the highest number of studies. Of these studies, 25 were conducted in the United States, 2 were conducted in Iran, and 1 was conducted in Argentina. Health informatics journals and conference proceedings, as well as library science journals, were prominent publishing venues. The NLM funded half of the studies. Second, regarding participants, most studies focused on outpatients. Other participant roles included physicians, nurses, hospital staff, pharmacists, and librarians. Fewer than half of the studies addressed the social determinants of health. Third, regarding intervention, most studies implemented MedlinePlus information interventions within clinical settings. Other interventions occurred in community pharmacies, community organizations, libraries, online health platforms, or patient portals. Fourth, regarding outcome, only 4 studies assessed clinical outcomes, and the findings were mixed and inconsistent. However, 24 of 28 studies reported positive nonclinical outcomes, including improved attitudes toward and satisfaction with MedlinePlus and enhancements in patients' information-seeking behaviors, confidence, and willingness to engage in decision-making, physician-patient communication, self-management, and self-efficacy.
    Conclusions: This systematic literature review is the first comprehensive examination of how MedlinePlus has been integrated into clinical care, supporting patients and caregivers with enhanced access to health information. Our findings offer evidence and insights through the Collaborative Chronic Care Model lens and can guide the development of digital health interventions to improve patient health.
    Keywords:  MedlinePlus; health information access; information intervention; information prescription; patients and caregivers; systematic literature review
    DOI:  https://doi.org/10.2196/79416
  12. BMC Oral Health. 2026 Apr 30.
       BACKGROUND: This study aimed to evaluate and compare the performance of five publicly accessible large language models (LLMs)-based chatbots, ChatGPT-4o, DeepSeek-V3, Claude-Sonnet-4, Gemini-2.0 Flash, and Grok-3, in addressing inquiries from patients with periodontitis seeking orthodontic treatment. The primary objective was to assess the reliability, quality, and readability of the LLM-generated responses.
    METHODS: Thirty frequently asked questions regarding orthodontic treatment for patients with periodontitis were sourced from social media platforms and health-related websites and compiled for this study. Each LLM response was evaluated for reliability using the modified DISCERN (mDISCERN) tool, quality using the Global Quality Score (GQS), and readability using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Differences among models were analysed using linear mixed-effects models, with model treated as a fixed effect and question as a random effect. Post-hoc pairwise comparisons of estimated marginal means were performed with Bonferroni's adjustment. Significance was set at P < 0.05.
    RESULTS: Among the evaluated LLMs, significant performance differences were observed across all metrics (P < 0.001). Grok-3 provided the highest reliability and quality (mDISCERN: 4.20 ± 0.48; GQS: 4.38 ± 0.61), whereas Claude-Sonnet-4 scored the lowest (mDISCERN: 3.54 ± 0.50; GQS: 3.63 ± 0.59). DeepSeek-V3 was rated as most readable (FRE: 33.61 ± 6.11; FKGL: 10.10 ± 1.14), whereas Claude-Sonnet-4 was the least readable (FRE: 4.73 ± 4.14; FKGL: 13.72 ± 1.22). All models produced responses with university-level readability.
    CONCLUSIONS: Grok-3 demonstrates higher reliability and quality, whereas DeepSeek-V3 generates more readable responses. All models exceed recommended readability thresholds for patient education. However, given the risks of misinformation and readability limitations, these should be considered supplementary educational resources, rather than primary sources of medical information.
    Keywords:  Artificial intelligence; Large language models; Orthodontic treatment; Periodontitis
    DOI:  https://doi.org/10.1186/s12903-026-08448-7
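As a minimal sketch of the analysis design this entry describes (not the authors' code), a linear mixed-effects model with chatbot as a fixed effect and question as a random intercept can be fit with statsmodels as follows; the column names and synthetic scores are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: 5 chatbots x 30 questions, one quality score each.
rng = np.random.default_rng(0)
chatbots = ["ChatGPT-4o", "DeepSeek-V3", "Claude-Sonnet-4",
            "Gemini-2.0-Flash", "Grok-3"]
scores = pd.DataFrame(
    [{"model": m, "question": q, "gqs": rng.normal(4.0, 0.5)}
     for m in chatbots for q in range(30)]
)

# Fixed effect of chatbot, random intercept per question, as in the study design.
fit = smf.mixedlm("gqs ~ C(model)", scores, groups=scores["question"]).fit()
print(fit.summary())
# Post-hoc pairwise comparisons of estimated marginal means with Bonferroni
# adjustment would follow, e.g. via pairwise Wald tests on the fitted model.
```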
  13. Clin Pract. 2026 Mar 25. pii: 66. [Epub ahead of print]16(4):
      Background: Peripheral nerve stimulation (PNS) is increasingly used in selected patients with neuropathic pain, and many individuals seek supplemental online information to clarify procedural expectations and postoperative care. Large language models such as ChatGPT may provide scalable patient education; however, their performance for PNS-related questions has not been evaluated. This study assessed the reliability, accuracy, and comprehensibility of ChatGPT-5.0 responses to common PNS patient questions. Methods: We conducted a cross-sectional evaluation of ChatGPT-5.0 responses to 21 standardized questions derived through expert consensus, spanning pre-implantation, implantation, and post-implantation domains. Sixteen board-certified interventional pain specialists and a nurse educator independently rated each response using validated scales for reliability (1-6), accuracy (1-3), and comprehensibility (1-3). Descriptive statistics were calculated, and domain-level patterns were examined. Results: Clinician ratings demonstrated generally strong performance across all domains. Mean reliability was 4.7 ± 1.4, mean accuracy 2.6 ± 0.6, and mean comprehensibility 2.8 ± 0.5. Foundational questions addressing mechanisms, expectations, and postoperative care received the highest ratings. Lower ratings were observed for implantation-focused items requiring procedural nuance. No response fell below predefined acceptability thresholds, and sensitivity analyses confirmed that including one partial evaluator did not alter the observed trends. Conclusions: ChatGPT-5.0 generated responses to PNS-related patient questions that clinicians rated as generally reliable, accurate, and understandable, particularly for foundational and postoperative topics. Performance was more variable for procedural questions, underscoring the need for clinician oversight and verification. These findings provide a benchmark of current LLM capabilities and highlight the importance of ongoing evaluation as models evolve and as patients access versions with differing functionalities.
    Keywords:  ChatGPT; artificial intelligence; large language models; neuromodulation; patient education; peripheral nerve stimulation
    DOI:  https://doi.org/10.3390/clinpract16040066
  14. Front Public Health. 2026;14: 1810358
       Background: Patient-facing large language model (LLM) outputs for inflammatory bowel disease (IBD) must be decision-relevant, readable, and verifiable.
    Methods: In a cross-sectional benchmark using a guideline-derived question set, five publicly available LLMs provided answers to 20 single-intent patient IBD questions, mapped to prespecified decision-critical domains across the care pathway (100 model-question responses). Queries were conducted from January 17-24, 2026, via official web interfaces under default settings (privacy mode; new chat per prompt). Two blinded raters evaluated informational quality and completeness (using DISCERN, EQIP, and the Global Quality Scale), transparency proxies (based on JAMA benchmark criteria), and readability through the Automated Readability Index, Flesch Reading Ease, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, and SMOG. Overall differences were assessed using within-question paired Friedman tests with Holm adjustment, and effect size was quantified with Kendall's W.
    Results: Interrater agreement was high [DISCERN ICC(A,1) = 0.842; EQIP ICC(A,1) = 0.760; GQS weighted κ = 0.812; JAMA weighted κ = 0.936]. Median DISCERN scores ranged from 43.5 to 57.5, and EQIP scores ranged from 67.5 to 77.5, while transparency remained limited (JAMA median 0-1/4). Readability consistently failed to meet patient targets, with grade-level indices exceeding sixth grade and Flesch Reading Ease medians ranging from 15 to 36 (compared to a target of ≥80 for "easy" readability). All 10 outcomes varied significantly across models (Holm-adjusted P < 0.001; W = 0.238-0.702).
    Conclusion: Under default settings, publicly available LLMs exhibit variable informational quality for IBD but consistently poor transparency and readability. Patient-facing deployment should mandate provenance, currency, and disclosure fields, as well as outputs targeted to appropriate grade levels.
    Keywords:  benchmarking; generative artificial intelligence; health literacy; inflammatory bowel disease; large language models; patient-facing; readability; transparency
    DOI:  https://doi.org/10.3389/fpubh.2026.1810358
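The within-question paired Friedman test with Kendall's W effect size used in this entry can be reproduced in outline with SciPy; the synthetic scores below are stand-ins for illustration, and the Holm adjustment across the ten outcomes is omitted.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Synthetic stand-in: DISCERN-like scores, 20 questions (rows) x 5 models (columns).
rng = np.random.default_rng(1)
scores = rng.integers(30, 70, size=(20, 5)).astype(float)

n, k = scores.shape
stat, p = friedmanchisquare(*scores.T)  # paired comparison within each question
W = stat / (n * (k - 1))                # Kendall's W from the Friedman chi-square
print(f"chi2 = {stat:.2f}, p = {p:.4f}, Kendall's W = {W:.3f}")
```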
  15. JMIR Med Inform. 2026 Apr 29. 14 e81720
       Background: Large language model-based chatbots are increasingly used by the public to access medical information. Although these tools can improve access and convenience, their quality, clarity, and transparency remain uncertain for rare and diagnostically complex neurological conditions, such as myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD).
    Objective: This study aimed to evaluate the scientific quality, understandability, citation transparency, and readability of responses generated by widely used artificial intelligence chatbot platforms to a standardized, patient-centered query on MOGAD.
    Methods: We conducted a cross-sectional content analysis using the query, "What is MOGAD, and how is MOGAD treated?" Ten widely accessible chatbot platforms were queried once on the same day in new sessions. Responses were anonymized and independently evaluated by 7 blinded neurologists using DISCERN (treatment-related scientific quality), Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), and the Web Resource Rating (WRR; citation transparency). Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Coleman-Liau Index, and word count was recorded. Platforms were compared by functional orientation and the access model. Mann-Whitney U and Kruskal-Wallis tests with Dunn post hoc tests were used. Interrater reliability was assessed using intraclass correlation coefficients.
    Results: Significant differences were observed across platforms for DISCERN, PEMAT-P, and WRR scores (all P<.001). Search-focused platforms achieved higher understandability than conversation-focused platforms (median PEMAT-P 52.6, IQR 47.4-54 vs 46.7, IQR 42-47.3; P=.04), whereas conversation-focused platforms had higher WRR scores (median 26.8, IQR 19.6-26.8 vs 19.6, IQR 19.6-25.9; P=.001). DISCERN scores did not differ significantly by functional orientation (P=.11). Paid-access platforms outperformed free-access platforms in DISCERN (median 42, IQR 36-45 vs 33, IQR 23.8-41.3; P<.001), PEMAT-P (median 52.6, IQR 46-54 vs 46, IQR 26.3-47.4; P=.002), and WRR (median 26.8, IQR 23.2-26.8 vs 10.7, IQR 3.57-19.6; P<.001). However, no statistically significant differences were observed between paid and free platforms in response length (median word count 336, IQR 271-369 vs 206, IQR 116-294; P=.11) or readability metrics. FKGL scores were comparable between paid and free outputs (median 17.54, IQR 16.6-18.4 vs 17.56, IQR 16.5-17.6; P=.61), and Coleman-Liau Index values similarly showed no significant difference by access model (median 21.30, IQR 20.6-22.3 vs 21.71, IQR 20.9-22.1; P=.91). Readability remained limited: all outputs exceeded recommended public health readability thresholds (FKGL≥8). High interrater agreement was observed (intraclass correlation coefficient=0.902 for DISCERN, 0.887 for WRR, and 0.838 for PEMAT-P).
    Conclusions: Artificial intelligence chatbot responses to a patient-centered MOGAD query varied substantially in scientific quality, understandability, transparency, and readability. Search-focused systems were more understandable, whereas conversation-focused systems showed greater citation transparency. Paid-access platforms achieved higher quality and transparency scores, without differences in readability or response length. All outputs exceeded recommended public health readability thresholds. These findings highlight the need for context-sensitive evaluation of chatbot outputs in rare and clinically complex conditions such as MOGAD.
    Keywords:  artificial intelligence; chatbots; citation transparency; health information quality; large language models; myelin oligodendrocyte glycoprotein antibody–associated disease; patient education; readability
    DOI:  https://doi.org/10.2196/81720
  16. BMC Med Inform Decis Mak. 2026 Apr 30. pii: 149. [Epub ahead of print]26(1):
       BACKGROUND: The internet has become an important source of information for cancer patients. Numerous websites provide nutritional advice that promises benefits for the outcome of cancer therapy. The aim of our study was to evaluate and compare the online information about cancer diets on German- and English-language websites.
    METHODS: A patient's online search was simulated using the search engines Google and Bing. Websites were evaluated by means of content and formal criteria according to a standardized instrument.
    RESULTS: The analysis of 31 websites revealed heterogeneous quality in both content and formal criteria, distributed evenly among the German- and English-language websites. The quality of content and formal presentation did not correlate with a website's order of appearance in a browser-based search.
    CONCLUSIONS: The high discrepancy in the quality of content and formal presentation represents a risk for cancer patients searching for information online. Poor-quality content increases the risk of misinformation and of consequent poor dietary decisions, which can result in reduced therapy response, an increased probability of therapeutic toxicity, and a poorer overall prognosis. The visibility of high-quality websites needs to be improved.
    Keywords:  Cancer diets; Internet; Patient information; Web-based information
    DOI:  https://doi.org/10.1186/s12911-026-03529-7
  17. J Imaging Inform Med. 2026 Apr 27.
      This study aimed to conduct a multidimensional evaluation of artificial intelligence (AI) chatbot-generated patient information regarding cone-beam computed tomography (CBCT) in dentistry, with a specific focus on readability, informational quality, reliability, and patient-centered suitability. Twenty frequently asked, patient-oriented questions related to CBCT were systematically identified from a public online forum. Each question was submitted to four large language model-based chatbots (ChatGPT-4o, Gemini Advanced, Claude Sonnet 4, and Microsoft Copilot) under standardized conditions. Generated responses were evaluated using validated instruments, including the DISCERN tool and Global Quality Scale (GQS) for information quality and reliability, as well as the Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index for readability. Patient-centeredness was further assessed using PEMAT-Understandability and PEMAT-Actionability scores. Comparative analyses were performed using linear mixed-effects models. Significant differences were observed among chatbots across all evaluated domains (p < 0.05). While advanced models demonstrated higher informational quality and reliability, their responses frequently exceeded recommended health literacy thresholds. Readability, transparency, and actionability varied substantially between platforms, and no chatbot consistently met all criteria for optimal patient-directed communication. AI chatbots can provide generally accurate information on CBCT; however, variability in readability, reliability, and educational suitability limits their standalone use for patient education. This first multidimensional, comparative evaluation of leading AI chatbots in delivering patient-oriented information about CBCT reveals critical gaps between informational accuracy and health literacy suitability, underscoring the need for careful integration and professional oversight to ensure safe and accessible AI-supported communication in dentomaxillofacial radiology.
    Keywords:  Artificial intelligence; Cone beam–computed tomography; Large language models; Oral radiology
    DOI:  https://doi.org/10.1007/s10278-026-01976-2
  18. BMC Oral Health. 2026 Apr 29.
       BACKGROUND: Given the increasing reliance on online health information, this study aimed to systematically assess the quality of German-language eHealth information on head and neck cancer (HNC) related dental care.
    METHODS: German-language websites were searched via Google.de, Bing.de/Yahoo.de, and DuckDuckGo.com in February 2025, and German-language YouTube videos were searched in March 2025. Websites were assessed across 4 domains: technical/functional aspects (LIDA instrument), readability (Flesch Reading Ease score), comprehensiveness (structured checklist), and quality and risk of bias (DISCERN instrument). Differences between domains were tested using the Friedman test. Group differences among provider types were examined with one-way ANOVA or Kruskal-Wallis tests. YouTube videos were assessed for comprehensiveness, viewers' interaction, and viewing rate. The Wilcoxon rank-sum test compared comprehensiveness between YouTube videos and websites.
    RESULTS: A total of 134 eligible websites and 26 YouTube videos were included. Of the websites, 63.4% were operated by private dental practices. All four domains differed significantly from each other (p < 0.001). Websites from private and corporate dental practices or private hospital groups showed significantly lower scores in technical/functional aspects compared with websites from dental societies, regulatory bodies, public institutions, or insurance companies. Overall readability was poor, with the highest scores observed for institutional websites (median 49.0) and the lowest for private practices (median 38.0). Comprehensiveness of patient-oriented information was low, especially among corporate dental practices and private hospital groups (median 5.0). Quality of consumer health information was highest for commercial or non-profit information services (median 29.0) and lowest for private and corporate dental practices (median 23.0). Only 19.2% of YouTube videos originated from private dental practices, and these exhibited low viewer interaction (median 0.9). No significant difference in comprehensiveness was observed between websites and YouTube videos (p = 0.924).
    CONCLUSIONS: German-language eHealth information on dental care in HNC is generally of low quality. This study highlights the need for standardized, reliable, and patient-oriented online resources to support oral health and quality of life in HNC patients.
    Keywords:  Dental care; Digital media; Head and neck cancer; Information quality; Website; eHealth
    DOI:  https://doi.org/10.1186/s12903-026-08488-z
  19. Turk Arch Pediatr. 2025 Dec 29. 61(2): 147-151
       OBJECTIVE: Accessible, high-quality online health information is essential for patient understanding, particularly in specialized fields such as pediatric neurology; however, little is known about the readability and quality of Turkish-language online resources in this area. This study aimed to evaluate the readability and quality of Turkish online educational materials on pediatric neurology using validated readability formulas and standardized quality assessment criteria.
    MATERIALS AND METHODS: An internet-based search for patient educational materials was conducted using Google with the terms "çocuk nöroloji"/"child neurology" and "pediatrik nöroloji"/"pediatric neurology". After applying exclusion criteria, 69 websites were included. These websites were categorized into 2 groups: Group 1 (academic and hospital websites) and Group 2 (physician and general health information websites). Readability was assessed using the Ateşman and Bezirci-Yılmaz formulas, while website quality was evaluated using the Journal of the American Medical Association (JAMA) benchmark criteria.
    RESULTS: The median Ateşman Readability Score was 36.63, and the Bezirci-Yılmaz score was 7.04, indicating difficult readability. Only 13.04% of websites met ≥3 JAMA criteria and were classified as high quality. Group 2 (physician and general health information websites) had significantly higher quality scores than Group 1 (academic and hospital websites) (P < .001). No significant differences were found between groups in terms of readability.
    CONCLUSION: Most Turkish-language pediatric neurology websites are difficult to read and of low quality. Quality was higher in non-academic sources, although readability remained inadequate across all sources. These findings underscore the need for developing readable and high-quality online educational materials in pediatric neurology to enhance public understanding and informed health decisions.
    Keywords:  Health literacy; JAMA criteria; online health information; pediatric neurology; readability
    DOI:  https://doi.org/10.5152/TurkArchPediatr.2025.25337
  20. Int J Impot Res. 2026 Apr 29.
      Penile cancer is rare, and patients increasingly rely on the internet for health information. On April 1, 2025, we conducted a cross-sectional evaluation of the quality and readability of the top 100 Google search results; 71 websites were included in the analysis. Overall quality was fair, with a mean DISCERN score of 41.01 ± 13.67, and transparency was limited, with a mean Journal of the American Medical Association (JAMA) benchmark score of 1.69 ± 1.02. Readability was generally suboptimal: the mean Flesch Reading Ease (FRE) score was 52.06 ± 12.05 (fairly difficult), and the Gunning Fog Index and Simplified Measure of Gobbledygook (SMOG) scores (8.48 ± 2.43 and 7.19 ± 2.02, respectively) indicated reading demands above the recommended sixth-grade level. Kruskal-Wallis and Dunn's post hoc tests showed significant differences in DISCERN scores across affiliations, with non-profit websites scoring higher than commercial websites. FRE also differed by affiliation (p = 0.016), although post hoc comparisons were not significant, and sensitivity analyses supported the robustness of these findings. Correlation analysis demonstrated a strong association between DISCERN and JAMA scores (r = 0.662; p < 0.001). These findings support improved disclosure of authorship and update dates and simplification of language for patient education.
    DOI:  https://doi.org/10.1038/s41443-026-01281-0
  21. J Public Health Res. 2026 Apr;15(2): 22799036261441327
       Background: Low back pain (LBP) is the leading cause of disability worldwide and is highly prevalent in Arabic-speaking countries. Many patients seek health information online, but the quality and reliability of Arabic resources remain unclear. This study evaluated the quality, reliability, and readability of Arabic websites on LBP.
    Design and methods: A cross-sectional study was conducted on July 15, 2025, using the Arabic keyword "low back pain" in Google, Yahoo, and Bing. The first 100 results per engine were screened in incognito mode. Eligible websites were Arabic, publicly accessible, and patient oriented. After exclusions, 95 websites were included. Websites were classified by affiliation and assessed using the DISCERN instrument (quality), Journal of the American Medical Association (JAMA) benchmarks (reliability), and automated readability indices (Flesch Reading Ease, Flesch-Kincaid Grade Level, SMOG).
    Results: Of 300 screened websites, 95 met inclusion criteria. Health portals and educational sites comprised 46.3%. Overall quality was moderate (mean DISCERN 45.9 ± 11.5), with 82.1% rated as moderate and only 4.2% as good. Reliability was low (mean JAMA 2.46 ± 1.1); only 20% met all four benchmarks. Authorship and currency were present in 36.8% and 26.3% of sites. Readability was high, with 96.8% achieving FRE ≥ 80. Top-ranked websites showed higher quality and reliability (p < 0.001), though readability differences were minimal. DISCERN and JAMA correlated moderately (rho = 0.472, p < 0.001).
    Conclusions: Arabic websites on LBP are generally easy to read but often lack transparency, reliability, and evidence-based content. Strengthening online Arabic health resources through standardized quality frameworks is crucial to reduce misinformation and support informed decision-making.
    Keywords:  Arabic websites; health information; infodemiology; low back pain; quality; readability
    DOI:  https://doi.org/10.1177/22799036261441327
  22. Medicine (Baltimore). 2026 May 01. 105(18): e48496
      Acute gout attacks cause severe pain, and short-video platforms have become patients' primary source of information; however, the quality and reliability of this information are increasingly concerning. This study systematically evaluated the quality and reliability of 100 popular gout-related videos each from Bilibili and TikTok, along with factors influencing video quality. Video quality and reliability were assessed using the global quality score (GQS), Modified DISCERN (mDISCERN), JAMA Benchmark Standard, and Hexagonal Radar Schema (HRS) tools. Correlations between video quality and metrics such as likes, comments, saves, and shares were also analyzed. Median scores across the 4 metrics on Bilibili were: GQS 3.0 (IQR 2.00-4.00), mDISCERN 3.0 (IQR 3.00-4.00), JAMA 3.0 (IQR 2.00-3.00), and HRS 5.0 (IQR 4.00-6.00); TikTok's corresponding scores were 3.0 (IQR 3.00-4.00), 3.0 (IQR 3.00-4.00), 3.0 (IQR 3.00-3.75), and 3.0 (IQR 2.00-4.50). Although Bilibili's HRS scores were higher than TikTok's, video quality was generally poor across both platforms. Furthermore, the study found a positive correlation between video length and quality. Increased likes and shares do not necessarily reflect improved video quality, as these metrics can be influenced by the entertainment nature of online videos. Our research indicates that gout-related health information short videos on Bilibili and TikTok are of poor quality overall, although videos uploaded by medical professionals are more reliable in terms of comprehensiveness and content quality. Health information seekers should carefully evaluate the scientific accuracy and reliability of short videos providing medical information on Bilibili and TikTok before making healthcare decisions.
    Keywords:  TikTok; bilibili; gouty arthritis; quality analysis; short video
    DOI:  https://doi.org/10.1097/MD.0000000000048496
  23. Neurourol Urodyn. 2026 Apr 27.
       PURPOSE: The aim of this study was to evaluate which social media platforms are most frequently used by women with interstitial cystitis/bladder pain syndrome and to assess the scientific reliability of the information shared online.
    MATERIALS AND METHODS: A cross-sectional analysis was conducted on publicly available posts published on Instagram, Facebook, X, YouTube, and TikTok. Only posts containing informative content were included. Spam, duplicated posts, and advertisements were excluded. Two expert urogynecologists independently assessed the scientific accuracy of each post, classifying them as containing scientific evidence, containing scientifically correct but alarming information, or lacking scientific evidence. For each post, authorship type, thematic category, and user engagement were recorded. Inter-rater reliability was calculated using the Kappa statistic.
    RESULTS: One hundred and forty-six posts were included: 59 on Instagram, 72 on Facebook, 9 on YouTube, 4 on X, and 2 on TikTok. Most Instagram posts were published by healthcare professionals, while Facebook posts were predominantly published by patients. On Instagram, the most frequent topic was awareness, whereas diagnostic and therapeutic discussions were more common on Facebook. A minority of posts contained scientifically validated information. Instagram generated the highest user engagement. Agreement between reviewers was low for patient-generated content but higher for posts focused on diagnosis and treatment.
    CONCLUSIONS: Instagram is mainly used by healthcare professionals to raise awareness, whereas Facebook functions as a patient-driven space for discussion. The majority of posts lacked scientific accuracy, underscoring the need for healthcare professionals to strengthen their online presence to counter misinformation and support individuals seeking reliable information about bladder pain.
    Keywords:  bladder pain syndrome; health communication; interstitial cystitis; patient education; social media
    DOI:  https://doi.org/10.1002/nau.70298
  24. Int J Dev Disabil. 2026;72(3): 622-630
       Objectives: This study aimed to evaluate the educational quality and level of misinformation of the 100 most-viewed English-language YouTube videos on autism treatment and to compare the popularity of video groups created according to treatment recommendations.
    Methods: The search terms 'autism treatment', 'autism cure', 'autism therapy', 'treating autism', and 'treatment of autism' were used to select 100 videos. Each video was evaluated using the DISCERN and JAMAS scales. The treatment modality mentioned as the main topic of each video was classified into three groups.
    Results: The total numbers of thumbs-up (likes), thumbs-down (dislikes), and comments for these videos were 1,291,319, 49,750, and 191,462, respectively. Overall, 36% of videos were of poor quality (average score of 1.86 points) and contained varying degrees of misinformation compared with the existing body of evidence. The mean 'accuracy level' of the videos was 4.15, and the average balance level was 2.02. As the level of misinformation in videos increased, there was a notable increase in the number of likes and in an optimized popularity metric. The three treatment groups differed significantly in the popularity-based metadata (thumbs up, thumbs down, comment count, and the optimized popularity metrics MV and LV; p < 0.001, 0.004, 0.001, 0.011, and < 0.001, respectively), except for views (p = 0.085).
    Conclusions: Various concerns exist about the accuracy of the information, the presence of misleading content, and the educational quality of YouTube videos on autism. It is crucial to employ a critical approach when utilizing this information, considering the origin of the videos.
    Keywords:  JAMAS; DISCERN; Online streaming; autism; social media
    DOI:  https://doi.org/10.1080/20473869.2026.2634782
  25. Sci Rep. 2026 May 01.
      Chagas disease remains a public health problem, and YouTube is used to access health information. This study aimed to assess whether YouTube videos on Chagas disease comply with scientific guidelines. Videos were identified using search terms in English, Portuguese, and Spanish, and two independent reviewers assessed eligible videos. Video quality was evaluated using a scale developed and pre-tested by three clinical experts. Comparisons between adequate and inadequate videos were performed using Mann-Whitney or Fisher's exact tests, and associations between variables were assessed using Poisson regression. Among 158 videos screened, 96 were included. For the definition domain, videos from healthcare and academic institutions/professionals showed a higher prevalence ratio of adequate quality (1.86; 95% CI 1.14-3.02), as they did for the treatment domain (3.69; 95% CI 1.67-8.15). Videos in English (0.57; 95% CI 0.34-0.97) and Spanish (0.31; 95% CI 0.16-0.61) and longer duration (0.998; 95% CI 0.995-0.999) were associated with lower quality for definition. Higher numbers of views and likes and longer duration were associated with a higher prevalence of adequate quality for etiology, while longer duration was associated with adequate quality for natural history and with a lower prevalence of adequate quality for diagnosis. Most videos showed inadequate quality, highlighting the need for better guideline-based content on YouTube. Registration number: 10.17605/OSF.IO/NXW8H.
    Keywords:  Chagas disease; Health education; Misinformation; Online health information; Science communication; YouTube
    DOI:  https://doi.org/10.1038/s41598-026-50600-4
  26. Public Health Chall. 2026 Jun;5 e70251
      Insomnia is a prevalent sleep disorder that remains widely underdiagnosed and undertreated, prompting many individuals to seek health information beyond formal healthcare systems. Social media platforms such as YouTube have become influential spaces for the circulation of sleep-health information; however, concerns persist regarding whether credible, expert-led content achieves sufficient visibility to support population-level health outcomes. This study analyses 98 English-language YouTube videos to examine how sleep-health information circulates within an algorithmically mediated environment, focusing on the relationship between source credibility, network visibility, audience engagement, and diffusion potential. Drawing on diffusion of innovations and social cognitive theory, the study conceptualises YouTube as a sociotechnical system in which opinion leadership, social reinforcement, and algorithmic amplification jointly shape influence. The findings reveal a systematic credibility-influence gap, whereby visibility and diffusion are driven more by network position, engagement dynamics, and algorithmic amplification than by source credibility. The results highlight a key public health challenge: credible sleep-health information struggles to achieve reach within platform-mediated systems. By identifying the network and engagement mechanisms that shape diffusion, this study provides evidence to inform more effective digital health communication strategies aimed at increasing the visibility of trustworthy sleep-health information and supporting healthier lifestyles.
    Keywords:  YouTube; algorithmic visibility; credibility; digital health; health communication; networked influence; social media platforms
    DOI:  https://doi.org/10.1002/puh2.70251
  27. Sci Rep. 2026 Apr 28.
      
    Keywords:  Bilibili; Digital health; Information quality; Parkinson’s disease; Short video; TikTok
    DOI:  https://doi.org/10.1038/s41598-026-50589-w
  28. Digit Health. 2026 Jan-Dec;12: 20552076261445969
       Objective: With rising kidney stone prevalence in China (currently 7.54%), patients increasingly seek health information through short video platforms like TikTok and Bilibili. However, the quality and reliability of kidney stone-related content on these platforms remains unclear, potentially affecting patient understanding and health decisions.
    Methods: In this cross-sectional study, we analyzed 172 kidney stone videos from TikTok (n=95) and Bilibili (n=77). Videos were categorized by uploader type: professional individuals (67.44%), nonprofessional individuals (22.09%), professional institutions (6.40%), and nonprofessional institutions (4.07%). Quality assessment utilized the Global Quality Score (GQS) and modified DISCERN tool, evaluating content comprehensiveness across six domains: definition, symptoms, risk factors, evaluation, management, and outcomes. Statistical analyses compared platform differences and uploader type variations.
    Results: TikTok demonstrated significantly higher engagement metrics (median views: 140,255 vs. 10,489; comments: 207 vs. 37; likes: 703 vs. 78) but shorter video duration (53s vs. 121s, p<0.001). Bilibili videos achieved higher quality scores (median GQS: 3.0 vs. 2.0, p=0.002); although the median DISCERN score was 2.0 on both platforms, the score distribution favored Bilibili (p=0.013). Professional institutions produced the highest-quality content across both platforms (GQS: 3.0-4.0; DISCERN: 2.0-3.0), significantly outperforming nonprofessional creators (p<0.001). Content analysis revealed inadequate coverage of comprehensive kidney stone education, with most videos focusing on basic symptoms and management rather than prevention and risk factors.
    Conclusion: While professional creators maintain higher content quality, overall kidney stone information quality on short video platforms remains suboptimal. Platform-specific differences suggest Bilibili's longer format enables more comprehensive education despite lower engagement. Enhanced content standards and professional creator incentivization are needed to improve kidney stone health education on social media platforms.
    Keywords:  Bilibili; TikTok; information quality; kidney stones; reliability; social media
    DOI:  https://doi.org/10.1177/20552076261445969
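    Code sketch:  The platform comparison above relies on Mann-Whitney U tests; a minimal scipy sketch with a hypothetical data file and column names:
      import pandas as pd
      from scipy.stats import mannwhitneyu

      df = pd.read_csv("kidney_stone_videos.csv")  # hypothetical: platform, gqs
      tiktok = df.loc[df["platform"] == "TikTok", "gqs"]
      bilibili = df.loc[df["platform"] == "Bilibili", "gqs"]
      stat, p = mannwhitneyu(tiktok, bilibili, alternative="two-sided")
      print(f"U={stat:.1f}, p={p:.4f}")
      print(df.groupby("platform")["gqs"].median())  # medians, as reported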
  29. Digit Health. 2026 Jan-Dec;12: 20552076261444042
       Background: Short-form videos are an increasingly important source of health information for individuals with type 1 diabetes mellitus (T1DM), yet their quality is unverified.
    Objective: This study aimed to evaluate and compare the quality, reliability, and engagement of T1DM-related videos on Bilibili and TikTok.
    Methods: We conducted a cross-sectional analysis of the top 100 T1DM-related videos from Bilibili and TikTok (N=200). Videos were systematically evaluated using four validated instruments: the Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) criteria, Video Information and Quality Index (VIQI), and modified DISCERN (mDISCERN). Engagement metrics were extracted, and Spearman correlations and a multivariable negative binomial regression were performed to identify predictors of video 'likes'. A comprehensive sensitivity analysis, including Principal Component Analysis (PCA), was conducted to ensure robustness.
    Results: TikTok videos achieved significantly higher user engagement than those on Bilibili (median views: 88,089 vs. 3,418). In terms of quality, TikTok scored higher on the VIQI (median: 12.0 vs. 9.0, P < 0.001), while Bilibili scored higher on the JAMA criteria (median: 2.0 vs. 0.0, P < 0.001). No significant platform differences were found for GQS or mDISCERN. In the adjusted regression model, VIQI score was a strong positive predictor of likes (RR=1.66, 95% CI 1.32-2.13), whereas a higher GQS score was a negative predictor (RR=0.24, 95% CI 0.13-0.45). These findings were robust across all sensitivity analyses.
    Conclusions: T1DM-related short videos on Bilibili and TikTok exhibit substantial variability in quality and reliability. TikTok demonstrates stronger audiovisual quality, whereas Bilibili shows better transparency (JAMA). Engagement was driven more by production quality than informational accuracy. These findings suggest that optimizing content strategies and strengthening professional participation may be beneficial for digital diabetes education.
    Keywords:  Bilibili; JAMA; TikTok; VIQI; mDISCERN; patient education; short-video platforms; type 1 diabetes
    DOI:  https://doi.org/10.1177/20552076261444042
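    Code sketch:  A hedged statsmodels sketch of the negative binomial regression used above to predict likes from the quality scores; exponentiated coefficients give the rate ratios (RR) quoted in the results. The file and column names are assumptions.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.read_csv("t1dm_videos.csv")  # hypothetical: likes + four scores
      # Negative binomial accommodates overdispersed count outcomes like 'likes'.
      fit = smf.negativebinomial("likes ~ viqi + gqs + jama + mdiscern",
                                 data=df).fit()
      print(np.exp(fit.params))      # rate ratios
      print(np.exp(fit.conf_int()))  # 95% confidence intervals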
  30. Front Public Health. 2026;14: 1714828
       Background: Over the past few years, short videos have shown considerable promise as a medium for disseminating health-related information. Health-related content about heatstroke is extensively circulated across short video platforms. Nonetheless, the quality, credibility, practical value, and accuracy of the professional knowledge conveyed in these short videos have not been systematically assessed.
    Objective: This study aims to analyze the content and quality of videos related to heatstroke on short video sharing platforms.
    Methods: As of September 1, 2025, the term "heatstroke" was used as a keyword to search on TikTok, BiliBili, and Kwai short video platforms, and the top 300 videos from each platform were included and recorded. Two qualified researchers independently assessed the content and quality of the selected videos utilizing the Journal of the American Medical Association (JAMA) scoring system, the Global Quality Scale (GQS), the modified DISCERN instrument, and the Patient Education Materials Assessment Tool (PEMAT). SPSS version 26.0 and decision chain analysis were used to generate descriptive statistics, compare differences between groups, and assess relationships among variables via Spearman correlation analysis.
    Results: This study analyzed 632 heatstroke-related videos on BiliBili, TikTok, and Kwai. The quality of videos varied considerably across platforms. These videos had a mean JAMA score of 1.50 (SD: 0.72), a mean GQS score of 3.14 (SD: 0.83), a mean modified DISCERN score of 2.35 (SD: 0.74), a mean PEMAT-understandability score of 0.58 (SD: 0.11), and a mean PEMAT-actionability score of 0.50 (SD: 0.32). Overall, the general quality and reliability of videos on TikTok and BiliBili were superior to those on Kwai. Most videos were uploaded by news agencies and physicians (accounting for 37.5% and 35.28%, respectively), with the content primarily focusing on symptoms (32.75%) and treatment (23.73%). Across platforms, video duration was positively correlated with video quality.
    Conclusions: The findings reveal that heatstroke-related short videos across BiliBili, TikTok, and Kwai are generally of low quality and vary markedly among platforms, which may misguide public health practices. The results suggest the need to strengthen the development of authoritative science-popularization content, optimize health communication strategies, and introduce platform-level quality assessment and recommendation mechanisms to enhance the public's disease-prevention capabilities.
    Keywords:  health information; heatstroke; quality assessment; reliability; short videos
    DOI:  https://doi.org/10.3389/fpubh.2026.1714828
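    Code sketch:  The study above used two independent raters; a common companion check (not necessarily the authors' procedure) is inter-rater agreement, e.g. weighted Cohen's kappa on the modified DISCERN ratings. File and column names are invented.
      import pandas as pd
      from sklearn.metrics import cohen_kappa_score

      df = pd.read_csv("heatstroke_ratings.csv")  # hypothetical: one row/video
      # rater1/rater2 hold each reviewer's modified-DISCERN score per video
      kappa = cohen_kappa_score(df["rater1"], df["rater2"], weights="quadratic")
      print(f"weighted kappa = {kappa:.2f}")  # agreement before consensus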
  31. Sci Rep. 2026 Apr 27.
      
    Keywords:  Content quality; Douyin; Health education; Peritoneal dialysis; Self-care; Social media; TikTok
    DOI:  https://doi.org/10.1038/s41598-026-50551-w
  32. Sci Rep. 2026 Apr 30.
      Age-related macular degeneration (AMD) is a leading cause of blindness in the elderly. Short-video platforms like TikTok are increasingly important sources of health information, yet concerns persist regarding content quality and reliability. This study systematically evaluated the quality, reliability, and user engagement characteristics of AMD-related videos on TikTok. We systematically searched TikTok for AMD-related videos, and quality and reliability were assessed using the JAMA benchmark criteria, the modified DISCERN instrument, the Global Quality Scale (GQS), and the Patient Education Materials Assessment Tool (PEMAT). The entropy weight method and cluster analysis were applied to engagement data. Among 145 videos, overall quality was poor. Median scores were 1 for JAMA, 2 for mDISCERN, and 3 for GQS. Understandability was limited (median PEMAT-U: 43%), while actionability was moderate (median PEMAT-A: 60%). High-quality videos were typically created by Western-medicine ophthalmologists, included prognostic information, and used monologue narration; physician title showed no association with quality. "Saves" carried the highest engagement weight (35.71%). "Self-test and screening" themes achieved the highest engagement rate (62.5%). Media accounts attained the highest PEMAT-U scores and interaction metrics. Video format and presenter attire showed no significant impact on quality or engagement. AMD-related health information on TikTok is generally of poor quality. Information quality and user engagement are driven by distinct factors, highlighting the need for targeted strategies to improve content accuracy and understandability.
    Keywords:  Age-related macular degeneration; Health education; Social media; TikTok; Video quality
    DOI:  https://doi.org/10.1038/s41598-026-44509-1
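    Code sketch:  The entropy weight method mentioned above assigns larger weights to engagement indicators with more dispersed values; a self-contained numpy sketch on toy data (the engagement matrix is invented):
      import numpy as np

      def entropy_weights(X):
          """X: (n_videos, n_indicators), nonnegative engagement counts."""
          X = np.asarray(X, dtype=float)
          P = X / X.sum(axis=0, keepdims=True)      # column-wise proportions
          with np.errstate(divide="ignore", invalid="ignore"):
              plogp = np.where(P > 0, P * np.log(P), 0.0)
          e = -plogp.sum(axis=0) / np.log(len(X))   # entropy per indicator
          d = 1.0 - e                               # degree of diversification
          return d / d.sum()                        # normalized weights

      # toy matrix: likes, comments, shares, saves for three videos
      X = [[120, 4, 2, 9], [5000, 60, 31, 240], [40, 1, 0, 3]]
      print(entropy_weights(X))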
  33. Medicine (Baltimore). 2026 May 01. 105(18): e48563
      Short video platforms have become significant sources of health information, yet evidence on the quality and reliability of Sjögren syndrome (SS)-related short videos is limited, particularly with respect to comparative assessments across different platforms. This study aimed to evaluate the quality and reliability of short videos of SS on TikTok and Bilibili. A cross-sectional study was conducted in China by searching predefined SS-related keywords on TikTok and Bilibili. Searches were performed up to October 7, 2025, and the top 120 videos from each platform were included (n = 240). Video characteristics, content categories, and uploader types were extracted. Information quality and reliability were assessed using the Global Quality Scale (GQS; higher scores indicate better overall educational quality) and modified DISCERN (mDISCERN; higher scores indicate greater reliability and better-quality health information). Chi-square tests were used to assess differences in platform distributions, while the Mann-Whitney U test compared engagement data and quality scores between TikTok and Bilibili. Spearman rank correlation (ρ) was used to assess associations between engagement metrics and quality scores. Of the 220 videos analyzed (TikTok: 108; Bilibili: 112), content predominantly covered symptoms (176, 80.0%), diagnosis (118, 53.6%), and treatment (78, 35.4%), whereas epidemiology (32, 14.5%), etiology (54, 24.5%), and prevention (41, 18.6%) were less frequent. Among uploaders, specialists contributed the largest share of videos (n = 147), whereas nonspecialists and individual users accounted for fewer videos (n = 43 and n = 30, respectively). The overall median GQS and mDISCERN scores were both 2.00 (interquartile range: 2.00-3.00), indicating suboptimal quality. Videos uploaded by specialists exhibited significantly higher GQS and mDISCERN scores than those uploaded by nonspecialists or individual users (P < .0001). Engagement metrics were weakly correlated with quality scores. SS-related short videos on TikTok and Bilibili in China showed suboptimal information quality and reliability and uneven topic coverage, with epidemiology, etiology, and prevention being underrepresented. Videos uploaded by specialists were associated with higher GQS and mDISCERN scores. These findings highlight the need for better regulation and monitoring of health content on short video platforms.
    Keywords:  Sjögren syndrome; public health; social media; video quality
    DOI:  https://doi.org/10.1097/MD.0000000000048563
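    Code sketch:  The weak engagement-quality correlations reported above come from Spearman's rank test; a minimal scipy sketch with hypothetical file and column names:
      import pandas as pd
      from scipy.stats import spearmanr

      df = pd.read_csv("ss_videos.csv")  # hypothetical: engagement + scores
      for metric in ["likes", "comments", "shares"]:
          rho, p = spearmanr(df[metric], df["gqs"])
          print(f"{metric} vs GQS: rho={rho:.2f}, p={p:.4f}")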
  34. Sci Rep. 2026 Apr 27.
      The global prevalence of chronic kidney disease (CKD) ranges from 9.1% to 13.4%, and China has the largest number of CKD and advanced-CKD patients in Asia. Most patients choose hemodialysis (HD) because of its high safety, but long-term treatment may cause complications such as restless legs syndrome and skin itching, which seriously affect patients' quality of life. The Internet has gradually become a major source of medical and health information, and TikTok, one of the largest short-video platforms in China, is an important channel for spreading health information. However, the reliability of hemodialysis content on short-video platforms varies and lacks professional evaluation. This study aimed to evaluate the content, reliability, and quality of short videos related to hemodialysis on TikTok. In May 2025, a new TikTok account was created and the keyword "hemodialysis" was used for searching. The first 100 videos were evaluated using three scales: GQS, JAMA, and the modified DISCERN. Relevant information from the videos was extracted and analyzed. Overall, the quality of short videos about hemodialysis on TikTok was not satisfactory. Most videos had GQS scores of 2-3, JAMA scores of 2, and modified DISCERN scores of 2. Videos posted by health professionals had higher quality and reliability than those by non-health professionals (P < 0.05). Videos with diverse presentation forms had significantly higher GQS, JAMA, and modified DISCERN scores than those with monotonous presentation forms (P < 0.05). Some variable pairs (e.g., likes with duration, comments with scale scores) showed no correlation, while the remaining pairs were positively correlated (P < 0.05). This study shows that the overall quality and reliability of short videos related to hemodialysis on TikTok are low, but videos posted by medical professionals and those with diverse presentation forms are of better quality. When users search for health information on short-video platforms, they should prioritize videos released by qualified healthcare professionals with verified identity badges.
    Keywords:  Cross-sectional study; Hemodialysis; Quality analysis; Short videos
    DOI:  https://doi.org/10.1038/s41598-026-49487-y
  35. Public Health Rep. 2026 Apr 27. 333549261438081
       OBJECTIVES: Alcohol-associated liver disease (AALD) is a leading cause of liver disease. Alcohol use disorder is a growing public health problem in the United States. TikTok is a growing source of public health information; such information is not peer reviewed and often does not meet scientific standards. We assessed the quality of AALD information on TikTok.
    METHODS: We conducted a retrospective observational study of TikTok videos obtained on March 8, 2024, by searching the phrase "alcohol-associated liver disease." We analyzed video characteristics, engagement, and content. Three physicians independently assessed the reliability and quality of the videos by using the DISCERN tool and the Global Quality Score (GQS), scored from 1 to 5, with higher scores indicating better reliability and quality, respectively.
    RESULTS: We included 139 videos in the analysis. Video creators/publishers were health care professionals (39.6%), patients and family/friends (35.3%), wellness coaches (22.3%), and others (2.9%). The median (IQR) DISCERN score was 2.0 (1.3-2.7); the median (IQR) GQS score was 2.5 (1.5-3.3), indicating the videos were of low quality. Videos by health care professionals had higher DISCERN and GQS scores (P < .001) than videos by other creators/publishers. Video characteristics did not differ significantly between creator/publisher types. Regression results indicated that videos from health care professionals correlated positively with higher DISCERN and GQS scores, especially when videos were longer.
    CONCLUSION: The quality and reliability of TikTok videos on AALD are poor. The public should exercise caution when accessing AALD-related information on TikTok. Health care providers and public health officials should rigorously investigate the quality of health information on social media platforms and seek to improve it.
    Keywords:  health care providers; hepatology; public health; social media
    DOI:  https://doi.org/10.1177/00333549261438081
  36. J Med Internet Res. 2026 Apr 30. 28: e86137
       Background: Shared decision-making allows patients and clinicians to make decisions together to help determine the most appropriate option. Patients need comprehensive health information to participate and evaluate different options during the shared decision-making process. Patients with diabetes need to constantly monitor their health status. They experience an array of health information needs during their ongoing health management. Online health information acquisition is a common behavior among patients with diabetes, and online information can impact the interaction between patients with diabetes and health care providers.
    Objective: This study explored the relationship between 2 types of online health information acquisition behavior (active online health information seeking and incidental online health information acquisition) and shared decision-making. It also investigated the mediating role of eHealth literacy during the information acquisition process among US patients with diabetes aged 18 to 44 years.
    Methods: Participants were patients with diabetes aged 18 to 44 years in the United States and were recruited by a survey company, Centiment. The sampling process matched the national distribution of gender and age in the United States. An online survey questionnaire was distributed through Qualtrics. A total of 558 valid responses were collected. The average age of the sample was 35.91 (SD 6.04) years. Among the sample, 260 participants were men, 291 participants were women, and 7 participants identified their gender as other. Bivariate analyses and partial least squares structural equation modeling were used for data analysis. All data analyses were performed in R.
    Results: The prevalences of active online health information seeking (mean 3.97, SD 0.78) and incidental online health information acquisition (mean 4.27, SD 0.78) were high among participants. Education was a key factor related to eHealth literacy (P<.001) and shared decision-making (P<.001). Model testing indicated that active online health information seeking was related to eHealth literacy (β=.192, 95% CI .067-.320) and shared decision-making (β=.234, 95% CI .123-.346). Incidental online health information acquisition was related to eHealth literacy (β=.335, 95% CI .205-.461). eHealth literacy was related to shared decision-making (β=.441, 95% CI .334-.536). Therefore, eHealth literacy partially mediated the relationship between active online health information seeking and shared decision-making, while it fully mediated the relationship between incidental online health information acquisition and shared decision-making.
    Conclusions: This study contributes to the ongoing development of health communication strategies and the modification of health information training programs for patients with diabetes. There is an urgent need for the information industry to deliver accurate, easy-to-understand health information that supports the public's decision-making and encourages positive health behaviors.
    Keywords:  active online health information seeking; diabetes mellitus; eHealth literacy; health communication; incidental online health information acquisition; online health information acquisition; shared decision-making
    DOI:  https://doi.org/10.2196/86137
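    Code sketch:  The authors fitted a partial least squares structural equation model; as a simpler illustration of the mediation logic only (indirect effect = a*b with a bootstrap CI), here is a toy numpy sketch on simulated data, not a reimplementation of the study:
      import numpy as np

      rng = np.random.default_rng(0)
      n = 558                                          # sample size, as reported
      seeking = rng.normal(4.0, 0.8, n)                # simulated predictor
      literacy = 0.3 * seeking + rng.normal(0, 1, n)   # simulated mediator
      sdm = 0.25 * seeking + 0.45 * literacy + rng.normal(0, 1, n)

      def indirect(idx):
          x, m, y = seeking[idx], literacy[idx], sdm[idx]
          a = np.polyfit(x, m, 1)[0]                   # X -> M slope
          b = np.linalg.lstsq(np.column_stack([np.ones_like(x), x, m]),
                              y, rcond=None)[0][2]     # M -> Y slope given X
          return a * b

      boot = [indirect(rng.integers(0, n, n)) for _ in range(2000)]
      lo, hi = np.percentile(boot, [2.5, 97.5])
      print(f"indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")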
  37. J Am Acad Audiol. 2026 Mar;37(2): 144-152
     Background: Evidence synthesis refers to a reproducible literature review that addresses a structured research question through critical analysis of the results of a comprehensive literature search, as seen in systematic reviews, scoping reviews, and meta-analyses. Evidence synthesis publications have become more common, but their quality has not increased; a clearer understanding of how to produce a high-quality evidence synthesis can address this issue. Adherence to the open science principles of transparency, collaboration, reproducibility, and accessibility can help increase the quality of evidence synthesis.
    Purpose: This tutorial provides step-by-step instructions regarding how to produce a high-quality evidence synthesis project following best practices for evidence synthesis and open science principles.
    Research Design: Tutorial.
    Data Collection and Analysis: This tutorial is a step-by-step guide of how to apply current evidence synthesis standards to knowledge synthesis projects that will contribute to audiology research. Information regarding open access and how the principles of open science contribute to the quality and reproducibility of evidence synthesis is provided.
    Results: Widely applicable steps for high-quality evidence synthesis research are presented with information about how open science principles can be applied to the evidence synthesis process.
    Conclusions: By incorporating open science principles in the production of evidence synthesis, published research will be of higher quality and improve access to critical information for evidence-based practice. Professionals can produce high-quality research by understanding the appropriate steps for an evidence synthesis project, choosing the appropriate method for a structured research question, and utilizing the recognized international standards for the production of systematic reviews and other types of evidence synthesis.
    Clinical Relevance Statement: This tutorial introduces readers to evidence synthesis with step-by-step guidance and resources for clinician and researcher teams to create high-quality research while following best practices for evidence synthesis.
    Keywords:  review literature as topic; scoping review as topic; systematic reviews as topic
    DOI:  https://doi.org/10.3766/jaaa.250020
  38. J Chem Inf Model. 2026 Apr 28.
      On-demand polymer discovery is essential across various industries, from biomedical applications to reinforcement materials. Polymer experiments involve a long trial-and-error process that consumes extensive resources, and machine learning has accelerated scientific discovery on the property-prediction and latent-space-search fronts. However, laboratory researchers often cannot readily access the code and models needed to extract individual structures and properties, owing to infrastructure limitations. We present a closed-loop polymer structure-property predictor integrated into a terminal for early-stage polymer discovery. The framework is powered by LLM reasoning and provides users with property prediction, property-guided polymer structure generation, and structure-modification capabilities. Generated SMILES sequences are guided by the synthetic-accessibility score and the synthetic-complexity score so that generated polymers stay close to synthetically accessible monomer-level structures. This framework addresses the challenge of generating novel polymer structures for laboratory researchers, thereby providing computational insights into polymer research.
    DOI:  https://doi.org/10.1021/acs.jcim.6c00343
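    Code sketch:  One plausible reading of the SA-score guidance above, using RDKit's Contrib implementation of the Ertl-Schuffenhauer synthetic-accessibility score; the SMILES strings and cutoff are illustrative, not taken from the paper:
      import os, sys
      from rdkit import Chem
      from rdkit.Chem import RDConfig
      sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
      import sascorer  # RDKit Contrib module for the SA score

      candidates = ["C=Cc1ccccc1", "CC(=O)OCC=C"]  # toy monomer SMILES
      for smi in candidates:
          mol = Chem.MolFromSmiles(smi)
          if mol is None:
              continue                              # skip invalid SMILES
          sa = sascorer.calculateScore(mol)         # 1 (easy) .. 10 (hard)
          if sa <= 4.5:                             # illustrative cutoff
              print(f"{smi}: SA = {sa:.2f}, kept")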