bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-03-22
28 papers selected by
Thomas Krichel, Open Library Society



  1. Med Ref Serv Q. 2026 Mar 16:1-10
      Research and Education Librarians at Rowland Medical Library at the University of Mississippi Medical Center designed an engaging 90-minute instruction session for middle school students participating in a summer enrichment program. Librarians adapted instruction strategies to reach and connect effectively with a younger, nontraditional audience. Through active learning exercises, group discussion, and a scavenger hunt, students learned about academic health sciences libraries and strengthened their information evaluation skills.
    Keywords:  Academic health sciences libraries; dental education; instructional design; library instruction; middle school students; outreach
    DOI:  https://doi.org/10.1080/02763869.2026.2641625
  2. J Clin Epidemiol. 2026 Mar 18. pii: S0895-4356(26)00097-1. [Epub ahead of print] 112222
       BACKGROUND: Grey literature comprises materials produced outside traditional academic or commercial publishing, such as theses, conference proceedings, technical reports, and government documents. Grey literature is often underrepresented in systematic reviews due to limited accessibility and a lack of standardized search methodologies. Despite its potential to reduce publication bias and broaden evidence bases, there is no universally accepted guidance for identifying or appraising grey literature. This study aimed to inform methodological guidance for incorporating grey literature in evidence synthesis.
    METHODS: We conducted a descriptive cross-sectional analysis of 100 systematic reviews published between January 2022 and June 2025 in high-impact journals across public health, education, health services and social care. Reviews were identified via PubMed. Two independent reviewers used a piloted data extraction form to collect information on grey literature identification, extraction, appraisal and synthesis. Descriptive statistics were used for categorical variables and thematic content analysis was employed to explore reporting practices. Findings informed the development of structured, step-by-step guidance and a checklist for grey literature identification, appraisal, extraction, and reporting.
    RESULTS: Systematic reviews primarily originated from medical and health sciences disciplines. The most common grey literature search strategies were hand searching (58%), use of organizational or government websites (42%), and Google Scholar (39%). While 89% of reviews applied formal inclusion/exclusion criteria to grey literature, only 19% extracted grey literature separately from peer-reviewed sources. The influence of grey literature on review conclusions varied substantially.
    CONCLUSION: Grey literature searching in systematic reviews remains highly variable and often lacks transparency. There is an urgent need for standardized guidance, greater integration of automation tools and improved researcher training in accessing less visible sources. Without these improvements, grey literature searching will remain inconsistent and resource-intensive, with uncertain impact on review quality.
    Keywords:  Guidance; evidence synthesis; grey literature; searching; systematic reviews
    DOI:  https://doi.org/10.1016/j.jclinepi.2026.112222
  3. Am J Med Sci. 2026 Mar 13. pii: S0002-9629(26)00110-2. [Epub ahead of print]
      Systematic reviews and meta-analyses (SRMAs) hold the highest rank in the evidence hierarchy; however, their results are vulnerable to selective reporting and publication bias. Nonsignificant or secondary outcomes, such as adverse events or rare complications, are often omitted from the abstract because of word limits or pressure to publish. Academic databases index limited metadata but not the full text, making outcomes that are not reported in the abstract effectively invisible to search. This creates a modern "file drawer problem," in which findings are available in the full text but remain inaccessible to search, leading to overestimated treatment effects and biased reported complication rates in SRMAs. To address this issue, this study proposed recommendations at three levels: primary study researchers, secondary study researchers, and journals/academic databases. A collaborative effort among stakeholders is necessary to enhance the accessibility of nonsignificant outcomes through database searches, thereby reducing the risk of biased evidence influencing clinical guidelines and patient care.
    Keywords:  Bibliographic; Databases; Evidence-Based Medicine; Gray Literature; Meta-Analysis as Topic; Metadata; Publication Bias; Search Engine; Systematic Reviews as Topic
    DOI:  https://doi.org/10.1016/j.amjms.2026.01.013
  4. Integr Cancer Ther. 2026 Jan-Dec;25:15347354261422762
       BACKGROUND: Cancer patients increasingly use YouTube for nutritional guidance, yet information quality varies substantially. Existing text-based assessment tools fail to capture audiovisual content characteristics. This study aimed to (1) develop a video-specific assessment tool, (2) evaluate German-language YouTube videos on cancer nutrition, and (3) identify quality indicators for laypersons.
    METHODS: A 20-criteria assessment tool integrating established instruments and video-specific elements was developed. The first 30 YouTube videos on cancer nutrition were systematically evaluated. Spearman correlation and Kruskal-Wallis tests identified associations between video characteristics and quality scores. Interrater reliability was assessed.
    RESULTS: Intraclass correlation coefficient indicated good to very good interrater reliability (95% CI: 0.87-0.96). Overall video quality was poor (mean: 38.6/60, SD: 5.3). Videos from hospitals (P = .002) and healthcare organizations (P = .006) scored significantly higher than those from independent persons. Videos with clearly formulated goals (rs = 0.71, P < .001) and cited references (rs = 0.43, P = .019) demonstrated stronger evidence-based content. High-quality videos more frequently addressed missing evidence (rs = 0.51, P = .004). Quality scores inversely correlated with likes (rs = -0.55, P = .002) and views (rs = -0.46, P = .01).
    CONCLUSION: YouTube videos on cancer nutrition exhibit substantial quality deficits, even from institutional providers. The validated assessment tool identifies observable quality indicators including clear objectives, scientific citations, transparent discussion of evidence gaps, and institutional authorship. However, no single feature reliably predicts quality. Strengthening digital health literacy and improving evidence-based content production and visibility remain essential priorities.
    Keywords:  consumer health information; health literacy; medical oncology; neoplasms/diet therapy; quality assurance; quality of health care; social media; video recording
    DOI:  https://doi.org/10.1177/15347354261422762
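    Note: For reference, the Spearman coefficients reported above are rank correlations; in the absence of ties,
      $r_s = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$,
    where $d_i$ is the difference between the ranks of video $i$ on the two variables and $n$ is the number of videos. The Kruskal-Wallis test likewise compares mean ranks across groups (here, e.g., the video source categories), which suits the ordinal 20-criteria scores.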
  5. Ann R Coll Surg Engl. 2026 Mar 18.
       INTRODUCTION: Artificial intelligence (AI) chatbots, powered by large language models, are increasingly used to disseminate surgical information, but concerns about accuracy, hallucinations and source reliability persist. This study evaluates the sources upon which these systems rely when producing medical information. As these models generate language without true comprehension or reasoning, assessing the credibility and nature of their referenced sources is essential to promote transparency and support evidence-based integration of AI in healthcare.
    METHODS: Nine AI chatbots (ChatGPT-5, ChatGPT-5 Think, DeepSeek R1, DeepSeek DeepThink, Google Gemini 2.5 Flash, Grok 3, Grok 4, Perplexity Research and Perplexity Search) were queried with six standardised general surgery prompts, both with and without explicit requests for references (n=108 outputs); 1,249 references were extracted and assessed for quantity, authenticity, quality, source category, accessibility, geographic origin and attribution.
    RESULTS: Reference provision varied: four chatbots required explicit prompting, whereas others cited consistently. Hallucination rates ranged from 0% (five models) to 34% (Grok 3). Mean quality scores differed significantly, with Perplexity Research achieving the highest score (4.08) and ChatGPT-5 the lowest (2.39), reflecting differences observed in source type. Most references originated from the US or UK. Accessibility was best in Google Gemini (100% open access, clickable citations). Explicit prompting increased reference quantity significantly in six models and quality in one.
    CONCLUSIONS: AI chatbots exhibit heterogeneous reference integrity, with risks of hallucinations and biases underscoring the need for prompt engineering, model refinements and ongoing evaluation. Our findings suggest ongoing caution is required in surgical contexts to ensure safe, equitable information dissemination.
    Keywords:  Artificial intelligence; Bibliometrics; General surgery; Medical informatics; Trust and transparency
    DOI:  https://doi.org/10.1308/rcsann.2026.0021
  6. Otolaryngol Head Neck Surg. 2026 Mar 18.
       OBJECTIVE: Artificial intelligence-supported large language models (LLMs) have become increasingly widespread in health communication and patient education in recent years. Models such as ChatGPT, Claude, Gemini, and DeepSeek are used to provide information on complex medical topics thanks to their natural language processing capabilities. This study compares the responses of these models to 5 frequently asked questions about cochlear implants in terms of content and communication quality.
    STUDY DESIGN: Comparative analysis of 4 LLMs using expert-evaluated responses to cochlear implant queries.
    SETTING: Virtual simulation with blinded specialist assessments.
    METHODS: Five of the most frequently searched cochlear implant questions on Google were selected. Each question was individually posed to ChatGPT-4, Gemini 2.0, Claude 3.7, and DeepSeek v3. The responses from each model were evaluated by 5 otolaryngology specialists using a 5-point scale based on content accuracy and communication appropriateness. One-way ANOVA and post hoc tests were used for statistical analysis.
    RESULTS: Statistically significant differences were identified among the models in both content and communication quality (P < .05). The DeepSeek model achieved the highest average scores in both areas, while the Claude model generally received the lowest scores. ChatGPT-4 demonstrated a balanced performance, while Gemini stood out in certain communication criteria.
    CONCLUSION: This study is one of the first comparative analyses evaluating the performance of 4 different large language models in the context of patient education about cochlear implants. Although some models appear more suitable for patient education, the findings indicate that these systems still have limitations when used without expert oversight.
    Keywords:  Artificial Intelligence; ChatGPT; Claude; Cochlear Implant; DeepSeek; Gemini; Large Language Model
    DOI:  https://doi.org/10.1002/ohn.70192
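    Note: The one-way ANOVA named in the methods tests whether mean expert ratings differ across the 4 models via the F ratio
      $F = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}$,
    with $k = 4$ model groups and $N$ total ratings; a significant $F$ (here $P < .05$) is then followed by post hoc pairwise comparisons to locate which models differ.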
  7. Digit Health. 2026 Jan-Dec;12:20552076261420876
       Background: Large language models such as ChatGPT are increasingly used by patients seeking perioperative information, yet their reliability for anesthesia-related patient education remains insufficiently evaluated. This study assessed the quality of ChatGPT-4.0 responses to frequently asked anesthesia questions using a multi-rater evaluation framework.
    Methods: Twenty-two common anesthesia-related patient questions were identified through online search. Each question was submitted once to ChatGPT-4.0 (GPT-4-turbo; chat.openai.com) without follow-up prompts. Five anesthesiology and reanimation specialists, each with more than 20 years of experience, independently evaluated each response using a validated 4-point Likert-type scale (1 = excellent; 4 = unsatisfactory). Inter-rater reliability was calculated using a two-way random-effects model (ICC[2,1]).
    Results: A total of 110 ratings were collected. Among these, 61.8% were classified as excellent, 32.7% as satisfactory requiring minimal clarification, and 5.5% as satisfactory requiring moderate clarification. No responses were rated as unsatisfactory. Mean scores for individual questions ranged from 1.0 to 2.4. Reviewer-wise averages ranged from 1.27 to 1.73, indicating generally positive evaluations with modest variability in scoring strictness. The overall inter-rater reliability was poor to fair (ICC = 0.25).
    Conclusions: ChatGPT-4.0 provided high-quality responses to frequently asked patient questions about anesthesia and may serve as a supportive digital health tool for patient education. However, limited agreement among evaluators highlights the need for expert oversight and contextual refinement when integrating large language models into clinical communication pathways.
    Keywords:  ChatGPT; anesthesia; artificial intelligence; digital health; digital patient education; large language models
    DOI:  https://doi.org/10.1177/20552076261420876
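    Note: The ICC(2,1) cited in the methods is the two-way random-effects, absolute-agreement, single-rater form, conventionally (Shrout and Fleiss) computed as
      $\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}{MS_R + (k-1)MS_E + \frac{k}{n}(MS_C - MS_E)}$,
    where $MS_R$, $MS_C$ and $MS_E$ are the mean squares for targets (the $n = 22$ questions), raters ($k = 5$) and error; the reported value of 0.25 indicates that rater variance and error dominate true differences between questions.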
  8. J Craniofac Surg. 2026 Mar 16.
       BACKGROUND: Severe jawbone atrophy often limits the use of conventional dental implants. Modern 3D imaging and CAD/CAM technologies have revitalized subperiosteal implants as a customized alternative for these challenging cases. As AI chatbots (ChatGPT, Claude, DeepSeek, and Copilot) increasingly serve as patient information tools, evaluating the accuracy and clarity of their explanations about such complex procedures has become essential.
    OBJECTIVE: This study aimed to evaluate the accuracy, reliability, readability, actionability, understandability, and practical usefulness of responses provided by AI chatbots to patient questions about subperiosteal jaw implants.
    METHODS: The authors evaluated 4 AI-based chatbots (ChatGPT, DeepSeek, Copilot, and Claude) by submitting frequently asked questions on subperiosteal jaw implants in independent sessions to avoid data leakage. Responses were compiled, duplicates removed, and refined for clarity. Two independent experts assessed the outputs using validated tools: accuracy (5-point Likert), reliability (CLEAR criteria), quality (mGQS), readability (FRE, FKGL), usefulness (4-point scale), and understandability/actionability (PEMAT).
    RESULTS: DeepSeek, Claude, and ChatGPT produced more understandable, actionable, and higher-quality responses than Copilot, with DeepSeek performing the best overall. Across all models, clarity, mGQS, and accuracy were strongly aligned, while usefulness was inversely related to them. Readability and actionability-understandability correlations showed consistent patterns, with the strongest positive link observed in DeepSeek.
    CONCLUSION: AI chatbots such as DeepSeek, Claude, and ChatGPT can provide accurate and understandable information about subperiosteal jaw implants, though practical guidance and readability remain limited. Domain-specific training and integration with authoritative dental resources may enhance their clinical utility and patient education potential.
    Keywords:  AI chatbots; DeepSeek; artificial intelligence; chatGPT; subperiosteal implants
    DOI:  https://doi.org/10.1097/SCS.0000000000012417
  9. Am J Infect Control. 2026 Mar 12. pii: S0196-6553(26)00356-1. [Epub ahead of print]
       BACKGROUND: As patients increasingly turn to artificial intelligence (AI) chatbots for medical information, concerns remain regarding the accuracy, transparency, and readability of these tools. This study aimed to comparatively assess the quality, reliability, understandability, actionability, and readability of surgical site infection (SSI)-related responses produced by widely used AI chatbots.
    METHODS: A cross-sectional design was used to evaluate six AI chatbots (ChatGPT-5o, Gemini 2.5 Pro, Gemini 2.5 Flash, DeepSeek, Grok-1.5, Perplexity). Five patient-centered SSI questions were developed through a Delphi method and directed to each chatbot. A multidisciplinary panel of five blinded experts rated responses using DISCERN, QUEST, PEMAT-P, and the Web Resource Rating (WRR). Readability was assessed using SMOG, Flesch Reading Ease, and Ateşman formulas. Inter-rater reliability was calculated using the Intraclass Correlation Coefficient.
    RESULTS: No single chatbot excelled across all domains. ChatGPT-5o achieved the highest quality scores (DISCERN), while DeepSeek showed the highest accuracy (QUEST). Gemini 2.5 Pro demonstrated the best understandability; however, actionability was lower across all platforms. Transparency was a major weakness: all chatbots scored poorly on WRR, with ChatGPT-5o performing best yet still rated a low-quality source. Readability demands were generally high, with most responses requiring high-school to university reading levels (SMOG 11.8-13.9).
    CONCLUSION: Current AI chatbots are not sufficiently reliable as primary educational tools for SSI prevention. Despite strengths in quality and clarity, shortcomings in transparency and readability limit safe patient use.
    Keywords:  Artificial Intelligence; Chatbots; Large Language Models; Patient Education; Surgical Site Infection
    DOI:  https://doi.org/10.1016/j.ajic.2026.03.004
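    Note: Of the readability formulas named in the methods, SMOG estimates a US school grade as
      $\mathrm{SMOG} = 1.0430\sqrt{30 \times \frac{\text{polysyllables}}{\text{sentences}}} + 3.1291$,
    Flesch Reading Ease scores text from 0 to 100 (higher = easier) as
      $\mathrm{FRE} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}$,
    and the Ateşman index, the standard Turkish adaptation of FRE, as
      $\mathrm{OP} = 198.825 - 40.175\,\frac{\text{syllables}}{\text{words}} - 2.610\,\frac{\text{words}}{\text{sentences}}$.
    The reported SMOG of 11.8-13.9 thus corresponds to roughly 12th-grade through early university reading levels.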
  10. J Craniofac Surg. 2026 Mar 16.
       BACKGROUND: Craniofacial microsomia (CFM) is the second most common congenital craniofacial anomaly. As patients increasingly seek health information online, large language models (LLMs) like ChatGPT and DeepSeek have emerged as potential sources of medical information. This study evaluates the performance of ChatGPT-5 and DeepSeek-V3.2 in providing bilingual responses to CFM-related questions.
    METHODS: Twenty-two questions covering CFM definition, etiology, diagnosis, treatment, and prognosis were developed. Each question was submitted in English and Chinese to both LLMs using a zero-prompt approach. Responses were evaluated for accuracy using a predefined 4-point scale, with readability assessed using the Flesch Reading Ease score for English and the Chinese Readability Platform for Chinese. Safety statement frequency was also recorded.
    RESULTS: DeepSeek demonstrated significantly higher accuracy than ChatGPT in both English (score 1: 86.4% versus 45.5%, P=0.004) and Chinese (77.3% versus 40.9%, P=0.014). However, only DeepSeek produced responses with inaccurate or misleading content (score 3). For English readability, DeepSeek scored significantly higher (39.4±5.5 versus 35.1±8.4, P=0.031), while Chinese readability was comparable. DeepSeek also included safety statements more frequently (54.5%-72.7% versus 4.5%-18.2%).
    CONCLUSIONS: Both LLMs show potential for CFM patient education, with DeepSeek offering superior accuracy and readability in English, though it occasionally produced misleading information. ChatGPT provided safer but less detailed responses. These findings highlight the need for model-specific optimization and clinician oversight when integrating LLMs into patient education for complex craniofacial conditions.
    Keywords:  ChatGPT-5; DeepSeek-V3.2; craniofacial microsomia; large language models; patient education
    DOI:  https://doi.org/10.1097/SCS.0000000000012589
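    Note: A minimal sketch of how the English readability scoring above can be reproduced, assuming the open-source textstat Python package (the sample text is illustrative, not taken from the study):

      # pip install textstat
      import textstat

      # Hypothetical model response about craniofacial microsomia (CFM);
      # not a quotation from either LLM evaluated in the study.
      response = (
          "Craniofacial microsomia is a condition in which one side of the "
          "face develops smaller than the other. It is usually present at "
          "birth and can affect the jaw, ear, and soft tissues."
      )

      # Flesch Reading Ease: 0-100 scale, higher scores mean easier text.
      fre = textstat.flesch_reading_ease(response)
      print(f"Flesch Reading Ease: {fre:.1f}")
      # The means reported above (35.1 and 39.4) sit in the conventional
      # "difficult" band (30-50), i.e., roughly college-level reading.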
  11. BMC Oral Health. 2026 Mar 19.
      
    Keywords:  Endodontics; Health Education; Retreatment; Root Canal Therapy; Social Media; YouTube
    DOI:  https://doi.org/10.1186/s12903-026-08141-9
  12. JMIR Pediatr Parent. 2026 Mar 16;9:e78128
       Background: Digital technologies with breastfeeding content have become an important source of information for new parents in Germany. However, little is known about the content and quality of digital breastfeeding information sources.
    Objective: The objective of this paper was to evaluate the scope, content, and quality of free-of-charge smartphone mobile apps and websites with breastfeeding-related content in Germany.
    Methods: A cross-sectional study of mobile apps and websites was conducted in July 2023. The App Store for iOS and Google Play Store for Android were searched for mobile apps. Bing.de and Google.de were searched for websites. The quality, suitability of information, readability, and coverage of digital information on mobile apps and websites were evaluated. We used the user version of the Mobile Applications Rating Scale, the Health-Related Web Site Evaluation Form, the Suitability Assessment of Materials, and the Flesch Index tool, as well as a self-developed checklist. We report our results according to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement.
    Results: Eight mobile apps and 13 websites were included. The quality of information sources was generally good for apps (median 83%, IQR 73%-87%) and websites (median 86%, IQR 83%-89%). The suitability of information was good for apps (median 84%, IQR 70%-89%) and websites (median 89%, IQR 78%-94%). The coverage of information was good for apps (median 68%, IQR 59%-86%) and websites (median 82%, IQR 73%-100%). However, digital information was difficult or challenging to read on most apps (median 59%, IQR 53%-68%) and websites (median 58%, IQR 47%-61%). Seven of 8 mobile apps and 9 of 13 websites were commercial, with embedded links to shopping sites without external certificates confirming the trustworthiness of the information.
    Conclusions: Nonprofit and governmental institutions should take assertive action to provide parents in Germany with reliable, unbiased, open-access digital breastfeeding information.
    Keywords:  German parents; breastfeeding; consumer health information; content analysis; digital health literacy; internet; mobile app; smartphone
    DOI:  https://doi.org/10.2196/78128
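    Note: The Flesch Index tool applied to these German-language materials is presumably the Amstad adaptation of Flesch Reading Ease,
      $\mathrm{FRE}_{de} = 180 - \frac{\text{words}}{\text{sentences}} - 58.5\,\frac{\text{syllables}}{\text{words}}$,
    scored from 0 to 100 with higher values indicating easier text; the finding that most apps and websites were difficult to read corresponds to the lower half of this scale.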
  13. Front Public Health. 2026;14:1767637
       Introduction: YouTube is widely used as a source of information on health and nutrition. However, concerns exist regarding the reliability and scientific accuracy of its content. This study aims to evaluate the quality, reliability, and scientific accuracy of YouTube videos related to pediatric cancer and nutrition.
    Methods: The reliability of the videos was assessed using the mDISCERN scale, their quality was evaluated with the Global Quality Scale (GQS), and their comprehensiveness was measured using the Nutrition and Pediatric Cancer Scoring System (NPCSS), a scale specifically developed for this study.
    Results: Of the analyzed videos, 60% were classified as useful, while 40% were deemed misleading. Highly reliable videos were found to be longer and more comprehensive; however, their viewership rates were significantly lower (p < 0.05). The majority (66.7%) of the useful videos were presented by dietitians. Correlation analyses demonstrated strong positive relationships between reliability, quality, and comprehensiveness (p < 0.001). Nevertheless, as video quality and reliability increased, viewership rates tended to decrease. The instruments demonstrated an area under the curve ranging between 0.90 and 0.97.
    Discussion: YouTube contains limited content on pediatric cancer and nutrition, and only slightly more than half of the available videos are classified as useful. Even these useful videos receive low audience engagement, revealing a gap between content quality and viewer reach. This highlights the need to increase high-quality content and enhance the visibility of reliable information.
    Keywords:  YouTube; nutrition; pediatric cancer; quality; reliability
    DOI:  https://doi.org/10.3389/fpubh.2026.1767637
  14. PeerJ. 2026;14:e20963
      Familial Mediterranean fever (FMF) is the most common hereditary autoinflammatory disease. YouTube is a popular video-sharing platform that both patients and healthcare professionals access for medical information. This study aimed to assess the content, reliability, and quality of YouTube videos related to FMF. To evaluate video quality and reliability, the Global Quality Scale (GQS) and DISCERN tool were used. Based on GQS scores, videos were categorized into high-, moderate-, and low-quality groups. Four groups were identified in terms of usefulness: useful information, misleading information, useful patient opinion, and misleading patient opinion. The video review was conducted on November 26, 2023. Fifty-three videos that met the inclusion criteria were included in our study. Among these videos, 33 (62.3%) were classified as useful information, 14 (26.4%) as misleading information, two (3.8%) as useful patient opinion, and four (7.5%) as misleading patient opinion. Regarding quality, 21 (39.6%) videos were rated as low quality, 13 (24.5%) as moderate quality, and 19 (35.8%) as high quality. Videos uploaded by physicians were significantly more likely to contain useful information (p < 0.001) and demonstrated higher GQS and DISCERN scores compared with other groups (p = 0.003 and p < 0.001, respectively). Alongside high-quality videos, YouTube also hosts lower-quality videos that may spread misleading information. Physicians and professional organizations in the field of rheumatology should therefore collaborate with YouTube and other video-sharing sites to promote videos according to their quality. Furthermore, encouraging these organizations to share accurate and professionally produced videos on YouTube would benefit healthcare professionals, patients, and their families.
    Keywords:  Familial Mediterranean fever; Patient education; Quality; Reliability; YouTube
    DOI:  https://doi.org/10.7717/peerj.20963
  15. Australas J Ageing. 2026 Mar;45(1):e70149
       OBJECTIVE: The rapid growth in the use of online platforms for obtaining health-related information, together with the increasing incidence of Alzheimer's disease (AD), has made the evaluation of online information quality essential. The purpose of this research was to assess the quality and reliability of the most frequently viewed YouTube videos related to exercise for individuals living with AD.
    METHODS: This descriptive study evaluated the quality and reliability of YouTube videos related to AD and exercise. Fifty-six English language videos were selected from the top search results based on keywords. Video sources, view rate metrics and content characteristics were recorded. The quality and reliability of the videos were independently evaluated by three physiotherapists using the Global Quality Scale (GQS) and DISCERN tool.
    RESULTS: High-quality videos had higher DISCERN scores and greater view rate (p = 0.02), whereas low-quality videos showed minimal interaction (p < 0.001). Dislike rates were similar across all groups. In addition, Pearson correlation analysis indicated a very strong positive relationship (r = 0.97, p < 0.001) between views and likes, indicating that more viewed videos tend to receive more likes.
    CONCLUSIONS: Video quality may have an influence on both the reliability of the information and viewer interaction, as reflected by view and like metrics. A considerable number of YouTube videos on exercise for individuals living with AD were shown to be of low or moderate quality. The findings highlight the need for improved oversight, collaboration between healthcare professionals and content creators, and the promotion of evidence-based digital health information to protect vulnerable populations.
    Keywords:  Alzheimer's disease; exercise; physical activity; rehabilitation; social media
    DOI:  https://doi.org/10.1111/ajag.70149
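    Note: The Pearson coefficient reported for views and likes is
      $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$;
    at $r = 0.97$ the two engagement metrics are nearly collinear, suggesting likes carry little information about a video beyond what views already capture.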
  16. J Eval Clin Pract. 2026 Mar;32(2):e70386
       OBJECTIVES: Hyperbaric oxygen therapy (HBOT), in which patients inhale pure oxygen under high pressure to treat chronic wounds and damaged tissue, is delivered both in specialised centres and through portable oxygen chambers. Oxygen- and pressure-related accidents can occur during HBOT, and the use of portable oxygen chambers in particular is risky and demands a high level of technical knowledge. YouTube hosts vast amounts of useful and useless information and is used by millions of people, and content uploaded without any control or filtering can reach anyone with internet access. In this context, the aim of this study was to evaluate the characteristics, quality and reliability of YouTube content on portable oxygen chambers.
    METHODS: The study was cross-sectional and searches were performed on the YouTube platform with five keywords. The analysed videos were evaluated by two experienced researchers in terms of the accuracy of the information contained and video parameters: upload date, duration, number of views, likes and comments. Video quality was assessed using the Global Quality Scale (GQS), reliability using the modified DISCERN (mDISCERN), information accuracy, information flow, quality and precision using the Video Information and Quality Index (VIQI), and transparency using the Journal of the American Medical Association (JAMA) benchmark criteria.
    RESULTS: In this study, 45 portable HBOT videos on the YouTube platform were evaluated for quality and reliability. Only 31% of the videos were classified as high quality, while the remainder were of medium (40%) or low (29%) quality. The mean mDISCERN and VIQI scores were 3.1 and 3.3, respectively, and 58% of the content had low reliability according to the JAMA criteria. The quality and credibility of physician- and academic-generated videos were statistically significantly higher (p < 0.05) than those of content produced by independent users and marketers. A weak but significant relationship was found between GQS and number of views (r = 0.29, p = 0.04).
    CONCLUSION: This study revealed that portable HBOT content on YouTube is largely inadequate in terms of information quality and reliability. It is of great importance that content on digital platforms on health-related topics is prepared by professionals and supported by scientific references.
    Keywords:  Portable HBOT; YouTube; quality assessment
    DOI:  https://doi.org/10.1111/jep.70386
  17. ANZ J Surg. 2026 Mar 16.
       BACKGROUND AND AIMS: With modern healthcare shifting towards a more patient-centred approach, informed consent and expectation management rely on accurate, easily accessible information. YouTube is a popular source of medical information for patients, but its lack of regulation, especially for procedures like Ivor-Lewis oesophagectomies, warrants analysis of the quality of available content. This study aims to assess the quality of information on YouTube related to the Ivor-Lewis oesophagectomy for patient education.
    METHODS: A search of YouTube (www.youtube.com) videos was conducted in May 2025 with the term 'Ivor-Lewis oesophagectomy'. The inclusion criteria required videos to discuss Ivor-Lewis oesophagectomies, be narrated or subtitled in English, be intended for patients, and be educational in nature. The exclusion criteria removed non-English videos, technical surgical videos aimed at healthcare providers, and promotional videos. The videos were evaluated using the DISCERN tool, the Global Quality Score (GQS) and descriptive statistics.
    RESULTS: A total of 226 videos were assessed against the inclusion and exclusion criteria. Twelve videos were deemed eligible and included for analysis. The median view count was 10 961 and the median video length was 162 s. The median DISCERN score was 52.5/80, with medical institutions garnering the highest median of all source categories (58/80). The median GQS score was 4 points.
    CONCLUSION: YouTube videos on Ivor-Lewis oesophagectomies were largely of poor quality, indicating a greater need for regulation and standards for medical information on video-hosting platforms.
    Keywords:  health education; information dissemination; internet; oesophageal neoplasms; oesophagectomy
    DOI:  https://doi.org/10.1111/ans.70539
  18. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2026 Mar 18.
       BACKGROUND: Young people and adults are increasingly searching for information about menstruation on social media. Against this background, the present study aimed, for the first time, to examine the content and quality of German-language menstruation videos on YouTube, Instagram, and TikTok. The study addresses research questions (RQ) on content provider types (RQ1), content (RQ2), and the quality of menstruation videos (RQ3) as well as audience reactions (RQ4).
    METHODS: In 2024, a sample of N = 500 popular menstruation videos was drawn from YouTube (150), Instagram (150), and TikTok (200). For each video, up to 20 of the most liked topic-related audience comments were included (N = 6314). Videos and comments were analyzed using reliability-tested codebooks. Data analysis was performed with R. The study is preregistered, and all data, materials, and analysis scripts are publicly available.
    RESULTS: The included menstruation videos predominantly originated from laypersons (42%) and less frequently from healthcare professionals (17%; RQ1). The portrayal of menstruation was mostly neutral (52%); negative (15%) or positive (13%) depictions were less frequent. Content-wise, the videos primarily addressed the experience of menstruation (e.g., pain) and practical management of bleeding (e.g., menstrual products; RQ2). According to quality criteria for evidence-based health information, substantial deficits were observed (RQ3). Comment sections were used by viewers to share personal experiences and to ask questions (RQ4).
    DISCUSSION: Future research and practical measures are necessary to further assess and improve the quality of social media videos about menstruation.
    Keywords:  Health information; Menstrual flow; Period; Social media; mDISCERN index
    DOI:  https://doi.org/10.1007/s00103-026-04213-x
  19. Front Digit Health. 2026;8:1757584
       Background: Short-video platforms have become major channels for health information dissemination, yet the quality and reliability of content on ankylosing spondylitis (AS) remain underexplored.
    Objective: This study aimed to systematically evaluate the quality, reliability, and characteristics of AS-related short videos on three major Chinese platforms: TikTok, Bilibili, and rednote.
    Methods: A cross-sectional content analysis was conducted on 300 videos (100 per platform) collected in November 2025. Video uploaders were categorized as professional physicians, non-professional physicians, individual users, or institutions. Four validated instruments-modified DISCERN (mDISCERN), Global Quality Scale (GQS), Journal of the American Medical Association (JAMA) criteria, and Video Information Quality Index (VIQI)-were used to assess reliability, completeness, and production quality. User engagement metrics, including likes, shares, comments, and collections, were also analyzed.
    Results: Most videos were uploaded by professional physicians and primarily focused on clinical manifestations and treatment. Videos presented in animated or lecture-style formats demonstrated higher information quality and reliability, whereas casually recorded videos consistently scored lower across multiple assessment tools. Overall, the quality of AS-related short videos was moderate. In unadjusted analyses, collections were positively associated with information quality, suggesting that users may preferentially retain more informative content. However, after accounting for platform characteristics and exposure-related factors, information quality was not a consistent independent driver of relative engagement, and mDISCERN showed an inverse association with standardized collection levels. Other engagement indicators, including likes, comments, shares, and follower counts, showed weak or inconsistent relationships with video quality.
    Conclusion: Although AS-related short videos are predominantly produced by physicians, the overall quality remains suboptimal. Information quality appears to influence certain user behaviors, such as content saving, but does not consistently translate into higher overall engagement. These findings highlight the limitations of using engagement metrics as proxies for content quality. Users should prioritize content from verified medical professionals, and platforms may consider integrating quality-oriented indicators and improving certification systems to enhance health information dissemination.
    Keywords:  China; ankylosing spondylitis; health communication; short-video; social media; video quality
    DOI:  https://doi.org/10.3389/fdgth.2026.1757584
  20. Digit Health. 2026 Jan-Dec;12:20552076261428193
       Background: Asthma is a widespread chronic inflammatory airway disease posing a global health challenge. The digital era has made platforms like TikTok and Bilibili key sources for asthma education. However, the quality of asthma information on these platforms is unclear, raising concerns about content accuracy and reliability. This study evaluates the reliability and quality of asthma educational videos on TikTok and Bilibili.
    Methods: Using "asthma" as a keyword, the top 100 videos from each platform were analyzed after excluding duplicates and irrelevant content, totaling 187 videos. All content was retrieved from the mainland-China versions of TikTok and Bilibili, and the videos analyzed were therefore limited to those uploaded and accessible within China. Videos were assessed by two senior physicians using modified DISCERN and GQS tools for reliability and quality, analyzing associations with uploader types and user engagement. Nonparametric statistical methods were applied for data analysis.
    Results: Bilibili videos accounted for 51.34% of the sample, with higher median likes (101.50 vs 42.00, P = .011), shares (30.00 vs 12.00), and comments (19.00 vs 5.00) than TikTok. TikTok videos were longer (121.00 vs 83.50, P = .028) but showed weaker correlations with engagement and quality metrics. Professionals created 54.01% of videos, yet non-professionals had higher engagement. Asthma symptoms were covered in 17.11% of videos, causes in 15.51%, and diagnosis in 12.30%, with epidemiology and prevention underrepresented.
    Conclusion: Our data indicate that Bilibili videos were rated as higher quality and more reliable than those on TikTok, yet both sites reward popularity over accuracy. Users seeking evidence-based asthma information should select content uploaded by verified hospitals or certified specialists and cross-check any health claims made by non-verified influencers.
    Keywords:  Asthma; Bilibili; DISCERN; GQS; TikTok; health information quality
    DOI:  https://doi.org/10.1177/20552076261428193
  21. Digit Health. 2026 Jan-Dec;12:20552076261433840
       Background: No studies have evaluated Crohn's disease-related video quality on short video platforms. This study assesses the quality, reliability, and audience engagement of such videos on Douyin and Xiaohongshu to guide patients and healthcare professionals.
    Methods: The top 100 videos from each platform were retrieved using "Crohn's disease" as the keyword. Quality was evaluated using modified DISCERN, JAMA, and PEMAT tools. Comment themes were extracted to identify audience concerns.
    Results: Of 200 videos analyzed, overall JAMA and mDISCERN scores ranged from 2 to 3. Douyin videos showed higher quality and engagement: higher DISCERN scores (IQR 2.00-2.00 vs 1.00-2.00, p < 0.001), better PEMAT operability (IQR 0.50-0.75 vs 0.25-0.50, p = 0.002), longer duration (100 s vs 67.5 s, p < 0.001), and more interactions. Douyin was dominated by gastroenterologists (71%), while Xiaohongshu had more individual users (50%). Douyin emphasized prognosis and diagnosis; Xiaohongshu focused on treatment and care. Nursing-related videos scored lowest in quality, while follow-up content had the highest operability and engagement.
    Conclusion: Douyin provides higher-quality Crohn's disease content than Xiaohongshu. Key issues include low dissemination of professional content and poor-quality nursing information. Recommendations are threefold: content creators should improve video quality; platforms should institute medical review mechanisms for care content; and patients and the public should practice critical thinking. These measures can support a precise, effective, and scientifically grounded digital health knowledge ecosystem.
    Keywords:  Crohn's disease; Douyin; Xiaohongshu; health information; quality analysis; short video
    DOI:  https://doi.org/10.1177/20552076261433840
  22. Proc (Bayl Univ Med Cent). 2026;39(2):289-293
       Background: This study aimed to evaluate the usefulness of ChatGPT and Gemini in generating information suitable for developing educational materials for patients with vitiligo.
    Methods: Information was obtained from ChatGPT and Gemini in response to questions regarding vitiligo derived from the American Academy of Dermatology website. Differences between the groups were analyzed using the Mann-Whitney U test.
    Results: Author 1 assigned a completeness score of 2 to six of the 15 responses generated by ChatGPT; all texts received full scores for accuracy, comprehensibility, and usefulness. Six responses from Gemini were scored 2 for completeness, and one response received a score of 2 for comprehensibility; all other evaluations received full scores. Author 2 assigned a completeness score of 2 to three responses from each model, and rated one Gemini response 2 for comprehensibility; all other assessments received full scores. The mean Flesch Reading Ease scores for ChatGPT and Gemini were 34.92 ± 12.28 and 31.40 ± 13.33, respectively (P = 0.41).
    Conclusion: ChatGPT and Gemini provided largely accurate and useful information on vitiligo; however, most responses were difficult to read. The content should be reviewed and refined by dermatologists to ensure adequate readability before being disseminated to patients.
    Keywords:  Artificial intelligence; ChatGPT; vitiligo
    DOI:  https://doi.org/10.1080/08998280.2025.2611688
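    Note: The Mann-Whitney U test used to compare the two models pools and ranks all scores, then computes
      $U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$,
    where $R_1$ is the rank sum of the first group; the test statistic is $\min(U_1, U_2)$. Because it operates on ranks, it suits the ordinal rating scales and the non-normal readability scores compared here.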
  23. BMC Pediatr. 2026 Mar 19.
      
    Keywords:  Bilibili; Health communication; Hyperbilirubinemia; Information quality assessment; Neonatal jaundice; Phototherapy; Short-video platforms; TikTok
    DOI:  https://doi.org/10.1186/s12887-026-06771-0
  24. Ophthalmic Physiol Opt. 2026 Mar 16.
       PURPOSE: Patients are turning to the internet to access educational materials to help them make healthcare decisions, making readability an important factor. This cross-sectional study assessed the readability of online patient education materials for myopia management treatments that have regulatory approval.
    METHODS: The top 10 Google search results from May 2024 for freely available online patient information on myopia management modalities and regulatory-approved products in Canada and Australia were analysed for readability. The modalities included orthokeratology, myopia control spectacle lenses, myopia control soft contact lenses and atropine. The products included MiYOSMART® [HOYA®], Stellest® [Essilor®], MyoCare® [ZEISS], MiSight® 1 day [CooperVision®], ACUVUE® Abiliti® 1-Day [Johnson & Johnson], NaturalVue® Multifocal 1 Day [VTI], ACUVUE® Abiliti® Overnight [Johnson & Johnson] and Eikance [Aspen Pharmacare Australia]. These searches gave 120 results. Readability was assessed with Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG) Index and Coleman Liau Index (CLI). Additionally, websites were scored on Journal of the American Medical Association (JAMA) benchmark criteria. Statistical analysis was performed with two-tailed tests.
    RESULTS: Of 120 websites, none met the recommended sixth-grade reading level across all readability indices. Thirteen websites met at least one readability index, 10 of them product-related. Seven websites satisfied all four JAMA benchmarks, while the majority met only one. There was a weak positive relationship between product search rank and readability (SMOG p = 0.02, GFI p = 0.02) and a weak negative relationship between JAMA benchmarks and readability for both modality (CLI p = 0.045) and product (CLI p = 0.049).
    CONCLUSIONS: Online information about myopia management is generally written above the recommended sixth-grade reading level and does not meet all JAMA benchmarks. Websites that appear as top search results do not necessarily have easier readability. The readability of online patient education materials may influence access to treatment and outcomes.
    Keywords:  Myopia; Myopia management; Online health information; Patient education; Readability; Refractive error
    DOI:  https://doi.org/10.1007/s44402-026-00030-6
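    Note: Besides FRES and SMOG (formulas given under item 9 above), the grade-level indices used here are conventionally computed as
      $\mathrm{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59$,
      $\mathrm{GFI} = 0.4\left(\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right)$ and
      $\mathrm{CLI} = 0.0588L - 0.296S - 15.8$,
    where $L$ and $S$ are the mean counts of letters and sentences per 100 words; a score at or below 6 on these indices would meet the recommended sixth-grade reading level.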
  25. Res Gerontol Nurs. 2026 Mar-Apr;19(2):65-74
       PURPOSE: Dementia profoundly affects individuals and their family caregivers, who often lack training and resources, making effective health information-seeking behavior (HISB) critical for caregiving and stress reduction. The current pilot study explored caregivers' HISB and its associations with caregiver characteristics to inform targeted interventions.
    METHOD: A cross-sectional survey was conducted with 36 family caregivers of persons with dementia. Descriptive statistics, correlation analyses, and content analysis were used to examine HISB patterns and related factors.
    RESULTS: Caregivers expressed high confidence in seeking health information but faced practical challenges in accessing and using this information. They strongly trusted information from health care professionals, dementia organizations, and pharmacists, whereas informal sources were less trusted. Income, caregiving duration, and caregiving time influenced HISB experiences and trust.
    CONCLUSION: Despite confidence, caregivers encountered barriers in health information search and use. Nursing interventions should promote accessible, easy-to-use, trustworthy resources tailored to caregivers' unique information needs and caregiving demands.
    DOI:  https://doi.org/10.3928/19404921-20260223-01