bims-librar Biomed News
on Biomedical librarianship
Issue of 2025–09–28
thirty-one papers selected by
Thomas Krichel, Open Library Society



  1. Healthcare (Basel). 2025 Sep 19. pii: 2361. [Epub ahead of print]13(18):
      Background/Objectives: As healthcare increasingly utilizes digital delivery systems, equitable access and engagement are critical, particularly for caregivers of older adults in rural regions. This study examines how education levels and geographic rurality influence health information-seeking in Mississippi, a state with persistent structural inequities, through the theoretical lenses of Digital Divide Theory and Theory of Planned Behavior. Methods: A statewide survey was conducted among caregivers in Mississippi (N = 452) who support adults aged 50+. The survey assessed rurality level, educational attainment, attitudes toward various health information sources, perceived digital accessibility, and reported challenges in obtaining necessary health guidance. Results: Findings challenged conventional assumptions regarding rural digital engagement. Rural caregivers reported higher trust in both internet and interpersonal health information sources. Rurality did not significantly predict internet use or reported difficulty finding information. However, a significant interaction between education and rurality revealed an "Outcome Divide": while higher education correlated with more positive attitudes toward online health information in urban areas, this association weakened and reversed in highly rural contexts. Conclusions: These results underscore the need for strategies beyond merely improving access to bridge digital health equity gaps. Policy and interventions must address contextual barriers, such as digital health literacy and relevance, that limit the effectiveness of digital tools, even when internet access is available. Promoting digital health literacy, integrating trusted local interpersonal networks, and adapting educational initiatives to rural realities are essential for advancing equitable and effective digital health engagement.
    Keywords:  Digital Divide Theory; Theory of Planned Behavior; digital health literacy; health information seeking; rural caregivers
    DOI:  https://doi.org/10.3390/healthcare13182361
  2. Adv Health Inf Sci Pract. 2025 ;1(1): XLNW5463
       Background: Some research on social determinants of health (SDOH) has suggested that patient health literacy (HL) or comprehension can influence health outcomes, while other recent studies have not observed strong connections between those variables. Previous research has utilized various screening tools to assess patient HL, with varying success.
    Methods: In this three-phase study, the authors assessed the "readability" of various common patient health information documents and forms used by prominent US-based healthcare organizations and requestors, including authorizations and privacy notices, according to the Flesch-Kincaid and Flesch Reading Ease scales.
    Results: The results showed that the documents examined had readability scores higher than the average American's literacy level.
    Conclusions: Confirming anecdotal data, this research supports the conclusion that many forms signed by American patients are written above their average reading level, and therefore patients may not fully understand what they are signing, even when requesting that records be sent to others.
    Keywords:  health literacy; patient; release of information; social determinants of health
    DOI:  https://doi.org/10.63116/XLNW5463
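    Entry 2 above rates documents with the Flesch-Kincaid Grade Level and Flesch Reading Ease scales. As a reference point, here is a minimal Python sketch of the standard published formulas; the syllable counter is a crude vowel-group heuristic assumed for illustration, so its scores will differ slightly from dedicated readability tools.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups; real tools use dictionaries or better rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

sample = ("You may ask us to send copies of your health records to another "
          "provider. Please sign and date this form before returning it.")
ease, grade = flesch_scores(sample)
print(f"Reading Ease: {ease:.1f}  Grade Level: {grade:.1f}")
```

    Both scores depend only on average sentence length and average word length in syllables, which is why dense legal phrasing in consent and release forms pushes the grade level well above typical patient literacy.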
  3. Neurogastroenterol Motil. 2025 Sep 24. e70164
       BACKGROUND: Over half of all Americans seek health-related information online, yet the quality of this digital content remains largely unregulated and variable. The DISCERN score, a validated 15-item instrument, offers a structured method to assess the reliability of written health information. While expert-assigned DISCERN scores have been widely applied across various disease states, whether artificial intelligence (AI) can automate this evaluation remains unknown. Specifically, it is unclear whether AI-generated DISCERN scores align with those assigned by human experts. Our study seeks to investigate this gap in knowledge by examining the correlation between AI-generated and human-assigned DISCERN scores for TikTok videos on Irritable Bowel Syndrome (IBS).
    METHODS: A set of 100 TikTok videos on IBS previously scored using DISCERN by two physicians was chosen. Sixty-nine videos contained transcribable spoken audio, which was processed using a free online transcription tool. The remaining videos either featured songs or music unsuitable for transcription, had been deleted, or were no longer publicly available. The audio transcripts were prefixed with an identical prompt and submitted to two common AI models, ChatGPT 4.0 and Microsoft Copilot, for DISCERN score evaluation. The average DISCERN score for each transcript was compared between the AI models and with the mean DISCERN score given by the human reviewers using Pearson correlation (r) and the Kruskal-Wallis test.
    RESULTS: There was a significant correlation between human and AI-generated DISCERN scores (r = 0.60-0.65). When categorized by the background of the content creators, medical (N = 26) versus non-medical (N = 43), the correlation was significant only for content made by non-medical content creators (r = 0.69-0.75, p < 0.001). Correlation between ChatGPT and Copilot DISCERN scores was stronger for videos by non-medical content creators (r = 0.66) than for those by medical content creators (r = 0.43). On linear regression, ChatGPT's DISCERN scores explained 55.6% of the variation in human DISCERN scores for videos by non-medical creators, compared with 8.9% for videos by medical creators. For Copilot, the corresponding values were 47.2% and 9.3%.
    CONCLUSION: AI models demonstrated moderate alignment with human-assigned DISCERN scores for IBS-related TikTok videos, but only when content was produced by non-medical creators. The weaker correlation for content produced by those with a medical background suggests limitations in current AI models' ability to interpret nuanced or technical health information. These findings highlight the need for further validation across broader topics, languages, platforms, and reviewer pools. If refined, AI-generated DISCERN scoring could serve as a scalable tool to help users assess the reliability of health information on social media and curb misinformation.
    Keywords:  ChatGPT; Copilot; DISCERN; TikTok; artificial intelligence; health information; irritable bowel syndrome; social media
    DOI:  https://doi.org/10.1111/nmo.70164
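    Entry 3 above reports Pearson correlations between human-assigned and AI-assigned DISCERN scores and the share of variance explained by linear regression. The sketch below reproduces that style of comparison on hypothetical score pairs (not the study's data), using SciPy.

```python
from scipy.stats import pearsonr, linregress

# Hypothetical overall DISCERN scores for the same eight transcripts -- not the study's data.
human = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.8, 2.9]
ai    = [2.4, 3.1, 2.6, 4.2, 3.3, 2.2, 3.5, 3.0]

r, p = pearsonr(human, ai)      # strength of agreement between rater types
fit = linregress(ai, human)     # regress human scores on AI scores

print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"AI scores explain {fit.rvalue ** 2:.1%} of the variance in human scores")
```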
  4. Adv Health Inf Sci Pract. 2025 ;1(1): VXUL2925
     Background: ChatGPT is a popular open-source large language model (LLM) that uses supervised learning to create human-like responses. In recent years, ChatGPT has generated excitement in the medical field. However, its accuracy must be carefully evaluated to determine its usefulness in patient care. In this literature review, the authors examine whether ChatGPT can accurately answer frequently asked questions (FAQs) from patients, make clinical recommendations, and effectively categorize patient symptoms.
    Methods: A database search in PubMed was conducted using the search terms "ChatGPT," "accuracy," and "clinical decision-making," yielding 122 unique references. Two screening stages resulted in 9 studies that met the evaluation criteria for this review.
    Results: Analysis of 9 studies showed that while ChatGPT can answer FAQs, offer recommendations, and categorize symptoms in less complicated scenarios, its clinical accuracy ranged from 20% to 95%. ChatGPT may be helpful in specific clinical scenarios; however, its variable accuracy makes it unsuitable as a stand-alone point-of-care product.
    Conclusions: ChatGPT is adept only at providing generalized recommendations, even in scenarios where individualized patient care would be more suitable. Further research is needed to identify where ChatGPT delivers the most accurate responses and how it can supplement traditional care.
    Keywords:  ChatGPT; FAQs; clinical decision-making; clinical recommendations; patient questions; patient symptoms
    DOI:  https://doi.org/10.63116/VXUL2925
  5. Cureus. 2025 Sep;17(9): e92066
      Introduction Patient education plays a critical role in stroke care and management. It helps patients understand their health, diagnosis, diagnostic modalities, and treatment and improves their overall experience. With the integration of AI tools into healthcare, patient education has become more efficient and easily accessible, making it a powerful asset in healthcare. Methodology In this cross-sectional study, two artificial intelligence (AI) tools, namely, ChatGPT (OpenAI, San Francisco, California, United States) and DeepSeek AI (DeepSeek, Hangzhou, Zhejiang, China), were prompted to create patient education guides on three imaging modalities, that is, digital subtraction angiography (DSA), non-contrast computed tomography (CT), and diffusion-weighted imaging (DWI), for stroke cases. Both responses were assessed for variables such as number of words, number of sentences, average words per sentence, ease score, grade level, and average syllables per word using the Flesch-Kincaid calculator. The reliability and similarity scores were assessed with the modified DISCERN score and QuillBot, respectively. Statistical analysis was done using R version 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria). Results In generating patient education materials for non-contrast CT, DW-MRI, and DSA in stroke care, ChatGPT and DeepSeek AI showed similar performance across grade level, ease score, similarity, and reliability, with no statistically significant differences. ChatGPT often produced slightly higher grade levels, while DeepSeek AI had higher ease scores for some modalities. Similarity percentages varied by topic but averaged equally, and reliability was uniformly high. Linguistic features showed only minor, non-significant differences. Conclusions Both ChatGPT and DeepSeek AI performed similarly in generating patient education guides based on ease of understanding and readability. These results suggest that either AI tool can be effectively used for patient education in this context.
    Keywords:  artificial intelligence; chatgpt; deepseek ai; diffusion-weighted imaging; digital subtraction angiography; non-contrast ct
    DOI:  https://doi.org/10.7759/cureus.92066
  6. J Neurosurg Pediatr. 2025 Sep 26. 1-6
       OBJECTIVE: This study investigates the potential of artificial intelligence (AI), specifically ChatGPT 4o, to revolutionize the readability of patient education materials in pediatric neurosurgery. The American Medical Association and the National Institutes of Health recommend that educational materials be written at a 3rd- to 7th-grade reading level for accessibility. However, existing resources often exceed this range, hindering comprehension for many patients.
    METHODS: This study analyzed 38 patient education materials on hydrocephalus, spina bifida, tethered cord syndrome, cerebral palsy, Chiari malformation, and craniosynostosis from 7 top-ranked US children's hospitals. The Flesch-Kincaid grade level calculator was used to assess readability before and after AI modification.
    RESULTS: ChatGPT effectively reduced the mean reading level from 10.60 (SD 0.57) to 6.18 (SD 0.28; p < 0.05), achieving the target 6th-grade level across all conditions.
    CONCLUSIONS: Despite some limitations in maintaining word count and precise grade-level control, the results demonstrate the promising potential of AI in significantly enhancing the accessibility of pediatric neurosurgical education materials, which may lead to more inclusive patient communication and understanding.
    Keywords:  artificial intelligence; patient education; pediatric neurosurgery; readability
    DOI:  https://doi.org/10.3171/2025.5.PEDS25122
  7. Health Informatics J. 2025 Jul-Sep;31(3): 14604582251381996
      Background: AI tools are becoming primary information sources for patients with chronic kidney disease (CKD). However, as AI sometimes generates false or inaccurate information, the reliability of that information must be assessed.
    Methods: This study assessed AI-generated responses to frequently asked questions on CKD. We entered Japanese prompts with top CKD-related keywords into ChatGPT, Copilot, and Gemini. The Quality Analysis of Medical Artificial Intelligence (QAMAI) tool was used to evaluate the reliability of the information.
    Results: We included 207 AI responses from 23 prompts. The AI tools generated reliable information, with a median QAMAI score of 23 (interquartile range: 7) out of 30. However, information accuracy and resource availability varied (median (IQR): ChatGPT versus Copilot versus Gemini = 18 (2) versus 25 (3) versus 24 (5), p < 0.01). Among the AI tools, ChatGPT provided the least accurate information and did not provide any resources.
    Conclusion: The quality of AI responses on CKD was generally acceptable. While most information provided was reliable and comprehensive, some information lacked accuracy and references.
    Keywords:  artificial intelligence; chronic kidney disease; health communication; patient education
    DOI:  https://doi.org/10.1177/14604582251381996
  8. Cureus. 2025 Aug;17(8): e90600
      Introduction Systemic lupus erythematosus (SLE), systemic sclerosis, and dermatomyositis are among the most prevalent rheumatological conditions. Using artificial intelligence (AI) tools like ChatGPT and DeepSeek AI in health care can help provide personalized patient education, resulting in improved health literacy and treatment adherence. Aim To assess and compare the effectiveness of ChatGPT and DeepSeek AI in generating understandable, accurate, and reliable patient education guides for three rheumatological conditions: systemic lupus erythematosus, systemic sclerosis, and dermatomyositis. Methodology ChatGPT 4.0 and DeepSeek AI were asked to write a patient education guide for "systemic lupus erythematosus," "systemic sclerosis," and "dermatomyositis." These materials were assessed using validated readability scores (Flesch-Kincaid grade level and ease score), linguistic complexity analysis (average syllables per word, words per sentence), and similarity metrics against standard rheumatological resources. Finally, reliability was rated using the DISCERN score, a structured evaluation framework based on evidence-based guidelines from the British Society for Rheumatology and the American College of Rheumatology. Results There was no significant difference in word count (p=0.775), sentence count (p=0.802), average words per sentence (p=0.349), average syllables per word (p=0.101), grade level (p=0.193), similarity % (p=0.481), reliability score (p=0.742), or ease score (p=0.097) between ChatGPT and DeepSeek AI. Conclusions Both ChatGPT and DeepSeek AI offer promising avenues for augmenting patient education in rheumatology, but given their limitations, their output should be used as a complement, not a replacement, for verified, expert-reviewed educational materials.
    Keywords:  chatgpt; deepseekai; dermatomyositis; systemic lupus erythematosus (sle); systemic sclerosis
    DOI:  https://doi.org/10.7759/cureus.90600
  9. Interv Pain Med. 2025 Sep;4(3): 100636
       Background: ChatGPT and other Large Language Models (LLMs) are not only being more readily integrated into healthcare but are also being utilized more frequently by patients to answer health-related questions. Given the increased utilization for this purpose, it is essential to evaluate and study the consistency and reliability of artificial intelligence (AI) responses. Low back pain (LBP) remains one of the most frequently seen chief complaints in primary care and interventional pain management offices.
    Objective: This study assesses the readability, accuracy, and overall utility of ChatGPT's ability to address patients' questions concerning low back pain. Our aim is to use clinician feedback to analyze ChatGPT's responses to these common low back pain related questions, as in the future, AI will undoubtedly play a role in triaging patients prior to seeing a physician.
    Methods: To assess AI responses, we generated a standardized list of 25 questions concerning low back pain, split into five categories: diagnosis, seeking a medical professional, treatment, self-treatment, and physical therapy. We explored the influence of prompt wording on ChatGPT by phrasing questions at levels ranging from a 4th-grade to a college/reference level. One board-certified interventional pain specialist, one interventional pain fellow, and one emergency medicine resident reviewed ChatGPT's generated answers to assess accuracy and clinical utility. Readability and comprehensibility were evaluated using the Flesch-Kincaid Grade Level Scale. Statistical analysis was performed to analyze differences in readability scores, word count, and response complexity.
    Results: How a question is phrased influences accuracy in statistically significant ways. Over-simplification of queries (e.g. to a 4th grade level) degrades ChatGPT's ability to return clinically complete responses. In contrast, reference and neutral queries preserve accuracy without additional engineering. Regardless of how the question is phrased, ChatGPT's default register trends towards technical language. Readability remains substantially misaligned with health literacy standards. Verbosity correlates with prompt type, but not necessarily accuracy. Word count is an unreliable proxy for informational completeness or clinical correctness in AI outputs and most errors stem from omission, not commission. Importantly, ChatGPT does not frequently generate false claims.
    Conclusion: This analysis complicates the assumption that "simpler is better" in prompting LLMs for clinical education. Whereas earlier work in structured conditions suggested that plain-language prompts improved accuracy, our findings indicate that a moderate reading level, not maximal simplicity, yields the most reliable outputs in complex domains like pain. This study further supports that AI LLMs can be integrated into a clinical workflow, possibly through electronic health record (EHR) software.
    Keywords:  Artificial intelligence; ChatGPT; Low back pain; Patient questions
    DOI:  https://doi.org/10.1016/j.inpm.2025.100636
  10. J Pediatr Urol. 2025 Sep 02. pii: S1477-5131(25)00457-7. [Epub ahead of print]
     INTRODUCTION: Hypospadias is the most prevalent congenital anomaly of the penis, with an estimated incidence of 0.4-8.2 cases per 1000 live births (1). However, most parents and families of those with hypospadias experience anxiety and uncertainty regarding information about the condition (2, 3), leading many families to conduct their own independent internet searches to better understand the diagnosis. The reliability and quality of this information for patients and families has not previously been formally assessed. The objective of this study is to assess the ability of AI chatbots to provide accurate and readable information to patients and families on hypospadias.
    METHODS: AI chatbot inputs were sourced from Google Trends and healthcare organisations. Google Trends was used to identify the top 10 Google search terms relating to 'hypospadias' based on search volume. Headers from the Royal Children's Hospital, Melbourne (RCH) and the Urology Care Foundation of the American Urological Association (AUA) hypospadias pages were used as healthcare-related inputs. These inputs were submitted to 4 different AI chatbot programs: ChatGPT version 4.0, Perplexity, Chat Sonic, and Bing AI. Three urology consultants blinded to the AI chatbots assessed responses for accuracy and safety, and a further two trained investigators, blinded to AI chatbot type and each other's evaluation scores, assessed chatbot responses using various evaluation instruments, including PEMAT, DISCERN, a misinformation score, and the Flesch-Kincaid readability formula, as well as word count and citations.
    RESULTS: The 4 AI chatbots assessed provided high-quality health consumer information, with a median DISCERN score of 4 (IQR 3-5). The degree of misinformation was low overall and across all AI chatbot responses, with a median of 1 (IQR 1-1). The PEMAT understandability score was high overall, with a median of 91.7 % (IQR 80-92.3). However, all AIs performed poorly in the actionability of their responses, with an overall median of 40 % (IQR 20-80). The median word count per AI chatbot response was 213 (IQR 141-273).
    CONCLUSION: AI chatbots provided understandable, high-quality, and accurate health information relating to hypospadias. However, the information was delivered at a reading level which may limit its use in a paediatric or general public setting, and only one chatbot gave clearly actionable interventions or direction. Overall, AI chatbots are a clinically safe and appropriate adjunct to face-to-face consultation for healthcare information delivery and will likely take on a more prominent role as technology advances.
    Keywords:  Artificial intelligence; Health information quality; Hypospadias; Patient information
    DOI:  https://doi.org/10.1016/j.jpurol.2025.08.029
  11. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101200
       Purpose: To evaluate the accuracy of ChatGPT's responses to frequently asked questions (FAQs) about hamstring injuries and to determine, if prompted, whether ChatGPT could appropriately tailor the reading level to that suggested.
    Methods: A preliminary list of 15 questions on hamstring injuries was developed from various FAQ sections on patient education websites from a variety of institutions, from which the 10 most frequently cited questions were selected. Three queries were performed, inputting the questions into ChatGPT-4.0: (1) unprompted, naïve; (2) with an additional prompt specifying that the response be tailored to a seventh-grade reading level; and (3) with an additional prompt specifying that the response be tailored to a college graduate reading level. The responses from the unprompted query were independently evaluated by two of the authors. To assess the quality of the answers, a grading system was applied: (A) correct and sufficient response; (B) correct but insufficient response; (C) response containing both correct and incorrect information; and (D) incorrect response. In addition, the readability of each response was measured using the Flesch-Kincaid Reading Ease Score (FRES) and Grade Level (FKGL) scales.
    Results: Ten responses were evaluated. Inter-rater reliability was 0.6 for grading. Of the initial query, 2 of 10 responses received a grade of A, 7 were graded as B, and 1 was graded as C. The average cumulative FRES and FKGL scores of the initial query were 61.64 and 10.28, respectively. The average cumulative FRES and FKGL scores of the secondary query were 75.2 and 6.1, respectively. Finally, the average FRES and FKGL scores of the third query were 12.08 and 17.23, respectively.
    Conclusions: ChatGPT showed generally satisfactory accuracy in responding to questions regarding hamstring injuries, although certain responses lacked completeness or specificity. On initial, unprompted queries, the readability of responses aligned with a tenth-grade level. However, when explicitly prompted, ChatGPT reliably adjusted the complexity of its responses to both a seventh-grade and a graduate-level reading standard. These findings suggest that although ChatGPT may not consistently deliver fully comprehensive medical information, it possesses the capacity to adapt its output to meet specific readability targets.
    Clinical Relevance: Artificial intelligence models like ChatGPT have the potential to serve as a supplemental educational tool for patients with orthopaedic conditions and to aid medical decision-making. It is important that we continually review the quality of the medical information generated by these artificial intelligence models as they evolve and improve.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101200
  12. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101210
       Purpose: To evaluate the current literature regarding the accuracy and efficacy of ChatGPT in delivering patient education on common orthopaedic sports medicine operations.
    Methods: A systematic review was performed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines. After PROSPERO registration, a keyword search was conducted in the PubMed, Cochrane Central Register of Controlled Trials, and Scopus databases in September 2024. Articles were included if they evaluated ChatGPT's performance against established sources, examined ChatGPT's ability to provide counseling related to orthopaedic sports medicine operations, and assessed ChatGPT's quality of responses. Primary outcomes assessed were quality of written content (e.g., DISCERN score), readability (e.g., Flesch-Kincaid Grade Level and Flesch-Kincaid Reading Ease Score), and reliability (Journal of the American Medical Association Benchmark Criteria).
    Results: Seventeen articles satisfied the inclusion and exclusion criteria and formed the basis of this review. Four studies compared the effectiveness of ChatGPT and Google, and another study compared ChatGPT-3.5 with ChatGPT-4. ChatGPT provided moderate- to high-quality responses (mean DISCERN score, 41.0-62.1), with strong inter-rater reliability (0.72-0.91). Readability analyses showed that responses were written at a high school to college reading level (mean Flesch-Kincaid Grade Level, 10.3-16.0) and were generally difficult to read (mean Flesch-Kincaid Reading Ease Score, 28.1-48.0). ChatGPT frequently lacked source citations, resulting in a poor reliability score across all studies (mean Journal of the American Medical Association score, 0). Compared with Google, ChatGPT-4 generally provided higher-quality responses. ChatGPT also displayed limited source transparency unless specifically prompted for sources. ChatGPT-4 outperformed ChatGPT-3.5 in response quality (DISCERN score, 3.86 [95% confidence interval, 3.79-3.93] vs 3.46 [95% confidence interval, 3.40-3.54]; P = .01) and readability.
    Conclusions: ChatGPT provides generally satisfactory responses to patient questions regarding orthopaedic sports medicine operations. However, its utility remains limited by challenges with source attribution, high reading complexity, and variability in accuracy.
    Level of Evidence: Level V, systematic review of Level V studies.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101210
  13. Eur Heart J Imaging Methods Pract. 2025 Jul;3(2): qyaf111
       Aims: We assessed the readability level of online patient education materials (PEMs) for cardiac MRI (CMRI) to determine whether they meet the standard health literacy needs as determined by the US National Institutes of Health and the American Medical Association guidelines.
    Methods and results: We evaluated the readability of CMRI PEMs from 5 websites using the Flesch-Kincaid Reading Ease (FKRE), Flesch-Kincaid grade level (FKGL), Gunning-Fog Index (GFI), Simple Measure of Gobbledygook index (SMOGI), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). PEMs on the British Heart Foundation (BHF) website yielded the highest mean FKRE score, while the RadiologyInfo.org (RadInfo) website yielded the highest mean score on the CLI compared to all the other websites. Statistical analysis of individual predictors revealed that average words per sentence (P < 0.001) and average syllables per word (P < 0.001) were strong determinants of FKRE for the RadInfo PEMs. In contrast, sentences (P = 0.044), words (P = 0.046), average words per sentence (P < 0.001), and average syllables per word (P < 0.001) were significant predictors of FKRE for the InsideRadiology (InsRad) PEMs. The sensitivity analysis consistently confirmed the robustness and primary influence of average words per sentence and average syllables per word.
    Conclusion: The BHF and American Heart Association emphasize accessible CMRI communication, whereas RadInfo, InsRad, and the European Society of Cardiology PEMs may be less suitable for low-health-literacy audiences. Strategies aimed at enhancing the comprehensibility of patient education materials should primarily focus on reducing the average complexity of words and shortening average sentence lengths.
    Keywords:  Flesch–Kincaid Reading ease; cardiac MRI; magnetic resonance imaging; patient education materials; readability
    DOI:  https://doi.org/10.1093/ehjimp/qyaf111
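    Entry 13 above identifies average words per sentence and average syllables per word as the dominant predictors of Flesch-Kincaid Reading Ease. A small least-squares sketch of that kind of predictor analysis is shown below, on hypothetical per-document values generated to roughly follow the FKRE formula; the fitted coefficients should therefore land close to the formula's own weights of about -1.015 and -84.6.

```python
import numpy as np

# Hypothetical per-document readability predictors and FKRE scores -- illustrative only.
words_per_sentence = np.array([12.0, 18.5, 22.1, 15.3, 25.7, 19.9])
syllables_per_word = np.array([1.45, 1.62, 1.71, 1.50, 1.80, 1.66])
fkre               = np.array([72.0, 51.0, 39.7, 64.4, 28.5, 46.2])

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(fkre), words_per_sentence, syllables_per_word])
(intercept, b_wps, b_spw), *_ = np.linalg.lstsq(X, fkre, rcond=None)

print(f"FKRE ~ {intercept:.1f} + ({b_wps:.2f} * words/sentence) + ({b_spw:.1f} * syllables/word)")
```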
  14. Int J Impot Res. 2025 Sep 24.
    This study aimed to evaluate the reliability, readability, and understandability of chatbot responses to frequently asked questions about premature ejaculation, and to assess the contributions, potential risks, and limitations of artificial intelligence. Fifteen questions were selected using data from Google Trends and posed to the chatbots Copilot, Gemini, ChatGPT-4o, ChatGPT-4o Plus, and DeepSeek-R1. Reliability was evaluated using the Global Quality Scale (GQS) by two experts; readability was assessed with the Flesch-Kincaid Reading Ease (FKRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG); and understandability was evaluated using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). Additionally, the consistency of source citations was examined. The GQS scores were as follows: Copilot: 3.96 ± 0.66, Gemini: 3.66 ± 0.78, ChatGPT-4o: 4.83 ± 0.23, ChatGPT-4o Plus: 4.83 ± 0.29, DeepSeek-R1: 4.86 ± 0.22 (p < 0.001). The PEMAT-P scores were as follows: Copilot: 0.70 ± 0.05, Gemini: 0.72 ± 0.04, ChatGPT-4o: 0.83 ± 0.03, ChatGPT-4o Plus: 0.77 ± 0.06, DeepSeek-R1: 0.79 ± 0.06 (p < 0.001). While ChatGPT-4o Plus and DeepSeek-R1 scored higher for reliability and understandability, all chatbots performed at an acceptable level (≥70%). However, readability scores were above the recommended level for the target audience. Instances of low reliability or unverified sources were noted, with no significant differences between the chatbots. Chatbots provide highly reliable and informative responses regarding premature ejaculation; however, there are significant limitations that require improvement, particularly concerning readability and the reliability of sources.
    DOI:  https://doi.org/10.1038/s41443-025-01179-3
  15. Obes Surg. 2025 Sep 27.
       BACKGROUND: Artificial intelligence (AI) models such as ChatGPT and DeepSeek have gained increasing attention for their potential to enhance patient education by delivering accessible and evidence-based health information. We designed the following study to evaluate the AI models-ChatGPT and DeepSeek-in generating patient education materials for bariatric surgery.
    METHODS: Thirty commonly asked patient questions related to bariatric surgery were classified into four thematic domains: (1) surgical planning and technical considerations, (2) preoperative assessment and optimization, (3) postoperative care and complication management, and (4) long-term follow-up and disease management. Responses generated by ChatGPT and DeepSeek were evaluated using three key metrics: (1) response quality, assessed by the Global Quality Score, rated on a 5-point scale from 1 (poor) to 5 (excellent); (2) reliability, measured using modified DISCERN criteria, which assess adherence to clinical guidelines and evidence-based standards, with scores ranging from 5 (low) to 25 (high); and (3) readability, evaluated using two validated formulas: the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.
    RESULTS: ChatGPT significantly outperformed DeepSeek in response quality, with a median (IQR) Global Quality Score of 5.00 (4.00, 5.00) vs. 4.00 (4.00, 5.00) (P = 0.002). Higher reliability was also observed in ChatGPT, as reflected by mDISCERN scores across all four domains (median [IQR], 22.0 [21.0, 23.25] vs. 19.7 [19.0, 20.75]; P < 0.001). While no significant difference was found in the Flesch Reading Ease Score (mean [SD], 26.11 [12.84] vs. 20.87 [12.20]; P = 0.110), ChatGPT yielded significantly higher Flesch-Kincaid Grade Level Scores (meaning its text was more complex) (mean [SD], 16.40 [2.43] vs. 13.48 [2.35]; P < 0.001). Both models produced responses at a readability level corresponding to college education.
    CONCLUSIONS: ChatGPT provided higher-quality and more reliable responses, while DeepSeek's answers were slightly easier to read. However, both models' answers lacked attention to psychosocial and cultural aspects of patient care, highlighting the need for more empathetic, adaptive AI to support inclusive patient education.
    Keywords:  AI; Bariatric surgery; ChatGPT; DeepSeek; Patient education
    DOI:  https://doi.org/10.1007/s11695-025-08249-x
  16. JMIR Infodemiology. 2025 Sep 26. 5 e76474
       Background: Dengue fever has evolved into a significant public health concern. In recent years, short-video platforms such as Douyin have emerged as prominent media for the dissemination of health education content. Nevertheless, there is a paucity of research investigating the quality of health education content on Douyin.
    Objective: This study aimed to evaluate the quality of dengue videos on Douyin.
    Methods: A comprehensive collection of short videos pertaining to dengue fever was retrieved from the popular social media platform, Douyin, at a designated point in time. A systematic analysis was then performed to extract the characteristics of these videos. To ensure a comprehensive evaluation, three distinct scoring tools were used: the DISCERN scoring tool, the JAMA benchmarking criteria, and the GQS method. Subsequently, an in-depth investigation was undertaken into the relationship between video features and quality.
    Results: A total of 156 videos were included in the analysis, 81 of which (51.9%) were posted by physicians, the most active category of contributor. The selected videos pertaining to dengue fever received a total of 718,228 likes and 126,400 comments. The video sources were categorized into four distinct classifications: news agencies, organizations, physicians, and individuals. Individuals obtained the highest number of video likes, comments, and saves. However, videos posted by physicians, organizations, and news agencies were of higher quality than those posted by individuals. The integrity of the video content was analyzed, and a high percentage of videos received a score of zero points for outcomes, management, and assessment: 69 (45%), 57 (37%), and 41 (26%), respectively. The median total DISCERN, JAMA, and GQS scores of the 156 dengue-related videos were 26 (out of 80 points), 2 (out of 4 points), and 3 (out of 5 points), respectively. Spearman correlation analysis revealed a positive correlation between video duration and video quality. Conversely, negative correlations were observed between video comments and video quality, and between the number of days since posting and video quality.
    Conclusions: This study demonstrates that the quality of short dengue-related health information videos on Douyin is substandard. Videos uploaded by medical professionals were among the highest in quality, yet they were not as popular. It is recommended that, in future, physicians employ more accessible language and incorporate visual elements to enhance the appeal and dissemination of their videos. Future research could explore how to balance professionalism and entertainment to promote user acceptance of high-quality content. Moreover, platforms may consider employing algorithmic optimization or content recommendation mechanisms to encourage users to access and engage with more high-quality health science videos.
    Keywords:  DISCERN; Global Quality Score; Journal of American Medical Association; dengue fever; douyin; video quality
    DOI:  https://doi.org/10.2196/76474
  17. Mult Scler Relat Disord. 2025 Sep 18. pii: S2211-0348(25)00507-3. [Epub ahead of print]104 106765
       BACKGROUND: Exercise is a widely recommended non-pharmacological approach to improve physical function and quality of life in individuals with multiple sclerosis (MS). With the growing use of digital platforms, YouTube has become a popular resource for health-related information and exercise guidance. This study aimed to evaluate the content quality and reliability of YouTube videos offering exercise guidance for individuals with MS.
    METHODS: A YouTube search was conducted using relevant keywords, and videos were screened based on inclusion criteria. Content quality was assessed using the mDISCERN scale, Global Quality Score (GQS), and Journal of the American Medical Association (JAMA) benchmarks. Video characteristics, including duration, number of views, and uploader type, were also recorded. A total of 95 videos met the criteria.
    RESULTS: Among included videos, 18 % were rated low quality, 37 % moderate, and 45 % high quality. The most common upload sources were commercial accounts (n = 28), non-commercial organizations or community groups (n = 19), and physiotherapists (n = 16). Significant differences were found in mDISCERN and JAMA scores between low-, moderate-, and high-quality groups (p < 0.001 and p = 0.008, respectively). Strengthening exercises (n = 21), informational videos (n = 16), and other exercises (n = 15) were the most frequently presented.
    CONCLUSION: In conclusion, while YouTube serves as an accessible platform for MS-related exercise content, the variability in video quality and presence of non-expert uploads may mislead viewers with low e-health literacy. High-quality, evidence-based content provided by professionals is essential to support safe and effective exercise in individuals with MS.
    Keywords:  Exercise; Multiple sclerosis; Video content; Video quality; YouTube videos
    DOI:  https://doi.org/10.1016/j.msard.2025.106765
  18. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101170
       Purpose: To evaluate the quality and comprehensiveness of videos regarding acromioclavicular dislocation posted on the YouTube platform and to evaluate potential reinforcement of misinformation that may hinder proper management of these injuries.
    Methods: A YouTube search was performed in November 2024 using key words "acromioclavicular joint dislocation." Videos were ranked on relevance and the first 50 videos that met inclusion criteria were analyzed by 2 reviewers. Video source, content type, time since upload, video duration, number of views, likes, subscribers and comments were recorded. Video educational quality was measured using the modified DISCERN, Journal of the American Medical Association (JAMA) score, Global Quality Score and Shoulder-Specific Score (SSS). Quality scores from different sources and content categories were compared using the Kruskal-Wallis test. Strength of relationship between variables was assessed using Spearman's rank correlation coefficient.
    Results: In total, 209,005 videos were identified, of which the first 50 were analyzed. Mean mDISCERN, JAMA, GQS, and SSS scores were 2.19, 2.13, 2.48, and 6.26, respectively. The most common uploader source was physicians (28%), and the most common content category was surgical management (32%). Videos uploaded by an academic source had significantly higher mDISCERN, JAMA, and SSS scores (P < .05). Other uploader sources did not show significant differences among each other. Quantitative video characteristics showed no significant correlation with quality scores, except for video duration. Finally, only 2 of 50 videos mentioned nonoperative treatment options for high-grade AC joint dislocations, and only 3 of 50 videos referred to the lack of scientific evidence for operative treatment in these high-grade injuries.
    Conclusions: Current YouTube video content about AC dislocations has low overall quality, despite being mostly uploaded by physicians. It does not provide sufficient information and potentially reinforces misinformation that may downplay potential benefits of nonsurgical interventions.
    Clinical Relevance: Given the widespread use of YouTube by patients seeking medical information, evaluating the quality of this content is essential for surgeons to better understand and address the information their patients may encounter.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101170
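    Entry 18 above compares quality scores across uploader sources with the Kruskal-Wallis test and relates video characteristics to quality with Spearman's rank correlation. A minimal SciPy sketch of both tests, on hypothetical scores rather than the study's data, is given below.

```python
from scipy.stats import kruskal, spearmanr

# Hypothetical mDISCERN scores grouped by uploader source -- illustrative only.
academic   = [4, 3, 4, 5, 3]
physician  = [2, 3, 2, 3, 2, 3]
commercial = [1, 2, 2, 1, 2]

h_stat, p_kw = kruskal(academic, physician, commercial)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.3f}")

# Hypothetical relationship between video duration (seconds) and a quality score.
duration = [45, 120, 300, 90, 600, 210, 150]
quality  = [2, 3, 4, 2, 5, 4, 3]
rho, p_sp = spearmanr(duration, quality)
print(f"Spearman rho = {rho:.2f}, p = {p_sp:.3f}")
```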
  19. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101156
       Purpose: To evaluate the accuracy and informational quality of YouTube videos related to osteochondral allograft (OCA) transplantation as a potentially valuable educational resource for patients and health care professionals.
    Methods: A systematic analysis of YouTube videos retrieved through a predefined search strategy using the key words "osteochondral allograft" was performed. Videos were categorized by content sources, such as health care professionals with and without commercial bias, individuals, or personal testimonials. The video's duration, the publication date, and number of likes and views were recorded. To evaluate the accuracy, reliability and quality of video content, each video was assessed using the Journal of the American Medical Association (JAMA) benchmark criteria, Global Quality Score (GQS), DISCERN, and a newly developed Osteochondral Allograft Quality (OCA-QAL) score, designed specifically for this procedure.
    Results: In total, 80 YouTube videos were included. Overall, the quality of OCA-related YouTube videos was low, with mean scores of 2.16 (JAMA), 2.28 (GQS), 32.58 (DISCERN), and 5.71 (OCA-QAL). Only one video was rated as "excellent" on the OCA-QAL, and none achieved full points on the JAMA or GQS. Video categories included educational content from health care professionals with (27.5%) or without (51.3%) commercial bias, content from non-health care individuals (13.8%), and testimonials (7.5%). Strong positive correlations emerged between OCA-QAL, GQS, and DISCERN scores, whereas views and likes did not predict quality.
    Conclusions: YouTube videos on OCA transplantation generally do not meet the quality standards like peer-reviewed validation necessary for reliable patient education. Given the low quality of available content, health care providers should be cautious in recommending YouTube as a resource for OCA transplantation information and should guide patients to more rigorously reviewed resources.
    Clinical Relevance: As cartilage procedures like OCA transplantation become more common, surgeons and patients lack reliable online resources. This study underscores the need for improved digital health content to ensure accurate and trustworthy patient education.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101156
  20. Digit Health. 2025 Jan-Dec;11: 20552076251380650
       Objective: This study aimed to evaluate the quality, reliability, and popularity of anemia-related videos on the YouTube social media platform.
    Methods: A total of 50 English-language videos were selected by searching the keyword "anemia" on YouTube in March 2024 using an incognito mode on a desktop device in Turkey. The "relevance" filter was used as it reflects default user behavior and prioritizes algorithmically ranked results. Duplicate, non-English, music-only, and promotional videos were excluded. Videos were evaluated using DISCERN, global quality score (GQS), JAMA benchmark criteria, and video power index (VPI), calculated as: VPI = like ratio (%) × view rate (/day) / 100.
    Results: Of the 50 videos analyzed, 2% were uploaded by patients, 10% by non-physician healthcare professionals, 14% by media organizations, 24% by independent physicians, and 50% by medical institutions. The mean VPI was 246.76 ± 356.29; GQS, 3.53 ± 0.81; JAMA, 2.30 ± 0.97; and DISCERN, 48.24 ± 11.94. Statistically significant correlations were found between the number of likes and GQS scores and between the like ratio and GQS scores (p = 0.027 and p = 0.012, respectively). VPI showed a weak but significant correlation with GQS (p = 0.047, r = 0.28). Video duration showed a moderate correlation with DISCERN scores (r = 0.394, p = 0.005) and a weak correlation with JAMA scores (r = 0.338, p = 0.016).
    Conclusion: The quality and reliability of anemia-related videos on YouTube are generally moderate and variable. Videos uploaded by healthcare professionals were significantly more reliable and of higher quality. These findings highlight the need for health professionals to produce accurate and engaging content and for users to be guided toward evidence-based sources.
    Keywords:  Anemia; DISCERN; JAMA; YouTube; health misinformation; video quality
    DOI:  https://doi.org/10.1177/20552076251380650
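    Entry 20 above defines the video power index as VPI = like ratio (%) × view rate (views/day) / 100. The sketch below implements that arithmetic; the like-ratio denominator (likes + dislikes) and the example figures are assumptions for illustration, since the abstract does not spell them out.

```python
from datetime import date

def video_power_index(likes: int, dislikes: int, views: int,
                      uploaded: date, observed: date) -> float:
    """VPI = like ratio (%) x view rate (views/day) / 100, following the entry above."""
    like_ratio = 100 * likes / (likes + dislikes)            # assumed denominator: likes + dislikes
    view_rate = views / max(1, (observed - uploaded).days)   # views per day since upload
    return like_ratio * view_rate / 100

# Illustrative figures only.
print(round(video_power_index(likes=1_800, dislikes=200, views=250_000,
                              uploaded=date(2023, 3, 1), observed=date(2024, 3, 1)), 1))
```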
  21. PEC Innov. 2025 Dec;7 100428
       Objective: YouTube short videos constitute a key informational resource for individuals at high risk of sexually transmitted infections (STIs). We conducted a quality assessment of short videos about syphilis intended for the general public in Japan.
    Methods: In October 2024, a comprehensive sample of YouTube short videos on syphilis was retrieved using keywords frequently used to search for information on the disease. mDISCERN criteria were employed to assess the reliability of the information in the 72 videos selected for analysis. Reflexive thematic analysis was used to qualitatively examine misinformation embedded within the videos.
    Results: The mean mDISCERN score of the videos was 2.0 (SD 0.9) and 80 % of the videos did not meet the reliability criteria. One in five videos potentially hindered audience engagement in health behaviors by inducing fear of the disease, reinforcing stigma and insensitivity toward high-risk individuals, and punitively portraying those infected. Some of these messages originated from healthcare professionals, indicating their potential role in reinforcing such biases.
    Conclusion: YouTube short videos can support syphilis awareness and prevention, but problems with information reliability and quality are common. Stigmatizing content may hinder health-seeking behaviors. Enhancing the quality and sensitivity of messages, particularly those from healthcare professionals, is essential to maximize their public health impact.
    Innovation: This study is among the first to analyze YouTube short videos about STIs, combining qualitative and quantitative methods to assess misinformation.
    Keywords:  Health communication; Multimedia; Patient education; Sexually transmitted infections; Syphilis
    DOI:  https://doi.org/10.1016/j.pecinn.2025.100428
  22. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101192
       Purpose: To assess the quality of YouTube videos regarding partial meniscectomy.
    Methods: The first 50 videos returned by the keyword search "partial meniscectomy" after screening for inclusion and exclusion criteria were included in the study. Off-topic videos, non-English language videos, duplicated videos, YouTube Shorts, and videos with poor audio quality were excluded. The primary outcomes were the DISCERN instrument (range, 15-75), Journal of American Medical Association (JAMA) benchmark criteria (range, 0-4), and Global Quality Scale (GQS) (range, 0-5). In addition, date of publication, video duration, number of likes, number of comments, and number of views were recorded. Videos were also categorized by source type (physicians, companies, or patients), subject (surgical technique, patient experience, or overview), and content (educational or subjective patient experience).
    Results: Of the 50 videos, 24 (46.0%) were published by physicians; 20 (40.0%), by companies; and 6 (14.0%), by patients. The most prevalent type of information was an overview (44.0%); 86% of the videos were educational in nature, whereas the remaining 14% described subjective patient experiences. The mean video length was 5.07 ± 0.21 minutes. The mean number of views was 1,624,827.44 ± 8,334.86; the mean number of comments, 191.62 ± 34.11; and the mean number of likes, 25,984.84 ± 1,051.76. The mean DISCERN, JAMA, and GQS scores were 45.005 ± 1.75 (95% confidence interval [CI], 44.74-45.49; range, 15-75), 1.83 ± 0.52 (95% CI, 1.68-1.97; range, 0-4), and 2.97 ± 0.52 (95% CI, 2.83-3.11; range, 1-5) respectively. For the JAMA score and GQS score, videos published by physicians had greater quality (both P = .01). Finally, overview videos were of the highest quality regarding all scores (P < .01 to P = .03), whereas educational content had higher quality than patient experience content (P < .01).
    Conclusions: The overall quality of YouTube videos concerning partial meniscectomy remains poor to suboptimal. Currently, YouTube is not an appropriate resource for orthopaedic patients seeking information about partial meniscectomy.
    Clinical Relevance: YouTube is not an appropriate resource for orthopaedic patients seeking information about partial meniscectomy.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101192
  23. Curr Dev Nutr. 2025 Sep;9(9): 107525
       Background: YouTube is one of the most widely-used social media platforms and has become a key source of nutritional information for athletes. Both experts and nonexperts use it as an educational tool; however, videos created by nonexperts are more popular among viewers. As social media sources can influence athletes' nutritional knowledge, it is essential that reliable nutritional information reaches them.
    Objectives: This study aimed to identify the key nutritional information and communication methods used in popular sports nutrition videos on YouTube.
    Methods: A systematic search was conducted on YouTube to select videos that met the following criteria: English language, sports nutrition-related content, available audio, free access, 4-20 min in length, and classified as informational or educational. Qualitative content analysis was performed to examine video content, and formal concept analysis was applied to determine the structure of associations among communication methods, sports nutrition themes, and presenter expertise. A total of 114 YouTube videos met the inclusion criteria.
    Results: Four themes emerged regarding sports nutrition messages: the function of nutrition in sports, know-how, dietary strategies, and developing a dietary framework. We identified four themes in the methods used to convey these messages: language features, content delivery methods, appearing connected to the audience, and establishing credibility. The analysis revealed distinct differences in communication approaches between experts and nonexperts. Expert videos often lacked the communication techniques that nonexperts used to build trust and connect with viewers.
    Conclusions: This study highlighted the key sports nutrition information and the characteristics of communication features in sports nutrition YouTube videos. The differences in communication methods between experts and nonexperts underscore the need for more effective strategies from experts to engage athletes and build trust. Collaboration between experts and nonexperts could help improve the quality and credibility of online content.
    Keywords:  communication features; qualitative content analysis; social media; sport nutrition; sport nutrition education
    DOI:  https://doi.org/10.1016/j.cdnut.2025.107525
  24. Front Digit Health. 2025 ;7 1622503
       Background: Osteoarthritis (OA) is a debilitating condition characterized by pain, stiffness, and impaired mobility, significantly affecting patients' quality of life. Health education is crucial in helping individuals understand OA and its management. In China, where OA is highly prevalent, platforms such as TikTok, WeChat, and XiaoHongshu have become prominent sources of health information. However, there is a lack of research regarding the reliability and educational quality of OA-related content on these platforms.
    Methods: This study analyzed the top 100 OA-related videos across three major platforms: TikTok, WeChat, and XiaoHongshu. We systematically evaluated the content quality, reliability, and educational value using established tools, such as the DISCERN scale, JAMA benchmark criteria, and the Global Quality Score (GQS) system. The study also compared differences in video content across platforms, offering insights into their relevance for addressing professional needs.
    Results: Video quality varied significantly between platforms. TikTok outperformed WeChat and XiaoHongshu in all scoring criteria, with mean DISCERN scores of 32.42 (SD 0.37), 24.57 (SD 0.34), and 30.21 (SD 0.10), respectively (P < 0.001). TikTok also scored higher on the JAMA (1.36, SD 0.07) and GQS (2.46, SD 0.08) scales (P < 0.001). Videos created by healthcare professionals scored higher than those created by non-professionals (P < 0.001). Disease education and symptom self-examination content were more engaging, whereas rehabilitation videos received less attention.
    Conclusions: Short-video platforms have great potential for chronic disease health education, with the caveat that the quality of the videos currently varies, and the authenticity of the video content is yet to be verified. While professional doctors play a crucial role in ensuring the quality and authenticity of video content, viewers should approach it with a critical mindset. Even without medical expertise, viewers should be encouraged to question the information and consult multiple sources.
    Keywords:  health information; osteoarthritis; quality assessment; reliability; short videos
    DOI:  https://doi.org/10.3389/fdgth.2025.1622503
  25. BMC Cancer. 2025 Sep 24. 25(1): 1428
       BACKGROUND: Despite its high fatality rate, pancreatic cancer remains largely overlooked by the public. The rise of short-form video platforms has made them hubs for health-related content, yet the quality and reliability of this information are often in doubt.
    OBJECTIVE: This study is poised to scrutinize the quality and trustworthiness of videos pertaining to pancreatic cancer across these digital landscapes.
    METHODS: We analyzed the content and publishers of such videos on TikTok, Bilibili, and Kwai using the Global Quality Scale (GQS), modified DISCERN (mDISCERN), and Medical Quality Video Evaluation Tool (MQ-VET). We also correlated the findings with video rankings and compared the quality between the Chinese and USA platforms in 2023 and 2024.
    RESULTS: In 2023, 300 videos were analyzed, with median scores indicating medium quality but low reliability; the median GQS, mDISCERN, and MQ-VET scores were 2, 2, and 45, respectively. The short videos created by medical practitioners demonstrated significantly higher median scores compared to those by non-medical practitioners in GQS scores (3 [IQR, 2-4] vs. 2 [IQR, 2-3]; P < 0.001), mDISCERN scores (2 [IQR, 2-3] vs. 2 [IQR, 1-2]; P < 0.001), and MQ-VET scores (46 [IQR, 40-52] vs. 37 [IQR, 30-45.5]; P < 0.001). mDISCERN scores were higher for treatment-related (3 [IQR, 2-3]), prevention-related (3 [IQR, 2.75-3]), and disease-related videos (including anatomical, pathologic, epidemiologic, and basic research related to pancreatic cancer) (2 [IQR, 2-3]) compared to news and reports (2 [IQR, 1-2]) and invalid information content (1 [IQR, 1-1]; P < 0.001). TikTok had significantly higher mDISCERN scores (2 [IQR, 2-3] vs. Bilibili: 2 [IQR, 1.75-3]; P = 0.024) and MQ-VET scores (47 [IQR, 43-53.5] vs. Kwai: 44.5 [IQR, 38.25-49.75]; P = 0.033) for medical professional videos. Video quality showed a weak correlation with rankings. The GQS scores of short videos on Chinese platforms in 2024 decreased compared with 2023 (2 [IQR, 2-3] vs. 3 [IQR, 2-4]; P = 0.009). Additionally, in 2024, both medical and non-medical practitioners' videos on the Chinese TikTok platform exhibited lower quality and reliability compared to their counterparts in the USA.
    CONCLUSION: Pancreatic cancer-related short videos are of medium quality and low reliability, particularly on Chinese platforms. Videos from medical professionals are more trustworthy. There is a need for better curation and algorithms to ensure accurate health information dissemination and to enhance public understanding and management of pancreatic cancer.
    Keywords:  Cross-sectional study; Health education; Pancreatic cancer; Quality evaluation; Short video
    DOI:  https://doi.org/10.1186/s12885-025-14825-2
  26. Arthrosc Sports Med Rehabil. 2025 Aug;7(4): 101195
       Purpose: To evaluate TikTok videos related to knee injuries, examining the accuracy and sources of the content, the category of information provided, the reach of the videos, and the capability of the videos to cause harm.
    Methods: On September 1, 2024, TikTok was queried using layperson's terms for acute knee injuries (e.g., "ACL tear") to identify popular hashtags. The top 10 videos per hashtag and 5 videos per search term (e.g., "knee pop") by view count were included if they related to the specified knee injury, surgery, or recovery process. Videos with fewer than 1,000 views were excluded. Metrics such as number of likes, number of views, number of comments, creator demographic characteristics, and video content type were collected, and videos were evaluated for quality using the DISCERN scoring system.
    Results: A total of 234 TikTok videos related to knee injuries were analyzed, averaging 699,235 views per video (median, 138,500 views). DISCERN analysis revealed that 41% of videos were rated as poor whereas 59% were satisfactory. Videos featuring medical recommendations had significantly higher engagement scores (mean, 5.16; 95% confidence interval [CI], 3.49-6.83; P = .001) and longer durations (mean, 53.38 seconds; 95% CI, 44.47-62.28 seconds; P = .002) than those without recommendations (mean score, 3.18 [95% CI, 2.84-3.52]; mean duration, 38.10 seconds [95% CI, 33.18-43.01 seconds]). Satisfactory videos outperformed poor-quality videos across DISCERN metrics, with clearer aims (mean, 3.91 vs 2.72; P < .001), greater relevance (mean, 3.20 vs 2.09; P < .001), and more balanced information (mean, 1.60 vs 1.00; P < .001). Physicians created 24.4% of the videos, which generally scored higher according to the DISCERN criteria than videos created by non-health care-related professionals.
    Conclusions: Videos created by health care professionals, particularly physicians, scored higher in terms of educational quality but accounted for a small proportion of total content. In contrast, nonphysician creators frequently provided inaccurate or incomplete information. Despite this, videos with medical recommendations achieved higher engagement.
    Clinical Relevance: There is a predominance of nonphysician creators disseminating inaccurate medical information, underscoring the need for orthopaedic surgeons to engage in digital health education to provide reliable content. This study provides insights into the digital content patients may be consuming regarding their medical conditions.
    DOI:  https://doi.org/10.1016/j.asmr.2025.101195
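    [Editor's note] Entry 26 reports group means with 95% confidence intervals (e.g., engagement scores for videos with vs. without medical recommendations). The sketch below shows a t-based 95% CI on hypothetical values; it is not the authors' actual analysis code.

```python
# Mean and t-based 95% confidence interval for a small sample of scores.
import numpy as np
from scipy import stats

engagement = np.array([4.8, 5.3, 6.1, 4.2, 5.9, 5.0])  # hypothetical engagement scores

mean = engagement.mean()
sem = stats.sem(engagement)                      # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(engagement) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```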
  27. Digit Health. 2025 Jan-Dec;11: 20552076251382029
       Objective: Irritable Bowel Syndrome (IBS) is a prevalent functional gastrointestinal disorder that significantly impairs quality of life. Social media has become a primary source of health information, with TikTok and Bilibili emerging as popular video-sharing platforms in China. The present study aims to assess the content, quality, and user engagement of IBS-related videos on Bilibili and TikTok in China, identifying strengths and weaknesses in online health information and offering insights for improving patient education.
    Methods: One hundred qualified videos from each platform were analyzed. Videos were categorized by platform, source, content, and quality (using the Global Quality Score and modified DISCERN scoring systems). Statistical analyses included correlation analysis and linear regression to assess relationships between video quality and engagement metrics.
    Results: The number of IBS-related videos increased from 2020 to 2024. TikTok videos were shorter (median 61.0 s vs. 220.0 s) and had higher engagement. Healthcare professionals (77.0%) and science communicators (15.5%) were the primary content creators. Linear regression revealed significant platform differences: on TikTok, higher-quality videos were negatively associated with likes and comments; on Bilibili, higher quality was positively associated with all engagement metrics (likes, favorites, shares, and comments).
    Conclusions: Platform-specific differences exist in how users engage with IBS-related content on Chinese social media platforms. The divergent relationship between video quality and engagement metrics on TikTok versus Bilibili suggests that content creators may need platform-specific strategies to effectively deliver high-quality health information.
    Keywords:  IBS; functional gastrointestinal disorder; health information; online videos; social media
    DOI:  https://doi.org/10.1177/20552076251382029
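    [Editor's note] Entry 27 relates video quality to engagement using correlation and linear regression, fitted separately per platform. A minimal per-platform sketch on hypothetical (quality, engagement) pairs; variable names and values are assumptions for illustration only.

```python
# Per-platform Spearman correlation and linear regression of engagement on quality.
import numpy as np
from scipy.stats import linregress, spearmanr

data = {
    # hypothetical (GQS quality score, log-transformed likes) pairs
    "TikTok":   (np.array([2, 3, 3, 4, 4, 5]), np.array([9.1, 8.7, 8.9, 8.2, 8.0, 7.8])),
    "Bilibili": (np.array([2, 3, 3, 4, 4, 5]), np.array([5.0, 5.4, 5.6, 6.1, 6.3, 6.8])),
}

for platform, (quality, likes) in data.items():
    rho, p_rho = spearmanr(quality, likes)
    fit = linregress(quality, likes)
    print(f"{platform}: Spearman rho={rho:.2f} (p={p_rho:.3f}), "
          f"slope={fit.slope:.2f} (p={fit.pvalue:.3f})")
```

    A negative slope on one platform and a positive slope on the other would reproduce the divergent pattern the authors describe.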
  28. JMIR Cancer. 2025 Sep 23. 11 e73455
       Background: Radiotherapy (RT) is a crucial modality in cancer treatment. In recent years, the rise of short-form video platforms has transformed how the public accesses medical information. TikTok and Bilibili, as leading short-video platforms, have emerged as significant channels for disseminating health information. However, there is an urgent need to evaluate the quality and reliability of the information related to RT available on these platforms.
    Objective: This study aims to systematically assess the information quality and reliability of RT-related short-form videos on TikTok and Bilibili platforms using the Global Quality Score (GQS) and a modified DISCERN (mDISCERN) evaluation tool, thereby elucidating the current landscape and challenges of digital health communication.
    Methods: This study systematically retrieved the top 100 RT-related videos on TikTok and Bilibili as of February 25, 2025. The quality of the videos was assessed using the GQS (1-5 points) and an mDISCERN scoring system (1-5 points). Statistical analyses were conducted using the Mann-Whitney U test, as well as Spearman and Pearson correlation analyses, to ensure the reliability and validity of the results.
    Results: A total of 200 short-form videos related to RT were analyzed, revealing that the overall quality of videos on TikTok and Bilibili is unsatisfactory. Specifically, the median GQS for TikTok was 4 (IQR 3-4), while for Bilibili it was 3 (IQR 3-4). The median mDISCERN scores for both platforms were 3 (IQR 2-4 and 3-4, respectively), and no significant differences were observed between the 2 platforms in GQS (P=.12) or mDISCERN score (P=.10). On TikTok, 53% (53/100) of videos had a GQS of 4 or higher ("good" quality or better). On Bilibili, 45% (45/100) of videos had an mDISCERN score of 4 or higher, indicating "relatively reliable" quality. Videos produced by professionals, institutions, and nonprofessional institutions had significantly higher mDISCERN scores than those made by patients (P<.001, P<.001, and P<.01, respectively). Furthermore, the correlations of mDISCERN scores with the number of bookmarks and with video duration were 0.172 (P=.02) and 0.192 (P=.007), respectively. However, no video variables effectively predicted the overall quality and reliability of the videos.
    Conclusions: This study revealed that the overall quality of RT-related videos on TikTok and Bilibili is generally low. However, videos uploaded by professionals demonstrate higher information quality and reliability, providing valuable support for patients seeking guidance on health care management and treatment options for cancers. Therefore, improving the quality and reliability of video content, particularly that produced by patients, is crucial for ensuring that the public has access to accurate medical information.
    Keywords:  Bilibili; DISCERN score; Global Quality Score; RT; TikTok; information quality; radiotherapy; short-form videos; social media
    DOI:  https://doi.org/10.2196/73455
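    [Editor's note] Entry 28 reports small but significant correlations between video metadata (bookmarks, duration) and mDISCERN scores, using both Spearman and Pearson analyses. A minimal sketch of those checks on hypothetical metadata values:

```python
# Rank (Spearman) and linear (Pearson) correlations of metadata with mDISCERN.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mdiscern  = np.array([2, 3, 3, 4, 2, 5, 3, 4])            # hypothetical ratings
bookmarks = np.array([12, 40, 35, 90, 8, 150, 30, 60])    # hypothetical counts
duration  = np.array([45, 80, 70, 120, 30, 200, 65, 95])  # seconds, hypothetical

for name, x in [("bookmarks", bookmarks), ("duration", duration)]:
    rho, p_s = spearmanr(x, mdiscern)
    r, p_p = pearsonr(x, mdiscern)
    print(f"{name}: Spearman rho={rho:.3f} (p={p_s:.3f}), Pearson r={r:.3f} (p={p_p:.3f})")
```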
  29. BMC Oral Health. 2025 Sep 26. 25(1): 1434
       BACKGROUND: The labial frenulum plays a crucial role in oral health, and its surgical removal, known as frenectomy, is often necessary for various dental and orthodontic conditions. This study evaluated the quality and educational value of YouTube videos on labial frenectomy, highlighting potential misinformation risks in online health resources.
    METHODS: A cross-sectional study was conducted via the search term "labial frenectomy" on YouTube, and 68 relevant videos were identified on the basis of predefined criteria. Content quality and credibility were evaluated via the total content score (TCS), video information and quality index (VIQI), modified DISCERN, global quality scale (GQS), Journal of the American Medical Association (JAMA), and Health on the Net Code (HONcode). The data were analyzed with the Kolmogorov-Smirnov and Kruskal-Wallis tests, with Bonferroni correction. Numerical variables were examined via Spearman's rho and point-biserial correlations, whereas categorical variables were assessed via chi-square and Fisher's exact tests.
    RESULTS: TCS was positively correlated with GQS (r = 0.392; p < 0.001), VIQI (r = 0.379; p < 0.001), and modified DISCERN (r = 0.396; p < 0.001), indicating that higher-quality videos received better scores. The GQS was strongly correlated with the VIQI (r = 0.747), modified DISCERN (r = 0.711), and HONcode (r = 0.721; p < 0.001). A significant association was observed between TCS and presenter gender (p = 0.03), with higher-quality content more frequently found in videos featuring female presenters.
    CONCLUSIONS: YouTube videos on labial frenectomy generally provide low- to moderate-quality content and often lack reliability and medical accuracy. While they offer basic educational value, they should not replace professional guidance. Healthcare professionals should direct patients toward expert-reviewed, evidence-based sources to ensure comprehensive and accurate medical information.
    Keywords:  E-health; Health education video; Labial frenectomy; Video quality assessment; YouTube
    DOI:  https://doi.org/10.1186/s12903-025-06786-6
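    [Editor's note] Entry 29 uses, among other tests, a point-biserial correlation to relate a binary variable (such as presenter gender) to a numeric quality score such as TCS. A minimal sketch on hypothetical data; the 0/1 coding and the specific variables are assumptions for illustration.

```python
# Point-biserial correlation between a binary attribute and a numeric score.
import numpy as np
from scipy.stats import pointbiserialr

presenter_female = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # 1 = female presenter (hypothetical)
tcs              = np.array([4, 7, 6, 3, 8, 5, 7, 6])  # total content scores (hypothetical)

r, p = pointbiserialr(presenter_female, tcs)
print(f"point-biserial r = {r:.3f}, p = {p:.3f}")
```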
  30. Health Informatics J. 2025 Jul-Sep;31(3): 14604582251381271
       Objective: Accessing reliable medical information online in Germany is often hindered by misinformation and low health literacy. Tala-med, an ad-free search engine, was developed to provide curated, expert-reviewed content with filters for trustworthiness, recency, user-friendliness, and comprehensibility. This study re-engineered the original system to overcome technical limitations while maintaining result consistency.
    Methods: A modular architecture was designed using Elasticsearch, a fastText-based synonym system, and a subZero-powered admin interface. The system was evaluated using 214 unique queries to compare performance and result similarity with the legacy version.
    Results: The new implementation improved query processing speed while preserving result consistency. Synonym handling was enhanced using fastText, and system maintainability increased via a centralized database and modular backend. The administrative interface simplified data updates and configuration tasks.
    Conclusion: The re-engineered tala-med search engine maintains the original system's strengths while enabling greater scalability, flexibility, and future extensibility. The open-source platform offers a foundation for advancing domain-specific search systems and supports applications beyond the medical field.
    Keywords:  databases; health literacy; medical informatics applications; natural language processing; search engine
    DOI:  https://doi.org/10.1177/14604582251381271
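    [Editor's note] Entry 30 describes a retrieval stack built on Elasticsearch with a fastText-based synonym system. The sketch below shows how such a pipeline could look in principle; the index name "tala_med", the field name "content", and the model file are assumptions, not the project's actual implementation.

```python
# Hypothetical synonym-expanded search: fastText neighbours feed an Elasticsearch query.
import fasttext
from elasticsearch import Elasticsearch

model = fasttext.load_model("cc.de.300.bin")      # pretrained German fastText vectors (assumed)
es = Elasticsearch("http://localhost:9200")       # local Elasticsearch instance (assumed)

def expand_query(term: str, k: int = 3) -> list[str]:
    """Return the query term plus its k nearest fastText neighbours."""
    neighbours = [word for _, word in model.get_nearest_neighbors(term, k=k)]
    return [term] + neighbours

def search(term: str):
    terms = expand_query(term)
    # match the original term or any synonym candidate in the document text
    return es.search(
        index="tala_med",
        query={"bool": {"should": [{"match": {"content": t}} for t in terms]}},
    )

print(search("Kopfschmerzen")["hits"]["total"])
```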
  31. J Pediatr Ophthalmol Strabismus. 2025 Sep 26. 1-7
       PURPOSE: To evaluate the quality and accuracy of artificial intelligence (AI)-generated images depicting pediatric ophthalmology pathologies compared to human-illustrated images, and assess the readability, quality, and accuracy of accompanying AI-generated textual information.
    METHODS: This cross-sectional comparative study analyzed outputs from DALL·E 3 (OpenAI) and Gemini Advanced (Google). Nine pediatric ophthalmology pathologies were sourced from the American Association for Pediatric Ophthalmology and Strabismus (AAPOS) "Most Common Searches." Two prompts were used: Prompt A asked large language models (LLMs), "What is [insert pathology]?" Prompt B requested text-to-image generators (TTIs) to create images of the pathologies. Textual responses were evaluated for quality using published criteria (helpfulness, truthfulness, harmlessness; score 1 to 15, ≥ 12: high quality) and readability using Simple Measure of Gobbledygook (SMOG) and Flesch-Kincaid Grade Level (≤ 6th-grade level: readable). Images were assessed for anatomical accuracy, pathological accuracy, artifacts, and color (score 1 to 15, ≥ 12: high quality). Human-illustrated images served as controls.
    RESULTS: DALL·E 3 images were of poor quality (median: 7; range: 3 to 15) and significantly worse than human-illustrated controls (median: 15; range: 9 to 15; P < .001). Pathological accuracy was also poor (median: 1). Textual information from ChatGPT-4o and Gemini Advanced was high quality (median: 15) but difficult to read (ChatGPT-4o: SMOG: 8.2, FKGL: 8.9; Gemini Advanced: SMOG: 8.5, FKGL: 9.3).
    CONCLUSIONS: Text-to-image generators are poor at generating images of common pediatric ophthalmology pathologies. The accompanying large language models can serve as adequate supplemental tools for generating high-quality, accurate textual information, but care must be taken to tailor the generated text so that it is readable by users.
    DOI:  https://doi.org/10.3928/01913913-20250724-03
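    [Editor's note] Entry 31 grades AI-generated text with SMOG and Flesch-Kincaid Grade Level (FKGL) and treats anything above a 6th-grade level as hard to read. A minimal sketch using the textstat package on an invented sample; note that SMOG is normed on 30-sentence samples, so scores on short texts are only indicative.

```python
# Readability screening with Flesch-Kincaid Grade Level and SMOG via textstat.
import textstat

text = ("Amblyopia means reduced vision in one eye. It develops when the brain "
        "favors the other eye during early childhood. Doctors often treat it with "
        "patching or atropine drops.")

# FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
fkgl = textstat.flesch_kincaid_grade(text)
smog = textstat.smog_index(text)

readable = fkgl <= 6 and smog <= 6   # 6th-grade readability target used in the study
print(f"FKGL = {fkgl:.1f}, SMOG = {smog:.1f}, readable at <= 6th grade: {readable}")
```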