bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-08-17
twenty-one papers selected by
Thomas Krichel, Open Library Society



  1. Malays Fam Physician. 2025; 20: 47
      
    Keywords:  Digital media; Digital technology; Medical library
    DOI:  https://doi.org/10.51866/mol.947
  2. J Clin Med. 2025 Jul 29. pii: 5348. [Epub ahead of print] 14(15)
      Background/Objectives: Patients with an ileoanal pouch change their diet to manage their symptoms and will often resort to the internet for nutrition advice. Currently, no evidence-based dietary guidelines exist to inform online resources. Hence, this study aims to assess the quality of online nutrition information directed towards patients with an ileoanal pouch. Methods: A systematic Google search was conducted to identify consumer websites containing information on nutrition for those with ileoanal pouches. Quality was assessed using the DISCERN instrument, and the readability of written content was assessed using the Flesch-Kincaid score. A summative content analysis was used to identify the frequency of particular topics. Websites were also assessed against standards from the National Institute for Health and Care Excellence (NICE) framework for shared decision-making support tools. Results: A total of 12 websites met the inclusion criteria. The mean total DISCERN score across all websites was 33 out of 75, indicating that overall, the websites were of poor quality. The mean Flesch-Kincaid score was 57 out of 100, or "fairly difficult" in terms of readability. The main themes according to the content analysis were "general dietary advice for pouch", "dietary strategies for symptom management", "addressing risks associated with having a pouch", and "optimisation of nutritional intake". Overall, websites did not meet the standards for shared decision-making. Conclusions: Online nutrition information for patients with an ileoanal pouch is of poor quality and difficult to understand. There is a need for higher-quality online resources for these patients, ideally co-produced with a multidisciplinary team and patients, to provide patients with good-quality, understandable, and accessible nutrition information.
    Keywords:  diet; ileoanal pouch; nutrition; online
    DOI:  https://doi.org/10.3390/jcm14155348
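
    The readability figure above comes from the Flesch family of formulas, which score text from average sentence length and average syllables per word; a score in the 50-60 band is conventionally labelled "fairly difficult", and a DISCERN total out of 75 is the sum of 15 items each rated from 1 to 5. A minimal sketch of the standard formulas, using illustrative counts rather than figures from the study:

        def flesch_reading_ease(words, sentences, syllables):
            # Standard Flesch Reading Ease: higher is easier, roughly 0-100.
            return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

        def flesch_kincaid_grade(words, sentences, syllables):
            # Flesch-Kincaid Grade Level: approximate US school grade required.
            return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

        # Illustrative counts only (not taken from the reviewed websites).
        print(flesch_reading_ease(1200, 80, 1900))   # ~57.7, "fairly difficult"
        print(flesch_kincaid_grade(1200, 80, 1900))  # ~8.9, roughly ninth-grade level
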
  3. Transl Androl Urol. 2025 Jul 30. 14(7): 1959-1977
       Background: The increasing incidence and prevalence of urinary tract infections (UTI) align with the internet's rise to mainstream usage. In the current digital era, patients increasingly engage with online content relating to their health and well-being, including urological conditions. These shifts prompted this study to assess the quality and readability of the UTI articles and YouTube videos with the highest online engagement in both English and Spanish.
    Methods: Google Trends was queried to analyze the popularity trends of the terms "UTI" and "infeccion urinaria" from November 2018 to November 2023. The DISCERN tool was utilized to assess the quality of the most popular UTI articles and videos in English and Spanish, determined by the content analyzer platform of BuzzSumo. The same materials had their readability levels assessed by two validated readability metrics in each language.
    Results: Google Trends showed that "UTI" and "infeccion urinaria" were top search terms, with average search volume index values of 79 and 81, respectively. On the DISCERN quality ratings, a small number of articles and videos in both languages were considered high-quality content (with minimal shortcomings). Across all four readability metrics, most online UTI materials were written above the recommended 6th-grade comprehension level for patients.
    Conclusions: Despite high demand, online UTI treatment information in English and Spanish falls short in both quality and readability. As such, patients could greatly benefit from insights provided by their healthcare providers concerning reputable and superior online sources of information.
    Keywords:  Urinary tract infection (UTI); infeccion urinaria; internet; readability
    DOI:  https://doi.org/10.21037/tau-2025-221
  4. J Eval Clin Pract. 2025 Aug;31(5): e70238
       AIM: This study aimed to evaluate the accuracy, readability, and safety of ChatGPT-4.0's responses to frequently asked questions (FAQs) related to orthopaedic trauma and to examine whether readability is associated with the quality and reliability of content.
    METHODS: Ten common patient questions related to orthopaedic emergencies were submitted to ChatGPT-4.0. Each response was assessed independently by three orthopaedic trauma surgeons using a 4-point ordinal scale for accuracy, clinical appropriateness, and safety. Readability was calculated using the Flesch-Kincaid Grade Level (FKGL). Inter-rater agreement was analysed using intraclass correlation coefficients (ICC). The presence of disclaimers was also recorded.
    RESULTS: ChatGPT-4.0's responses had a mean FKGL score of 10.5, indicating high school-level readability. Stratified analysis showed comparable readability scores across response quality categories: excellent (10.0), poor (9.8), and dangerous (10.1), suggesting that readability does not predict content reliability. Accuracy and safety scores varied considerably among responses, with the highest inter-rater agreement in clinical appropriateness (ICC = 0.81) and the lowest in safety assessments (ICC = 0.68). Notably, nine out of 10 responses included a disclaimer indicating the nonprofessional nature of the content, with one omission observed in a high-risk clinical scenario.
    CONCLUSION: Although ChatGPT-4.0 provides generally readable responses to orthopaedic trauma questions, readability does not reliably distinguish between accurate and potentially harmful information. These findings highlight the need for expert review when using AI-generated content in clinical communication.
    Keywords:  artificial intelligence; health education; natural language processing; orthopaedic procedures; readability
    DOI:  https://doi.org/10.1111/jep.70238
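
    The intraclass correlation coefficients above summarise how consistently the three surgeons scored the same responses. A minimal sketch of how such an ICC can be computed from a long-format table of ratings, assuming the pingouin package and hypothetical data (not the study's ratings):

        import pandas as pd
        import pingouin as pg

        # Hypothetical ratings: three raters score the same ten responses on a 4-point scale.
        ratings = pd.DataFrame({
            "response": list(range(10)) * 3,
            "rater":    ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
            "score":    [4, 3, 4, 2, 4, 3, 4, 1, 3, 4,
                         4, 3, 3, 2, 4, 3, 4, 2, 3, 4,
                         4, 4, 4, 2, 3, 3, 4, 1, 3, 3],
        })

        # Returns the ICC1/ICC2/ICC3 variants (and averaged-rater versions) with confidence intervals.
        icc = pg.intraclass_corr(data=ratings, targets="response",
                                 raters="rater", ratings="score")
        print(icc[["Type", "ICC", "CI95%"]])
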
  5. Am J Otolaryngol. 2025 Aug 06. pii: S0196-0709(25)00120-6. [Epub ahead of print] 46(5): 104717
       BACKGROUND: Low health literacy among patients hinders comprehension of care instructions and worsens outcomes, yet most otolaryngology patient materials and chatbot responses to medical inquiries exceed the recommended reading level of sixth- to eighth-grade. Whether chatbots can be pre-programmed to provide accurate, plain-language responses has yet to be studied. This study aims to compare response readability of a GPT model customized for plain language with GPT-4 when answering common otolaryngology patient questions.
    METHODS: A custom GPT was created and given the thirty-three questions from Polat et al. (Int J Pediatr Otorhinolaryngol., 2024), whose GPT-4 answers were reused with permission. Questions were grouped by theme. Readability was calculated with the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) via an online calculator. A board-certified, practicing otolaryngologist assessed content similarity and accuracy. The primary outcome was readability, measured by FKGL (0-18; equivalent to United States grade level) and FRE (0-100; higher scores indicate greater readability).
    RESULTS: The custom GPT reduced FKGL by an average of 4.2 grade levels (95 % confidence interval [CI]: 3.2, 5.1; p < 0.001) and increased FRE by an average of 17.3 points (95 % CI: 12.5, 21.7; p < 0.001). Improvements remained significant in three of four theme subgroups (p < 0.05). Readability was consistent across question types, and variances were equal between models. Expert review confirmed overall accuracy and content similarity.
    CONCLUSION: Preprogramming a custom GPT to generate plain-language instructions yields outputs that meet Centers for Medicare & Medicaid Services readability targets without significantly compromising content quality. Tailored chatbots could enhance patient communication in otolaryngology clinics and other medical settings.
    Keywords:  Artificial intelligence; Plain language; Readability; custom GPT
    DOI:  https://doi.org/10.1016/j.amjoto.2025.104717
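
    Readability comparisons of this kind can be reproduced for any pair of answer sets with an off-the-shelf readability library. A minimal sketch, assuming the textstat and scipy packages and placeholder answers rather than the study's actual prompts or outputs:

        import textstat
        from scipy import stats

        # Placeholder paired answers to the same questions (not the study's outputs).
        standard_answers = [
            "Otitis media is an inflammatory condition of the middle ear, frequently "
            "precipitated by an antecedent upper respiratory tract infection.",
            "Post-tonsillectomy hemorrhage warrants immediate evaluation in an "
            "emergency department.",
        ]
        plain_language_answers = [
            "An ear infection happens when germs get into the middle ear. It often "
            "starts after a cold.",
            "If there is bleeding after tonsil surgery, go to the emergency room right away.",
        ]

        fkgl_standard = [textstat.flesch_kincaid_grade(a) for a in standard_answers]
        fkgl_plain = [textstat.flesch_kincaid_grade(a) for a in plain_language_answers]

        # Paired comparison of grade levels across the same questions.
        t_stat, p_value = stats.ttest_rel(fkgl_standard, fkgl_plain)
        print(fkgl_standard, fkgl_plain, round(p_value, 3))
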
  6. Cleft Palate Craniofac J. 2025 Aug 12. 10556656251364141
      Objective: It is important for parents and caregivers of children with cleft lip and/or palate to easily understand educational resources provided by the American Cleft Palate-Craniofacial Association (ACPA). This study aimed to assess the readability, understandability, and actionability of ACPA resources, and the utility of large language models (LLMs) in their improvement.
    Design: ACPA resources were collected, assessed with a readability calculator, and rated by 2 readers using the Patient Education Materials Assessment Tool (PEMAT). Each resource was modified by 3 LLMs (GPT-4o, Gemini, and Copilot) and their outputs were reassessed for improvement.
    Main Outcome Measures: Average reading level across the Flesch-Kincaid Grade Level, Gunning Fog Index, SMOG Index, and Coleman-Liau Index; PEMAT understandability and actionability.
    Results: ACPA educational materials are written at an average grade level of 10.7 ± 1.8, with a PEMAT understandability of 82.4 ± 7.0% and actionability of 42.3 ± 17.4%. Modification by LLMs decreased the average reading level to 8.0 ± 1.2 (P = .0002), with a PEMAT understandability of 80.5 ± 7.9% (P = .5371) and actionability of 40.8 ± 18.2% (P = .6709). Overall, 38.5% of ACPA resources included visual aids or illustrations, and 42.3% provided explicit, actionable steps that parents and caregivers could take.
    Conclusions: Although ACPA educational resources are of high quality, they are written at reading levels that nearly meet NIH guidelines but do not meet AMA guidelines. LLMs prove valuable for improving readability without diminishing understandability or actionability.
    Keywords:  artificial intelligence; cleft lip; cleft lip and palate; cleft palate
    DOI:  https://doi.org/10.1177/10556656251364141
  7. Ann Med Surg (Lond). 2025 Aug;87(8): 4835-4840
       Objective: This study aimed to evaluate and compare the performance of three large language models (LLMs)-ChatGPT o1-preview, Claude 3.5 Sonnet, and Gemini 1.5 Pro-in providing information on endoscopic lumbar surgery based on 10 frequently asked patient questions.
    Methods: The 10 most frequently asked patient questions about endoscopic lumbar surgery were selected through discussion among the authors. These questions were then submitted to the three LLMs. Responses were evaluated by five spine surgeons using a 5-point Likert scale for overall quality, text readability, content relevance, and humanistic care. Additionally, five non-medical volunteers assessed the understandability and satisfaction of the responses.
    Results: Across the five evaluators, the intraclass correlation coefficients were 0.522 for ChatGPT o1-preview, 0.686 for Claude 3.5 Sonnet, and 0.512 for Gemini 1.5 Pro. Claude 3.5 Sonnet received the highest scores for overall quality (4.86 ± 0.35, P < 0.001), text readability (4.91 ± 0.32, P < 0.001), and content relevance (4.78 ± 0.42, P < 0.001). ChatGPT o1-preview was the most approved by non-medical background volunteers (49%), followed by Gemini 1.5 Pro (29%) and Claude 3.5 Sonnet (22%).
    Conclusion: From the perspective of professional surgeons, Claude 3.5 Sonnet provided the highest quality and most relevant information. However, ChatGPT o1-preview was more understandable and satisfactory for non-professional users. This study not only highlights the potential of LLMs in patient education but also emphasizes the need for careful consideration of their role in medical practice, including technical limitations and ethical issues.
    DOI:  https://doi.org/10.1097/MS9.0000000000003519
  8. Sci Rep. 2025 Aug 14. 15(1): 29871
      Artificial Intelligence's (AI) role in providing information on Celiac Disease (CD) remains understudied. This study aimed to evaluate the accuracy and reliability of ChatGPT-3.5 in generating responses to 20 basic CD-related queries. This study assessed ChatGPT-3.5, the dominant publicly accessible version during the study period, to establish a benchmark for AI-assisted CD education. The accuracy of ChatGPT's responses to twenty frequently asked questions (FAQs) was assessed by two independent experts using a Likert scale, followed by categorization based on CD management domains. Inter-rater reliability (agreement between experts) was determined through cross-tabulation, Cohen's kappa, and Wilcoxon signed-rank tests. Intra-rater reliability (agreement within the same expert) was evaluated using the Friedman test with post hoc comparisons. ChatGPT demonstrated high accuracy in responding to CD FAQs, with expert ratings predominantly ranging from 4 to 5. While overall performance was strong, responses on management strategies were stronger than those on disease etiology. Inter-rater reliability analysis revealed moderate agreement between the two experts in evaluating ChatGPT's responses (κ = 0.22, p-value = 0.026). Although both experts consistently assigned high scores across different CD management categories, subtle discrepancies emerged in specific instances. Intra-rater reliability analysis indicated high consistency in scoring for one expert (Friedman test, P = 0.113), while the other exhibited some variability (Friedman test, P < 0.001). ChatGPT exhibits potential as a reliable source of information for CD patients, particularly in the domain of disease management.
    Keywords:  Accuracy; Artificial intelligence; Celiac disease; ChatGPT; Reliability
    DOI:  https://doi.org/10.1038/s41598-025-15898-6
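
    The agreement statistics used above (Cohen's kappa and a Wilcoxon signed-rank test between the two experts, a Friedman test within each expert) are all standard library calls. A minimal sketch with hypothetical Likert ratings rather than the study's data:

        import numpy as np
        from scipy import stats
        from sklearn.metrics import cohen_kappa_score

        # Hypothetical 1-5 Likert ratings of the same 20 answers by two experts.
        expert_a = np.array([5, 4, 5, 4, 5, 3, 4, 5, 5, 4, 5, 4, 4, 5, 3, 4, 5, 5, 4, 4])
        expert_b = np.array([5, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 5, 4, 5, 4, 4, 5, 4, 4, 5])

        # Inter-rater agreement beyond chance, and a paired test for systematic differences.
        kappa = cohen_kappa_score(expert_a, expert_b)
        w_stat, w_p = stats.wilcoxon(expert_a, expert_b)

        # Intra-rater consistency: the same expert re-scoring the answers in three rounds.
        round1 = expert_a
        round2 = np.array([5, 4, 5, 4, 4, 3, 4, 5, 5, 4, 5, 4, 4, 5, 3, 4, 5, 5, 4, 4])
        round3 = np.array([5, 4, 4, 4, 5, 3, 4, 5, 5, 4, 5, 4, 5, 5, 3, 4, 5, 5, 4, 4])
        f_stat, f_p = stats.friedmanchisquare(round1, round2, round3)

        print(round(kappa, 2), round(w_p, 3), round(f_p, 3))
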
  9. J Burn Care Res. 2025 Aug 13. pii: iraf158. [Epub ahead of print]
       INTRODUCTION: Patients from low socioeconomic status (SES) backgrounds face barriers to quality burn care, such as limited healthcare access and follow-up. Many turn to online resources like Google, which may provide overwhelming or irrelevant information. This study compares the accuracy, readability, and SES-relevance of burn care information from ChatGPT and Google to address these disparities.
    METHODS: A standardized set of questions on immediate burn care, medical treatments, and long-term care was developed based on clinical guidelines. Responses from ChatGPT (v4.0) and the first Google search result were analyzed. Two medical students and two burn surgeons assessed accuracy using the Global Quality Score (GQS) on a scale of 1 (poor) to 5 (excellent). Readability was measured using the Flesch-Kincaid grade level, and SES-relevance was determined by counting responses that included themes related to affordability and access to care. Accuracy, readability, and SES-relevance were then compared using a Wilcoxon signed-rank test.
    RESULTS: ChatGPT provided higher-quality responses (GQS 4.35 ± 0.60) than Google (GQS 2.25 ± 1.10, p<.01). ChatGPT was unanimously preferred for half of the questions. Both platforms had grade levels 8-9, but ChatGPT addressed SES issues in 74% of responses, compared to Google's 33%.
    CONCLUSIONS: ChatGPT outperformed Google in providing accurate, SES-relevant burn care information. AI tools like ChatGPT may help reduce health information disparities for low SES patients by offering tailored and user-friendly guidance. Future studies should validate these findings across other clinical topics and patient populations.
    Keywords:  AI in healthcare; ChatGPT; Google; burn care disparities; healthcare accessibility; socioeconomic status
    DOI:  https://doi.org/10.1093/jbcr/iraf158
  10. Cureus. 2025 Jul;17(7): e87920
       OBJECTIVES: Eye-related conditions are a prevalent issue that continues to grow worldwide, affecting the sight of at least 2.2 billion individuals globally. Many patients may have questions or concerns that they bring to the internet before their healthcare provider, which can impact their health behavior. With the popularity of large language model (LLM)-based artificial intelligence (AI) chat platforms, like ChatGPT, there needs to be a better understanding of the suitability of their generated content. We aim to evaluate ChatGPT for the accuracy, comprehensiveness, and readability of its responses to ophthalmology-related medical inquiries.
     METHODOLOGY: Twenty-two ophthalmology patient questions were generated based on commonly searched symptoms on Google Trends and used as inputs on ChatGPT. Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas were used to evaluate response readability. Two English-speaking, board-certified ophthalmologists evaluated the accuracy, comprehensiveness, and clarity of the responses as proxies for appropriateness. Other validated tools, including QUEST, DISCERN, and an urgency scale, were used for additional quality metrics. Responses were analyzed using descriptive statistics and comparative tests.
    RESULTS: All responses scored a 2.0 for QUEST Tone and 1.0 for Complementarity. DISCERN Uncertainty had a mean of 3.86 ± 0.48, with no responses receiving a 5. Urgency to seek care scores averaged 2.45 ± 0.60, with only the narrow-angle glaucoma response prompting an ambulance call. Readability scores resulted in a mean FRE of 45.3 ± 9.98 and FKGL of 10.1 ± 1.74. These quality assessment scores showed no significant differences between categories of conditions. The ophthalmologists' reviews rated 15/22 (68.18%) of responses as appropriate. The mean scores for accuracy, comprehensiveness, and clarity were 4.41 ± 0.73, 4.89 ± 0.32, and 4.55 ± 0.63, respectively, with comprehensiveness ranking significantly higher than the other aspects (P < 0.01). The responses for glaucoma and cataract had the lowest appropriateness ratings.
    CONCLUSIONS: ChatGPT generally demonstrated appropriate responses to common ophthalmology questions, with high ratings for comprehensiveness, clarity, and support for medical professional follow-up. Performance did vary by conditions, with weaker appropriateness in responses related to glaucoma and cataract.
    Keywords:  armd; cataract lens; chat gpt; general ophthalmology; large language models (llm)
    DOI:  https://doi.org/10.7759/cureus.87920
  11. Am Surg. 2025 Aug 11. 31348251367031
      Background: Chatbots and large language models, particularly ChatGPT, have led to an increasing number of studies on the potential of chatbots in patient education. In this systematic review, we aimed to provide a pooled assessment of the appropriateness and accuracy of chatbot responses in patient education across various medical disciplines.
    Methods: This was a PRISMA-compliant systematic review and meta-analysis. PubMed and Scopus were searched from January to August 2023. Eligible studies that assessed the utility of chatbots in patient education were included. Primary outcomes were the appropriateness and quality of chatbot responses. Secondary outcomes included readability and concordance with published guidelines and Google searches. A random-effects proportional meta-analysis was used to pool the data.
    Results: Following initial screening, 21 studies were included. The pooled rate of appropriateness of chatbot answers was 89.1% (95% CI: 84.9%-93.3%). ChatGPT was the most frequently assessed chatbot. Responses, while accurate, were at a college reading level: the weighted mean Flesch-Kincaid Grade Level was 13.1 (95% CI: 11.7-14.5) and the weighted mean Flesch Reading Ease score was 38.6 (95% CI: 29-48.2). Chatbot answers to questions relevant to patient education had 78.6%-95% concordance with published guidelines in colorectal surgery and urology. Chatbots had higher patient education scores than Google Search (87% vs 78%).
    Conclusions: Chatbots provide largely accurate and appropriate answers for patient education. The advanced reading level of chatbot responses might be a limitation to their wide adoption as a source for patient education. However, they outperform traditional search engines and align well with professional guidelines, showcasing their potential in patient education.
    Keywords:  artificial intelligence; chatbots; meta-analysis; patient education; systematic review
    DOI:  https://doi.org/10.1177/00031348251367031
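
    The pooled appropriateness rate above is the output of a random-effects proportional meta-analysis; one common implementation is the DerSimonian-Laird estimator applied to per-study proportions. A minimal sketch with made-up study counts (not the 21 included studies); in practice proportions are usually transformed (e.g., logit or Freeman-Tukey) before pooling, so this shows only the weighting scheme:

        import numpy as np

        # Hypothetical per-study counts: appropriate answers / answers assessed.
        events = np.array([45, 88, 27, 60, 19])
        totals = np.array([50, 100, 30, 70, 20])

        p = events / totals
        var = p * (1 - p) / totals          # within-study (binomial) variance

        # Fixed-effect inverse-variance weights and Cochran's Q.
        w_fe = 1 / var
        p_fe = np.sum(w_fe * p) / np.sum(w_fe)
        q = np.sum(w_fe * (p - p_fe) ** 2)

        # DerSimonian-Laird between-study variance tau^2.
        c = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
        tau2 = max(0.0, (q - (len(p) - 1)) / c)

        # Random-effects pooled proportion with a 95% confidence interval.
        w_re = 1 / (var + tau2)
        pooled = np.sum(w_re * p) / np.sum(w_re)
        se = np.sqrt(1 / np.sum(w_re))
        print(f"pooled = {pooled:.3f}, 95% CI = ({pooled - 1.96*se:.3f}, {pooled + 1.96*se:.3f})")
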
  12. JMIR Form Res. 2025 Aug 13. 9: e73642
       Background: Chemical ocular injuries are a major public health issue. They cause eye damage from harmful chemicals and can lead to severe vision loss or blindness if not treated promptly and effectively. Although medical knowledge has advanced, accessing reliable and understandable information on these injuries remains a challenge. This is due to unverified web-based content and complex terminology. Artificial intelligence tools like ChatGPT provide a promising solution by simplifying medical information and making it more accessible to the general public.
    Objective: This study aims to assess the use of ChatGPT in providing reliable, accurate, and accessible medical information on chemical ocular injuries. It evaluates the correctness, thematic accuracy, and coherence of ChatGPT's responses compared with established medical guidelines and explores its potential for patient education.
    Methods: A total of 9 questions were entered into ChatGPT regarding various aspects of chemical ocular injuries. These included the definition, prevalence, etiology, prevention, symptoms, diagnosis, treatment, follow-up, and complications. The responses provided by ChatGPT were compared with the International Classification of Diseases-9 and International Classification of Diseases-10 guidelines for chemical (alkali and acid) injuries of the conjunctiva and cornea. The evaluation focused on criteria such as correctness, thematic accuracy, and coherence to assess the accuracy of ChatGPT's responses. The inputs were categorized into 3 distinct groups, and statistical analyses, including Flesch-Kincaid readability tests, ANOVA, and trend analysis, were conducted to assess their readability, complexity, and trends.
    Results: The results showed that ChatGPT provided accurate and coherent responses for most questions about chemical ocular injuries, demonstrating thematic relevance. However, the responses sometimes overlooked critical clinical details or guideline-specific elements, such as emphasizing the urgency of care, using precise classification systems, and addressing detailed diagnostic or management protocols. While the answers were generally valid, they occasionally included less relevant or overly generalized information. This reduced their consistency with established medical guidelines. The average Flesch Reading Ease Score was 33.84 (SD 2.97), indicating a fairly challenging reading level, while the Flesch-Kincaid Grade Level averaged 14.21 (SD 0.97), suitable for readers with college-level proficiency. The passive voice was used in 7.22% (SD 5.60%) of sentences, indicating moderate reliance. Statistical analysis showed no significant differences in the Flesch Reading Ease Score (P=.38), Flesch-Kincaid Grade Level (P=.55), or passive sentence use (P=.60) across categories, as determined by one-way ANOVA. Readability remained relatively constant across the 3 categories, as determined by trend analysis.
    Conclusions: ChatGPT shows strong potential in providing accurate and relevant information about chemical ocular injuries. However, its language complexity may prevent accessibility for individuals with lower health literacy and sometimes miss critical aspects. Future improvements should focus on enhancing readability, increasing context-specific accuracy, and tailoring responses to a person's needs and literacy levels.
    Keywords:  ChatGPT; ICD-10; ICD-9; artificial intelligence; chemical eye injuries; medical information; ophthalmology; patient education; readability
    DOI:  https://doi.org/10.2196/73642
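
    The category comparison above (no significant readability differences across the 3 question groups) is a one-way ANOVA on per-response readability scores. A minimal sketch with hypothetical Flesch Reading Ease values and hypothetical group names, assuming scipy:

        from scipy import stats

        # Hypothetical Flesch Reading Ease scores for responses in three question categories.
        definitions_group = [33.1, 35.0, 31.8]
        management_group  = [34.2, 36.7, 30.9]
        followup_group    = [32.5, 37.1, 33.4]

        f_stat, p_value = stats.f_oneway(definitions_group, management_group, followup_group)
        print(f"F = {f_stat:.2f}, p = {p_value:.2f}")  # a large p suggests no detectable difference
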
  13. Eur Arch Otorhinolaryngol. 2025 Aug 13.
       OBJECTIVE: This study aims to evaluate online patient education materials on retrograde cricopharyngeal dysfunction (RCPD) by comparing the readability, understandability, and quality of content generated by large language models (LLM).
    METHOD: A web search in December 2024 identified 51 online resources, which were evaluated alongside four LLMs (ChatGPT 4.0, Gemini 1.5 Flash, Perplexity GPT-3.5, DeepSeek-V2.5). Readability was analyzed using Readable.io, understandability and actionability were assessed using PEMAT, and information quality was assessed using DISCERN.
    RESULTS: The average readability of both the online materials and the LLM responses was at the 11th-12th grade level. The Flesch Reading Ease score was lowest for the LLMs, especially for the DeepSeek-V2.5 model (24.21). While PEMAT understandability scores were adequate for the online materials (82%) and the LLM responses (79%), actionability was low across all groups (25-37%). DISCERN analyses showed that both sources of information were of limited quality in supporting treatment decisions.
    CONCLUSION: This study revealed that both online and LLM-generated materials on RCPD exceeded the recommended readability levels. Although the materials demonstrated acceptable understandability, they exhibited low actionability and inadequate overall quality, emphasizing the need for more patient-centered digital health communication.
    Keywords:  Artificial intelligence; Deglutition disorders; Health literacy; Patient education; Readability
    DOI:  https://doi.org/10.1007/s00405-025-09628-x
  14. BMC Med Educ. 2025 Aug 11. 25(1): 1157
       BACKGROUND: The quality and reliability of health-related content on YouTube remain a growing concern. This study aimed to evaluate tonsillectomy-related YouTube videos using a multi-method framework that combines human expert review, large language model (ChatGPT-4) analysis, and transcript readability assessment.
    METHODS: A total of 76 English-language YouTube videos were assessed. Two otolaryngologists independently rated video quality using the DISCERN instrument and JAMA benchmarks. Corrected transcripts were evaluated by ChatGPT-4 (May 2024 version) for accuracy and completeness. Spearman correlations and regression analyses were used to explore associations between human and AI evaluations. Videos were also categorized as transcript-heavy or visually rich to examine the effect of visual presentation.
    RESULTS: Professional videos consistently outperformed patient-generated content in quality metrics. ChatGPT-4 accuracy scores showed a strong correlation with JAMA ratings (ρ = 0.56), while completeness was strongly associated with DISCERN scores (ρ = 0.72). Visually rich videos demonstrated significantly higher AI accuracy than transcript-heavy videos (Cohen's d = 0.600, p = 0.030), suggesting that visual context may enhance transcript-based interpretation. However, the average transcript readability (FKGL = 8.38) exceeded the recommended level for patient education.
    CONCLUSION: Tonsillectomy-related YouTube content varies widely in quality. Human-AI alignment supports the use of large language models for preliminary content screening. Visually enriched content may improve AI interpretability, while readability concerns highlight the need for more accessible educational resources. Multimodal evaluation and design should be prioritized in future digital health content.
    Keywords:  Health information quality; Large language models; Tonsillectomy; YouTube
    DOI:  https://doi.org/10.1186/s12909-025-07739-x
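
    The human-AI alignment and the group contrast reported above rest on two routine statistics: a Spearman rank correlation between expert and model ratings, and Cohen's d between the two video groups. A minimal sketch with hypothetical scores, assuming scipy:

        import numpy as np
        from scipy import stats

        # Hypothetical per-video scores: expert JAMA ratings vs. ChatGPT-4 accuracy ratings.
        jama_scores = np.array([2, 3, 1, 4, 2, 3, 4, 1, 2, 3])
        ai_accuracy = np.array([3, 3, 2, 4, 2, 4, 4, 1, 3, 3])
        rho, p = stats.spearmanr(jama_scores, ai_accuracy)

        # Cohen's d (pooled-SD form) between two equally sized video groups.
        visually_rich    = np.array([4.1, 3.8, 4.4, 3.9, 4.2])
        transcript_heavy = np.array([3.6, 3.4, 3.9, 3.5, 3.7])
        pooled_sd = np.sqrt((visually_rich.var(ddof=1) + transcript_heavy.var(ddof=1)) / 2)
        cohens_d = (visually_rich.mean() - transcript_heavy.mean()) / pooled_sd

        print(round(rho, 2), round(p, 3), round(cohens_d, 2))
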
  15. Rev Assoc Med Bras (1992). 2025; pii: S0104-42302025000700701. [Epub ahead of print] 71(7): e20250140
      
    DOI:  https://doi.org/10.1590/1806-9282.20250140
  16. J Med Internet Res. 2025 Aug 14. 27: e55360
       BACKGROUND: With the continuous advancement of science and technology, the demand for health knowledge about pediatric orthopedics is also gradually growing. The traditional paper-based and multimedia health education models can no longer fully meet the needs of society. Fortunately, the emergence of social media has mitigated the problem of insufficient medical education resources. However, there is currently relatively little published evidence on the use of social media in pediatric orthopedics.
    OBJECTIVE: This study aimed to examine the current applications of social media in pediatric orthopedics and to evaluate the quality and readability of related online health information. Its purpose is to provide relevant evidence to promote the understanding and development of the field.
    METHODS: This review followed the methodological framework of Arksey and O'Malley and the Joanna Briggs Institute reviewer manual. First, a literature search was performed in the PubMed, Embase, CINAHL, Web of Science, and Cochrane databases. The search time range was from the establishment of the databases to September 21, 2023. We endeavored to include research articles related to social media and involving pediatric orthopedics in the review. The literature was reviewed at the title, abstract, and full-text levels.
    RESULTS: We included 35 of 3400 (1.03%) studies retrieved. Most of the articles used social media to support the education and training of medical staff and patients (23/35, 66%) and to disseminate information (21/35, 60%), followed by helping medical staff collect data (8/35, 23%). Medical institutions and staff also used social media to increase attention (6/35, 17%), enhance social support (5/35, 14%), facilitate the recruitment of research participants (3/35, 9%), support professional development (3/35, 9%), and implement health interventions (2/35, 6%). Five general quality of information (QOI) tools, 7 specific QOI tools, and 6 readability tools were used in the 12 studies analyzed for quality and readability, with overall quality being fair and readability exceeding the recommended level. According to the research data, people are increasingly interested in pediatric orthopedics on social media platforms and eager to obtain relevant knowledge.
    CONCLUSIONS: This scoping review found a growing body of literature on the use of social media for pediatric orthopedic conditions and that social media is playing an increasingly important role in knowledge dissemination and education. A variety of tools are being used to assess QOI, but little attention has been paid to the readability of the information. The QOI was largely fair, with readability above the recommended level. Future research should further explore the role of social media in pediatric orthopedics and continue to optimize QOI and information readability.
    Keywords:  health education; information dissemination; pediatric orthopedics; smart medicine; social media
    DOI:  https://doi.org/10.2196/55360
  17. Arch Orthop Trauma Surg. 2025 Aug 11. 145(1): 404
       INTRODUCTION: With the rise in robotic-assisted surgery, platforms like YouTube have become popular for patient education. Robotic total knee replacement (RTKR) is frequently featured, but the quality of content remains uncertain. This study evaluated the quality and educational value of YouTube videos on RTKR using standardized scoring systems.
    MATERIALS AND METHODS: A total of 100 videos related to robotic total knee replacement were identified through YouTube searches, and 38 of them were included in the study. Video characteristics, video sources, and video themes were recorded. Quality and content were assessed using DISCERN, JAMA Benchmark, Global Quality Score (GQS), and the Robotic Total Knee Replacement Score (RTKRS). The RTKRS scoring system was used to investigate the differences between robotic knee replacement and standard knee replacement.
    RESULTS: The median scores were 28.25 for DISCERN, 2 for JAMA, 2 for GQS, and 1 for RTKRS. RTKRS was lower in patient-sourced videos than in physician- and speaker-sourced videos (p < 0.05). General knowledge-themed videos had higher RTKRS scores than patient testimony videos (p = 0.010). A negative correlation was found between view count and RTKRS, while video duration correlated positively with GQS. Only 24% of videos addressed differences in patient satisfaction. 21% discussed potential differences in complication rates, while only 13% covered prosthesis survival. In contrast, 82% mentioned alignment differences, and just 11% addressed cost differences.
    CONCLUSIONS: Despite the increasing accessibility of robotic surgery information online, the quality of YouTube videos on robotic total knee replacement was generally low. Patient-generated content was particularly lacking in educational value, while professionally produced general information videos demonstrated better quality scores. Critical topics such as complication rates, prosthesis longevity, and patient satisfaction were underrepresented, suggesting a need for improved and more balanced online educational resources.
    Keywords:  DISCERN; GQS; Patient education; Robotic total knee replacement; Video quality; YouTube
    DOI:  https://doi.org/10.1007/s00402-025-06024-2
  18. Digit Health. 2025 Jan-Dec; 11: 20552076251366390
       Background: Knee osteoarthritis (KOA), a prevalent degenerative joint disease, is a substantial global health burden. In the digital era, patients increasingly seek KOA-related information on TikTok and Bilibili, but the quality of this content has scarcely been studied, raising concerns about accuracy and reliability.
    Aim: To systematically evaluate the reliability and quality of KOA educational videos on TikTok and Bilibili using validated tools (modified DISCERN and Global Quality Score, GQS), and to analyze associations between content quality, uploader types, and user engagement metrics.
    Methods: Using "Knee Osteoarthritis" as the keyword, the top 100 videos from each platform were retrieved. After excluding duplicates and irrelevant videos, 164 were analyzed. Videos were classified by uploader type and content. Two senior orthopedic physicians evaluated their reliability and quality via a modified DISCERN tool and GQS. Nonparametric statistical methods were applied for data analysis.
    Results: Bilibili had a significantly higher proportion of high-quality videos (GQS ≥4: 38.0% vs. 11.8%; DISCERN ≥4: 49.3% vs. 24.7%, P < 0.05). Videos from professional institutions ranked highest in quality, while TikTok content was mostly posted by professional uploaders (those with medical or healthcare-related qualifications; 98%). Disease knowledge and treatment were the main content types. Engagement metrics were intercorrelated with one another but not with quality scores.
    Conclusion: Bilibili hosted more high-quality KOA videos than TikTok (GQS ≥4: 38.0% vs. 11.8%, DISCERN ≥4: 49.3% vs. 24.7%, P < 0.05), with professional institutions showing the highest reliability. Engagement metrics did not correlate with quality. To mitigate misinformation, targeted strategies-such as platform-specific guidelines for health content and integration of video quality discussions into clinical consultations-are needed.
    Keywords:  Bilibili; DISCERN; Knee osteoarthritis; TikTok; health information quality; short-video platforms
    DOI:  https://doi.org/10.1177/20552076251366390
  19. Arch Osteoporos. 2025 Aug 15. 20(1): 115
      This study evaluated the quality of osteoporosis videos on TikTok, finding that while most are by doctors, the information quality is low. Longer videos tend to have better quality but receive less engagement, highlighting concerns about the suitability of TikTok for medical education.
    BACKGROUND: TikTok has become a significant channel for the general public to access and adopt health information. However, the quality of health content about osteoporosis on TikTok remains underexplored.
    OBJECTIVE: This study aimed to investigate the information quality of osteoporosis videos on TikTok.
    METHODS: We analyzed the first 200 videos related to osteoporosis on TikTok, focusing on 128 videos that met our criteria. The quality of these videos was evaluated using quantitative scoring tools such as the DISCERN instrument and Content Integrity Assessment. Additionally, the correlation between video quality and characteristics, including duration, likes, comments, and shares, was investigated.
    RESULTS: Of the videos analyzed, 93.0% were posted by doctors. Content integrity scores were as follows: definition 0.61 ± 0.77, symptoms 0.34 ± 0.71, evaluation 0.39 ± 0.71, risk factors 0.55 ± 0.65, management 0.82 ± 0.56, and outcomes 1.17 ± 0.75. The average DISCERN score was 36.51 ± 6.87, and the majority of videos were rated as poor (71.1%) or fair (22.7%) in quality. DISCERN scores of videos published by doctors were lower than those created by non-professionals (Z = -2.062, P = 0.039). DISCERN scores were significantly correlated with video duration (r = 0.581, P < 0.001). Engagement metrics such as likes, comments, favorites, and shares were highly interrelated (r = 0.855 to 0.901, P < 0.001), but did not correlate with video quality (P > 0.05).
    CONCLUSION: Although the videos about osteoporosis on TikTok are mainly provided by doctors, their quality is low. We found a positive correlation between video duration and video quality. High-quality videos received low attention, while popular videos were of low quality. The medical information on TikTok is currently not rigorous enough to guide patients to make accurate judgments. Due to the low quality and reliability of the information, TikTok is not an appropriate source of knowledge to educate patients.
    Keywords:  Bone health; Health information; Osteoporosis; Patient education; TikTok; Video quality
    DOI:  https://doi.org/10.1007/s11657-025-01597-2
  20. Can J Dent Hyg. 2025 Jun;59(2): 89-97
       Background: Social media platforms such as Instagram have emerged as alternative sources for oral hygiene instructions. This cross-sectional study evaluates the usefulness, understandability, and actionability of Instagram oral hygiene instruction posts.
    Methods: A systematic search of Instagram posts was conducted using the hashtags #dentalhygiene and #oralhygiene. The first 100 posts meeting the inclusion criteria for each hashtag were evaluated using 2 tools: the Oral Hygiene Content Usefulness Score (OHCUS), a newly developed scoring system, and the Patient Education Materials Assessment Tool (PEMAT). The OHCUS assessed the quality and clinical value of posts, while PEMAT evaluated their understandability and actionability. Statistical analysis included the Mann-Whitney U test, Kruskal-Wallis test, and Spearman's correlation.
    Results: Among the 200 posts, 110 were videos and 90 were photos. The average number of likes was 2,981.92 (±9,635.64), and the average number of views for videos was 196,583 (±933,509). Seventy-one percent of posts were educational. The mean usefulness score was 2.37 (±1.94), the mean understandability score was 74.4% (±14.87%), and the actionability score averaged 35.6% (±24.37%).
    Discussion: Posts from oral health professionals, including dental hygienists, were more useful, understandable, and actionable than posts from other sources, with most posts shared by dental clinic accounts.
    Conclusions: Social media, particularly Instagram, has potential as a platform for disseminating oral health education. However, the quality and reliability of the information vary significantly. Posts from oral health professionals, especially dental hygienists, are more beneficial. Enhancing the quality and accuracy of social media content is crucial to maximizing its public health impact.
    Keywords:  dental hygienists; dentist; dentistry; education; oral hygiene; patient education; social media