bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-12-21
thirty-six papers selected by
Thomas Krichel, Open Library Society



  1. Nucleic Acids Res. 2025 Dec 17. pii: gkaf1329. [Epub ahead of print]
    RNAcentral Consortium
      RNAcentral was founded in 2014 to serve as a comprehensive database of non-coding RNA sequences. It began by providing a single unified interface to more specialized resources and now contains 45 million sequences. It has since grown beyond that single interface and provides several services and analyses, including secondary structure prediction with R2DT, sequence search, and analysis with Rfam. Since its last publication in 2021, RNAcentral has developed two major features. The first is literature integration through LitScan and LitSumm: LitScan automatically identifies and links relevant publications to RNA entries, while LitSumm uses natural language processing to generate functional summaries from the literature. Together, these tools address the critical challenge of connecting sequence data with functional knowledge scattered across thousands of publications. The second is gene-level entries, which represent a large structural change to RNAcentral. While RNAcentral previously organized data exclusively at the sequence level, we now group related transcripts into gene-centric views. This allows researchers to explore all isoforms, splice variants, and related sequences for a gene in a unified interface, better reflecting biological organization and facilitating comparative analyses. RNAcentral is freely available at https://rnacentral.org.
    DOI:  https://doi.org/10.1093/nar/gkaf1329
  2. Hum Resour Health. 2025 Dec 19.
       BACKGROUND: Tracking country-wide human resources for health (HRH) information is a milestone in the global strategy for HRH 2030, and digitalized HRH information systems have been recommended by the World Health Organization. However, the implementation status differs among countries, and most systematic reviews on this topic have been conducted in high-income countries. This scoping review aimed to identify (1) stages of implementation, (2) functional components, (3) facilitators and barriers, and (4) policy impacts or outcomes of digitalized HRH information systems in low- and middle-income countries (LMICs).
    METHODS: The methodological framework of the Joanna Briggs Institute was used in this scoping review. English-language articles in two databases (PubMed and Web of Science), with publication dates ranging from inception to August 2023, were gathered, followed by a gray literature search and a reference search. Two author pairs independently performed the study selection. Data were extracted, analyzed, and presented in tabular form alongside a narrative summary.
    RESULTS: Forty studies and gray literature from 26 LMICs in Asia and Africa were included in the scoping review. Thirty-three studies and gray literature covered different stages of digitalized HRH information systems' implementation, including development, pilot, rollout, and maintenance. The HRH registry was the most common functional component, whereas finances and migration were the least common. Thirty-two studies and gray literature reported barriers and facilitators, stratified into four factors and stages. Many barriers were identified in organizational and environmental factors, especially in governance. Interoperability among multiple HRH information systems within a country was the key facilitator, with development partners playing a critical role. Sixteen studies and gray literature from nine countries reported positive policy impacts/outcomes. Political commitment, strong national and subnational leadership, and coordination mechanisms among national stakeholders and development partners were key to gaining policy impact.
    CONCLUSIONS: Barriers and facilitators were common across the studies, and governance factors were particularly crucial at all stages of digitalization. Our stratified methodology for analyzing facilitators and barriers can serve as an analytical framework for evaluating HRH information systems in any country. Data on the private sector and migration could be further strengthened as system components.
    Keywords:  Digital technology; Health workforce; Low and middle income countries; Management information systems; Scoping review
    DOI:  https://doi.org/10.1186/s12960-025-01043-x
  3. Eur J Cancer. 2025 Dec 11. pii: S0959-8049(25)01054-8. [Epub ahead of print] 233: 116168
       INTRODUCTION: Large language models (LLMs) are increasingly used to answer queries in urology and oncology, yet their performance is limited by outdated data and missing source transparency, which undermines clinical reliability and therefore adoption.
    MATERIAL AND METHODS: We developed UroBot, a urology-specific chatbot integrating retrieval-augmented generation (RAG) to provide in-line references and source text previews for each response. In a randomized controlled reader study, UroBot and ChatGPT were compared across ten uro-oncological case rounds. Thirty urologists assessed recommendation correctness, source verifiability and trust with preference ratings collected after each round.
    RESULTS: UroBot performed significantly better than ChatGPT in recommendation correctness (73% vs. 50%; p < 0.001), source attribution (74% vs. 30%; p < 0.001), and verifiability of sources (84% vs. 35%; p < 0.001). Furthermore, clinicians consistently preferred UroBot for accuracy, source verifiability, and trust. Qualitative analysis showed that ChatGPT often produced vague or incorrect citations, with 28% being non-existent or outdated and 83% lacking specific sections, whereas UroBot achieved complete alignment at the guideline sub-section and page level. These gains in citation precision were mirrored by higher clinician ratings for verifiability and trust. Limitations include the small sample size of ten cases, chosen for feasibility, which may not cover the full uro-oncological spectrum.
    CONCLUSION: Our findings show that combining LLMs with RAG, in-line references, and source text previews markedly enhances perceived source attribution and verifiability compared with state-of-the-art conventional LLMs. Importantly, this approach is readily transferable across medical subspecialties, enabling reliable and up-to-date clinical decision support.
    Keywords:  Chatbot; Explainability; In-line references; Retrieval augmented generation; Source text preview; Verifiability
    DOI:  https://doi.org/10.1016/j.ejca.2025.116168
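
    To make the retrieval-augmented generation pattern described in entry 3 concrete, the following minimal Python sketch shows how retrieved guideline passages can be numbered so that an answer can cite them in-line and a user interface can show source text previews. This is not the UroBot implementation: the guideline snippets, section labels, and helper names are hypothetical, and the TF-IDF retriever merely stands in for whatever retrieval backend such a system uses.

        # Minimal, illustrative sketch of RAG with numbered in-line references and
        # source text previews. Not the UroBot implementation; the passages and
        # section labels below are hypothetical.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        GUIDELINE_PASSAGES = [  # hypothetical corpus; a real system would index guideline documents
            ("Prostate guideline, section 6.2", "Active surveillance is recommended for low-risk disease ..."),
            ("Prostate guideline, section 6.3", "Radical prostatectomy should be offered to selected patients ..."),
            ("Bladder guideline, section 7.1", "Neoadjuvant cisplatin-based chemotherapy is recommended ..."),
        ]

        def retrieve(query: str, k: int = 2):
            """Rank passages by TF-IDF cosine similarity and return the top k."""
            texts = [text for _, text in GUIDELINE_PASSAGES]
            vectorizer = TfidfVectorizer().fit(texts + [query])
            sims = cosine_similarity(vectorizer.transform([query]), vectorizer.transform(texts))[0]
            top = sims.argsort()[::-1][:k]
            return [GUIDELINE_PASSAGES[i] for i in top]

        def build_prompt(query: str):
            """Attach numbered sources so the model can cite [1], [2] in-line and
            the interface can show the quoted source text as a preview."""
            sources = retrieve(query)
            context = "\n".join(f"[{i + 1}] {ref}: {text}" for i, (ref, text) in enumerate(sources))
            prompt = (
                "Answer using only the numbered sources and cite them in-line.\n"
                f"Sources:\n{context}\n\nQuestion: {query}"
            )
            return prompt, sources  # sources are kept for the preview pane

        prompt, sources = build_prompt("When is active surveillance appropriate?")
        print(prompt)
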
  4. Jugan Geongang Gwa Jilbyeong. 2025 Dec 11. 18(48): 1952-1966
       Objectives: This study analyzed the current operation of the National Health Information Portal, which provides trustworthy evidence-based health information to the public, to propose directions for improvement in response to the evolving artificial intelligence (AI)-based search environment and increasing demand for personalized health information.
    Methods: To examine Portal operation, we reviewed project reports, internal planning documents, content management system data, Google Analytics 4 statistics, and triennial quality assessment results produced by the Seoul National University R&DB Foundation, which has managed the project since 2018. Relevant policy documents and domestic and international best practices were also analyzed to identify directions for improvement.
    Results: The Portal operates to achieve its core values of "verified, easy-to-understand, and integrated information" through a standardized authoring and review system. It enhances reliability and accessibility by developing life cycle-based customized content, providing plain language explanations and summaries, and establishing an ontology- and metadata-based management framework. User engagement has been strengthened by improvements in user interfaces, newsletters, and interactive events.
    Conclusions: In response to expanding AI-based search availability, the Portal has implemented a Generative Engine Optimization strategy and is exploring strategies to more effectively provide personalized health information in the future, including connectivity with national personal health record systems. The Portal will continue to strengthen its role as a leading public platform that improves health literacy and promotes healthy lifestyle practices in the population.
    Keywords:  Health information; Health literacy; Metadata; National Health Information Portal
    DOI:  https://doi.org/10.56786/PHWR.2025.18.48.3
  5. J Cheminform. 2025 Dec 17.
      The knowledge panels in PubChem allow users to quickly identify and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing the co-occurrences of those entities in a collection of text documents. In the present study, the analysis and summarization techniques used to develop the literature knowledge panels in PubChem were extended to patent documents from the Google Patent Research Data (GPRD) set. The annotations of the patent documents in the GPRD set were mapped to NCBI database records to create the patent co-occurrence data. The annotations were not only from the titles and abstracts of patents but also from other parts such as claims and descriptions, greatly improving the coverage of the co-occurrence-based entity relationships in PubChem. Informativeness weights of entities were introduced in the co-occurrence and relevance score computations to account for a significant variation in the number of matched annotations per patent section. This narrows the focus to the co-occurrences that are more relevant to the subject matter of the patent. The resulting co-occurrence data was used to generate the patent knowledge panels, enabling users to identify entities co-mentioned in patents alongside a specific chemical or gene. The patent co-occurrence data can be downloaded interactively or accessed programmatically. Overall, the patent knowledge panels described in this study provide users with quick access to essential biomedical entities associated with a given PubChem record. Users can delve into relevant patent documents related to these entities or download the underlying co-occurrence data for further exploration and analysis.
    Keywords:  Co-occurrence; Knowledge panel; Patent; PubChem
    DOI:  https://doi.org/10.1186/s13321-025-01134-w
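
    As a rough illustration of the co-occurrence and informativeness-weighting idea described in entry 5, the Python sketch below counts entity pairs across a handful of made-up patent annotations and scales each pair by an IDF-style weight. The toy data and the weighting formula are assumptions for illustration only; the actual weighting and relevance-score computations used for the PubChem patent knowledge panels are not reproduced here.

        # Toy sketch: entity co-occurrence scoring with informativeness weights.
        # The IDF-style weight is an assumption, not PubChem's actual formula.
        import math
        from collections import Counter
        from itertools import combinations

        # hypothetical annotations: patent id -> set of entity identifiers
        doc_entities = {
            "patent-A": {"aspirin", "PTGS2", "inflammation"},
            "patent-B": {"aspirin", "PTGS2"},
            "patent-C": {"aspirin", "inflammation"},
        }

        n_docs = len(doc_entities)
        doc_freq = Counter(e for ents in doc_entities.values() for e in ents)
        # informativeness weight: rarer entities contribute more (IDF-like assumption)
        weight = {e: math.log(n_docs / df) + 1.0 for e, df in doc_freq.items()}

        cooc = Counter()
        for ents in doc_entities.values():
            for a, b in combinations(sorted(ents), 2):
                cooc[(a, b)] += 1

        # relevance of a pair = co-occurrence count scaled by both entities' weights
        relevance = {pair: count * weight[pair[0]] * weight[pair[1]] for pair, count in cooc.items()}
        for pair, score in sorted(relevance.items(), key=lambda item: -item[1]):
            print(pair, round(score, 3))
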
  6. JMIR Form Res. 2025 Dec 17. 9: e84251
       BACKGROUND: Assessment of medical information provided by artificial intelligence (AI) chatbots like ChatGPT and Google's Gemini and comparison with international guidelines is a burgeoning area of research. These AI models are increasingly being considered for their potential to support clinical decision-making and patient education. However, their accuracy and reliability in delivering medical information that aligns with established guidelines remain under scrutiny.
    OBJECTIVE: This study aims to assess the accuracy of medical information generated by ChatGPT and Gemini and its alignment with international guidelines for sepsis management.
    METHODS: ChatGPT and Gemini were asked 18 questions about the Surviving Sepsis Campaign guidelines, and the responses were evaluated by 7 independent intensive care physicians. The responses generated were scored as follows: 3=correct, complete, and accurate; 2=correct but incomplete or inaccurate; and 1=incorrect. This scoring system was chosen to provide a clear and straightforward assessment of the accuracy and completeness of the responses. The Fleiss κ test was used to assess the agreement between evaluators, and the Mann-Whitney U test was used to test for the significance of differences between the correct responses generated by ChatGPT and Gemini.
    RESULTS: ChatGPT provided 5 (28%) perfect responses, 12 (67%) nearly perfect responses, and 1 (5%) low-quality response, with substantial agreement among the evaluators (Fleiss κ=0.656). Gemini, on the other hand, provided 3 (17%) perfect responses, 14 (78%) nearly perfect responses, and 1 (5%) low-quality response, with moderate agreement among the evaluators (Fleiss κ=0.582). The Mann-Whitney U test revealed no statistically significant difference between the two platforms (P=.48).
    CONCLUSIONS: ChatGPT and Gemini both demonstrated potential for generating medical information. Despite their current limitations, both showed promise as complementary tools in patient education and clinical decision-making. The medical information generated by ChatGPT and Gemini still needs ongoing evaluation regarding its accuracy and alignment with international guidelines in different medical domains, particularly in the sepsis field.
    Keywords:  AI; ChatGPT; Gemini; artificial intelligence; chatbots; large language model; medical information; sepsis
    DOI:  https://doi.org/10.2196/84251
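
    The two statistics named in the methods of entry 6, Fleiss' kappa for multi-rater agreement and the Mann-Whitney U test for comparing the two chatbots, can be computed with standard Python libraries. The sketch below uses simulated ratings (18 questions, 7 raters, scores 1-3) purely to show the mechanics; it is not the study's data or analysis code.

        # Hedged sketch of Fleiss' kappa and a Mann-Whitney U comparison on toy data.
        import numpy as np
        from scipy.stats import mannwhitneyu
        from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

        rng = np.random.default_rng(0)
        # rows = 18 questions, columns = 7 raters, scores on the 1-3 scale (simulated)
        chatgpt_ratings = rng.integers(1, 4, size=(18, 7))
        gemini_ratings = rng.integers(1, 4, size=(18, 7))

        table, _ = aggregate_raters(chatgpt_ratings)  # questions x categories count table
        print("Fleiss kappa (ChatGPT raters):", round(fleiss_kappa(table), 3))

        # compare per-question mean scores between the two chatbots
        u_stat, p_value = mannwhitneyu(chatgpt_ratings.mean(axis=1), gemini_ratings.mean(axis=1))
        print("Mann-Whitney U:", u_stat, "p =", round(p_value, 3))
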
  7. Am J Health Syst Pharm. 2025 Dec 19. pii: zxaf350. [Epub ahead of print]
       PURPOSE: Artificial intelligence (AI) is a rapidly growing field within healthcare and is quickly becoming a staple resource for pharmacists for drug information (DI). The primary aim of this study is to determine the accuracy of 2 common AI tools in answering DI questions compared to the accuracy of information provided by University of Michigan Health (UMH) DI pharmacists.
       SUMMARY: A sample of 300 questions answered by the UMH DI service from July 2022 to June 2023 was included in this evaluation. Each DI query was condensed into a single question for input into each of 2 AI chatbot services: OpenAI's ChatGPT and Google's Gemini. Each question was entered as a new conversation in the AI tools, and initial responses were recorded along with any follow-up responses with references. After all questions were answered, the accuracy of each response and the quality of the references were adjudicated on a 3-point scale. ChatGPT and Gemini each provided completely accurate answers with reliable references 19% of the time. Fifteen percent of answers by ChatGPT were completely inaccurate and included fake or unreliable references, while 5% of Gemini's answers were categorized as such. The remaining responses (66% for ChatGPT and 76% for Gemini) were partially correct and/or cited incomplete resources.
    CONCLUSION: This analysis underscores that generative AI tools should not replace healthcare professionals' own research or consultation with DI teams at their institution, as AI tools lack the ability to create evidence-based responses that DI pharmacists can offer. While chatbots may be helpful for certain tasks, they should not be used as a replacement for the work of clinical pharmacists.
    Keywords:  ChatGPT; Google Gemini; artificial intelligence; drug information services; generative artificial intelligence
    DOI:  https://doi.org/10.1093/ajhp/zxaf350
  8. Epileptic Disord. 2025 Dec 16.
       OBJECTIVE: As large language models (LLMs) become more accessible, they may be used to explain challenging EEG concepts to nonspecialists. This study aimed to compare the accuracy, completeness, and readability of EEG-related responses from three LLM-based chatbots and to assess inter-rater agreement.
    METHODS: One hundred questions, covering 10 EEG categories, were entered into ChatGPT, Copilot, and Gemini. Six raters from the clinical neurophysiology field (two physicians, two teachers, and two technicians) evaluated the responses. Accuracy was rated on a 6-point scale, completeness on a 3-point scale, and readability was assessed using the Automated Readability Index (ARI). We used a repeated-measures ANOVA for group differences in accuracy and readability, the intraclass correlation coefficient (ICC) for inter-rater reliability, and a two-way ANOVA, with chatbot and raters as factors, for completeness.
    RESULTS: Total accuracy was significantly higher for ChatGPT (mean ± SD 4.54 ± .05) compared with Copilot (mean ± SD 4.11 ± .08) and Gemini (mean ± SD 4.16 ± .13) (p < .001). ChatGPT's lowest performance was in normal variants and patterns of uncertain significance (mean ± SD 3.10 ± .14), while Copilot and Gemini performed lowest in ictal EEG patterns (mean ± SD 2.93 ± .11 and 3.37 ± .24, respectively). Although inter-rater agreement for accuracy was excellent among physicians (ICC = .969) and teachers (ICC = .926), it was poor for technicians in several EEG categories. ChatGPT achieved significantly higher completeness scores than Copilot (p < .001) and Gemini (p = .01). ChatGPT text (ARI mean ± SD 17.41 ± 2.38) was less readable than that of Copilot (ARI mean ± SD 11.14 ± 2.60) (p < .001) and Gemini (ARI mean ± SD 14.16 ± 3.33).
    SIGNIFICANCE: Chatbots achieved relatively high accuracy, but not without flaws, emphasizing that the information provided requires verification. ChatGPT outperformed the other chatbots in accuracy and completeness, though at the expense of readability. The lower inter-rater agreement among technicians may reflect a gap in standardized training or practical experience, potentially impacting the consistency of EEG-related content assessment.
    Keywords:  ChatGPT; Copilot; Gemini; artificial intelligence; electroencephalography; large language model
    DOI:  https://doi.org/10.1002/epd2.70156
  9. Digit Health. 2025 Jan-Dec; 11: 20552076251406304
      Cataract surgery is one of the most common and effective surgeries performed worldwide, yet patient education remains a challenge due to limitations in health literacy among the general population. Our study evaluated the reliability of different large language models (LLMs) in providing accurate, complete, and clear responses to frequently asked questions (FAQs) related to cataract surgery. A comprehensive list of 20 FAQs about cataract surgery was submitted sequentially as prompts to nine different LLMs. All 180 answers were recorded and scored by two expert ophthalmologists, blinded to the model type, on a 5-point scale measuring the degree of accuracy, completeness, and clarity. Interrater agreement was measured using a weighted kappa coefficient, and model performances were compared using the Friedman test and post-hoc analysis. Our results showed that all models performed well in responding to FAQs (79% of responses scored "excellent"), serving as effective tools for answering patient FAQs. LLaMA 4 and Copilot scored lower on average than the other models (p < .05); however, they remained effective at answering FAQs overall. Potential expansion of LLMs into clinical settings as patient education tools should be considered, as they are effective in providing clear, accurate, and complete responses to cataract surgery FAQs.
    Keywords:  Artificial intelligence; cataract surgery; frequently asked questions; large language models; patient education
    DOI:  https://doi.org/10.1177/20552076251406304
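
    Entry 9's agreement and model-comparison statistics, a weighted kappa for the two raters and a Friedman test across models on the same questions, can be sketched as follows. The rating arrays are simulated toy data for 20 FAQs, nine models, and two raters, and averaging the two raters' scores before the Friedman test is an illustrative simplification, not the authors' analysis.

        # Hedged sketch: quadratic-weighted Cohen kappa and a Friedman test on toy data.
        import numpy as np
        from scipy.stats import friedmanchisquare
        from sklearn.metrics import cohen_kappa_score

        rng = np.random.default_rng(42)
        n_questions, n_models = 20, 9
        # per-question scores (1-5) from two raters (simulated, loosely correlated)
        rater1 = rng.integers(3, 6, size=(n_questions, n_models))
        rater2 = np.clip(rater1 + rng.integers(-1, 2, size=rater1.shape), 1, 5)

        kappa = cohen_kappa_score(rater1.ravel(), rater2.ravel(), weights="quadratic")
        print("Weighted kappa (rater agreement):", round(kappa, 3))

        # Friedman test: the same 20 questions rated across 9 models (repeated measures)
        mean_scores = (rater1 + rater2) / 2  # questions x models
        stat, p = friedmanchisquare(*[mean_scores[:, m] for m in range(n_models)])
        print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
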
  10. Can Urol Assoc J. 2025 Nov 25.
       INTRODUCTION: This study aimed to evaluate the performance of three artificial intelligence (AI) models - ChatGPT, Gemini, and Copilot - in addressing priapism-related inquiries. The accuracy, comprehensiveness, and clinical applicability of AI-generated responses were systematically analyzed.
    METHODS: Frequently asked questions (FAQs) regarding priapism were collected from medical guidelines, literature, and online health platforms. Each AI model generated responses, which were independently assessed by two experts based on accuracy, fluency, and clinical relevance. The Global Quality Score (GQS) was used for evaluation. Statistical analysis was performed using one-way ANOVA, with a significance threshold of p<0.05.
    RESULTS: ChatGPT and Gemini demonstrated comparable performance across all thematic categories, with mean scores ranging from 4.5 to 4.9, while Copilot showed significantly lower scores (3.2-4.2, p<0.001). Both ChatGPT and Gemini provided clinically relevant and accurate information, whereas Copilot's responses frequently lacked guideline-based recommendations.
    CONCLUSIONS: ChatGPT and Gemini were statistically comparable in generating reliable, clinically useful responses, making them valuable tools for medical education and patient counseling. Copilot, however, exhibited lower accuracy and applicability. These findings highlight the need for continuous refinement of AI models to enhance their role in clinical decision-making while ensuring human expertise remains central to patient care.
    DOI:  https://doi.org/10.5489/cuaj.9302
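
    The one-way ANOVA comparison reported in entry 10 can be reproduced in a few lines once per-question scores are available. The sketch below runs scipy's one-way ANOVA on simulated Global Quality Scores for the three chatbots; the numbers are invented and only illustrate the mechanics.

        # Hedged sketch: one-way ANOVA across three chatbots on simulated GQS scores.
        import numpy as np
        from scipy.stats import f_oneway

        rng = np.random.default_rng(7)
        # simulated per-question GQS scores, clipped to the 1-5 range
        chatgpt = np.clip(rng.normal(4.7, 0.3, 25), 1, 5)
        gemini = np.clip(rng.normal(4.6, 0.3, 25), 1, 5)
        copilot = np.clip(rng.normal(3.7, 0.5, 25), 1, 5)

        f_stat, p_value = f_oneway(chatgpt, gemini, copilot)
        print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
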
  11. Brachytherapy. 2025 Dec 16. pii: S1538-4721(25)00323-X. [Epub ahead of print]
       PURPOSE: Patients are increasingly using artificial intelligence (AI) chatbots for health information. Evaluating their reliability for specialized topics, such as brachytherapy, is crucial for guiding their safe use. We assessed a readily accessible AI chatbot's suitability for answering frequently asked questions (FAQ) related to brachytherapy.
    METHODS: We compared responses from an AI chatbot (ChatGPT 4o-mini) against gold standard (GS) authoritative sources for 10 brachytherapy frequently asked questions. Four blinded board-certified brachytherapy experts evaluated 80 response pairs using metrics, including accuracy, clinical appropriateness, readability, and tone. Five simulated patient personas with varying literacy levels were used to assess helpfulness, readability, and emotional tone. The objective readability metrics were also calculated.
    RESULTS: Experts rated the AI chatbot higher for accuracy (75% highly/mostly accurate vs. 50% for GS) and appropriateness (77% vs 55%), although inaccuracies were noted in both sources in a blinded review. Simulated patients preferred GS responses (62% vs. 34%), particularly lower-literacy personas, citing better perceived readability (92% easy/very easy vs. 44% for AI) and a more reassuring tone (42% vs. 24% for AI). Objective analysis confirmed that both sources significantly exceeded the recommended reading levels (e.g., >12th grade Flesch-Kincaid), with AI responses being substantially longer. Performance varied considerably across individual questions for both AI and GS sources.
    CONCLUSIONS: In this blinded cross-sectional evaluation, a publicly available AI chatbot provided accurate responses to brachytherapy-related FAQs. However, further development and validation focused on accessibility, trustworthiness, and user-centered design are required before these tools can be safely and effectively integrated into patient-care workflows.
    Keywords:  Artificial intelligence; Brachytherapy; Large Language Models; Patient education
    DOI:  https://doi.org/10.1016/j.brachy.2025.10.005
  12. J Am Acad Orthop Surg Glob Res Rev. 2025 Dec 01. 9(12):
       INTRODUCTION: Artificial intelligence chatbots, such as ChatGPT-4o ("omni"), a large language model developed by OpenAI that integrates text, image, and audio processing with web connectivity, have gained traction as potential patient education tools in orthopaedic surgery. This study aimed to evaluate the accuracy, completeness, and clinical utility of ChatGPT-4o's responses to common patient questions about six widely performed orthopaedic procedures.
    METHODS: We assessed ChatGPT-4o's responses to five standardized patient-oriented queries for total knee arthroplasty, total hip arthroplasty, anterior cruciate ligament reconstruction, rotator cuff repair, anterior cervical diskectomy and fusion, and carpal tunnel release. Responses were generated using ChatGPT-4o's web-enabled version in January 2025. Two resident orthopaedic surgeons independently rated each response for accuracy, completeness, layperson clarity, misleading content, and conciseness using a structured binary rubric. The validated DISCERN instrument (16 items, max score 80) was adapted for quantitative assessment of information quality. Interrater reliability was assessed with Cohen kappa.
    RESULTS: Overall, ChatGPT-4o generated accurate and structured responses, free of overt errors. The average DISCERN score across procedures was 43.5, classifying the information as fair. The highest average DISCERN score was for anterior cervical diskectomy and fusion (mean 45.8 ± 10.1), whereas the lowest was for rotator cuff repair (mean 41.6 ± 5.9). Factual accuracy was high (>90%), but 36% of responses contained some misleading or incomplete information. Responses explaining treatment alternatives were the most accurate and complete, whereas those outlining surgical risks performed worst. Interrater agreement was good (Cohen kappa = 0.64).
    DISCUSSION: ChatGPT-4o provided generally accurate, clear, and empathetic explanations of common orthopaedic surgeries, offering a promising adjunct to conventional patient education. However, key limitations particularly regarding alternative treatments, nuanced risks, and lack of tailored advice limit its stand-alone use in clinical practice. Careful oversight and clinician vetting remain essential.
    CONCLUSIONS: ChatGPT-4o can supplement orthopaedic patient education by offering accessible, engaging content. However, notable gaps in detail and occasional misleading information necessitate careful review and contextual explanation by orthopaedic surgeons.
    DOI:  https://doi.org/e25.00341
  13. Cutan Ocul Toxicol. 2025 Dec 18. 1-6
       OBJECTIVE: In this study, we aimed to examine the responses given by ChatGPT (OpenAI), Copilot (Microsoft), and Gemini (Bard) artificial intelligence applications to questions about the active ingredient isotretinoin in terms of accuracy, readability, applicability, and understandability.
    MATERIAL AND METHODS: The readability of the answers given by the artificial intelligence programs was evaluated using the Flesch-Kincaid ease score, and the applicability and understandability levels were evaluated using the Patient Education Materials Evaluation Tool scales. The accuracy of the answers was compared by two dermatologists who scored them between 1 and 5.
    RESULTS: No significant difference was found between the groups in terms of Flesch-Kincaid reading ease scores (p = 0.671), and all three programs produced text at a difficult reading level. On the Patient Education Materials Evaluation Tool scales, Gemini and ChatGPT scored >70%, and there was a significant difference between the groups in favor of these programs (p < 0.001). In the accuracy scores of the answers, Gemini (4.90 ± 0.31) and ChatGPT (4.60 ± 0.69) received high scores, and there was a significant difference between the groups (p < 0.001).
    CONCLUSION: While the AI chatbots used in this study demonstrated reasonable accuracy in answering questions about isotretinoin, their performance was limited in terms of readability and usability. These findings suggest that AI programs alone are not sufficient for patient education and need improvement to simplify their responses.
    Keywords:  Artificial intelligence; ChatGPT; Copilot; Gemini; isotretinoin
    DOI:  https://doi.org/10.1080/15569527.2025.2601639
  14. Rev Assoc Med Bras (1992). 2025; 71(12): e20250892. pii: S0104-42302025001200613. [Epub ahead of print]
       OBJECTIVE: Artificial intelligence tools like Chat Generative Pretrained Transformer-4 are increasingly used in clinical decision-making, but their reliability for acute compartment syndrome remains understudied. The aim of the study was to evaluate Chat Generative Pretrained Transformer-4's accuracy, completeness, and quality in responding to acute compartment syndrome-related queries, addressing gaps in artificial intelligence-assisted medical information.
    METHODS: Chat Generative Pretrained Transformer-4 was given 60 questions (40 open-ended and 20 binary) taken from the American Academy of Orthopaedic Surgeons 2019 acute compartment syndrome guidelines. Responses were evaluated independently by two orthopedic specialists using the Quality Criteria for Consumer Health Information instrument (information quality), the Flesch-Kincaid Reading Ease Score (readability), and Likert scales (accuracy and completeness). Inter-rater reliability was assessed using Cohen's kappa.
    RESULTS: Chat Generative Pretrained Transformer-4 demonstrated high accuracy (95% for both question types) and completeness (mean scores: 5.92±0.8 [accuracy], 2.9±0.5 [completeness]). DISCERN scores were "excellent" (69-72), though source reliability was limited. Readability was "very difficult" (Flesch-Kincaid Reading Ease Score: 22.19), potentially hindering patient comprehension.
    CONCLUSION: Although Chat Generative Pretrained Transformer-4 is excellent at providing precise, high-quality acute compartment syndrome information, its complicated language and lack of credible sources make it difficult for wider adoption. To improve clinical utility and patient education, readability and transparency must be given top priority in future artificial intelligence developments.
    DOI:  https://doi.org/10.1590/1806-9282.20250892
  15. Jt Dis Relat Surg. 2026 Jan 01. pii: jdrs.2026.2368. [Epub ahead of print] 37(1): 142-155
       OBJECTIVES: This study aims to compare ChatGPT (Generative Pre-Trained Transformer) and Google in addressing frequently asked questions (FAQs), answers, and online sources regarding robot-assisted total hip arthroplasty (RATHA).
    MATERIALS AND METHODS: On December 15, 2024, the 20 most frequently asked questions were identified by inputting the search term "Robot-Assisted Total Hip Replacement" into both Google Search and ChatGPT-4o; the FAQs were identified independently on each platform, using a clean Google search and a prompt to ChatGPT-4o. The FAQs on Google were sourced from the "People also ask" section, while ChatGPT-4o was asked to generate the 20 most frequently asked questions. All questions, answers, and references cited were recorded. A modified version of the Rothwell system was used to categorize questions into 10 subtopics: special activities, timeline of recovery, restrictions, technical details, cost, indications/management, risks and complications, pain, longevity, and evaluation of surgery. Each reference was categorized into the following groups: commercial, academic, medical practice, single surgeon personal, or social media. Responses were also graded as "excellent response not requiring clarification" (1), "satisfactory requiring minimal clarification" (2), "satisfactory requiring moderate clarification" (3), or "unsatisfactory requiring substantial clarification" (4).
    RESULTS: Overall, 20% of the questions that Google and ChatGPT-4o identified as most frequently asked overlapped. Technical details (35%) were the most common category of question. ChatGPT provided significantly more academic references than Google Search (70% vs. 20%, p=0.0113). Conversely, Google web search cited medical practice references (40% vs. 0%, p=0.0033), single-surgeon websites (20% vs. 0%, p=0.1060), and government websites (10% vs. 0%, p=0.4872) more frequently than ChatGPT. In terms of response quality, 62% of answers were rated as Grade 1-2 (excellent or satisfactory with minimal clarification), while 38% required moderate or substantial clarification (Grades 3-4).
    CONCLUSION: ChatGPT demonstrated comparable results to those of Google searches on information regarding RATHA, with a higher reliance on academic sources. While most responses were satisfactory, a notable proportion required further clarification, emphasizing the need for continued evaluation of these platforms to ensure accuracy and reliability in patient education. Taken together, these technologies have the capacity to enhance health literacy and provide enhanced shared decision-making for patients seeking information on RATHA.
    DOI:  https://doi.org/10.52312/jdrs.2026.2368
  16. Rev Assoc Med Bras (1992). 2025; 71(12): e20250750. pii: S0104-42302025001200607. [Epub ahead of print]
       OBJECTIVE: The aim of the study was to evaluate the quality and readability of ChatGPT responses to frequently asked questions by individuals with posture disorder. Providing reliable and evidence-based information about posture disorders is vital for individuals to be correctly informed.
    METHODS: The 10 most frequently asked questions about posture disorder were selected by two researchers from a list created by ChatGPT. The questions were transmitted to ChatGPT 4.0, and the initial responses were recorded without further follow-up questions. The quality of the responses was then assessed by five independent experts (three physiotherapists, one physical therapy and rehabilitation specialist, and one orthopedics and traumatology specialist) with a four-grade evaluation system. Readability levels were analyzed with the Flesch-Kincaid Grade Level through WordCalc software. Statistical analysis was performed using Statistical Package for the Social Sciences v29.0, and intraclass correlation coefficients were used to measure inter-rater reliability.
    RESULTS: Following a thorough evaluation of the 10 responses received, six were rated as "Excellent responses requiring no explanation," while the remaining four were designated as "Satisfactory responses requiring minimal explanation." The median quality score of the responses was high, indicating good alignment with current evidence-based practice. The average readability level of the responses was determined to be 8.4. Inter-rater reliability was good, with an intraclass correlation coefficient of 0.756.
    CONCLUSION: ChatGPT provides relatively coherent and generally readable answers to frequently asked questions about posture disorders, with most needing minimal explanation. While promising as a resource to meet the information needs of people with posture disorders, further improvements are needed to align it with personalized health needs.
    DOI:  https://doi.org/10.1590/1806-9282.20250750
  17. J Cancer Res Clin Oncol. 2025 Dec 20. 152(1): 17
       PURPOSE: With increasing reliance on large language models (LLMs) for health information, this study evaluated reliability and quality, understandability, actionability, readability and misinformation risk of responses from LLMs to oral health concerns and oral side effects in head and neck cancer (HNC) patients.
    METHODS: Frequently asked questions on oral health and HNC therapy side effects were identified via ChatGPT-GPT-4-turbo and Gemini-2.5 Flash, then submitted to eight LLMs (ChatGPT-GPT-4-turbo, Gemini-2.5 Flash, Microsoft Copilot, Perplexity, Chatsonic, Mistral, Meta AI-Llama 4, DeepSeek-R1). Responses were assessed using the DISCERN and modified DISCERN instruments (reliability and quality), the Patient Education Materials Assessment Tool (PEMAT; understandability and actionability), the Flesch Reading Ease Score (FRES; readability), a misinformation score, citations, and word counts. Statistical analysis was performed with the Scheirer-Ray-Hare test followed by Dunn's post-hoc tests with Bonferroni-Holm correction (p < 0.05).
    RESULTS: A total of 40 questions belonging to 12 oral health-related categories were identified. Statistically significant differences between LLMs were found for DISCERN, modified DISCERN, PEMAT-understandability, PEMAT-actionability, FRES, and word counts (p < 0.001). Median DISCERN and modified DISCERN scores ranged from 47.0 (ChatGPT-GPT-4-turbo) to 59.0 (Perplexity, Chatsonic) and from 2.0 (Gemini-2.5 Flash, Mistral) to 5.0 (Perplexity), indicating good to fair reliability. LLMs were understandable (median PEMAT-understandability scores ≥ 75.0) but provided limited specific guidance (median PEMAT-actionability scores ≤ 40) and used complex language (median FRES ≤ 40.2). Misinformation risk was generally low and did not differ significantly among LLMs (p = 0.768).
    CONCLUSION: Despite a low overall misinformation risk, deficits in actionability highlight the need for cautious integration of LLMs into HNC patient education.
    Keywords:  Artificial intelligence; Head and neck cancer; Information quality; Large language model; Oral health; Readability
    DOI:  https://doi.org/10.1007/s00432-025-06400-w
  18. Clin Rheumatol. 2025 Dec 13.
       OBJECTIVE: To evaluate and compare the quality and readability of patient education materials (PEM) related to ankylosing spondylitis (AS) generated by four AI-based large language models (LLMs): ChatGPT-4o, ChatGPT-3.5, DeepSeek R1, and DeepSeek V3.
    METHODS: On May 1, 2025, the ten most frequently searched AS-related questions were identified using Google Trends (Turkey). These questions were posed to the four LLMs, and the responses were recorded without modification. Quality was assessed by two independent rheumatologists. The quality was evaluated using the DISCERN tool. Readability and comprehensibility were assessed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). Inter-rater reliability was analyzed using the intraclass correlation coefficient (ICC). Mean scores and 95% confidence intervals (CI) were reported.
     RESULTS: ChatGPT-4o achieved the highest average DISCERN score (72.38), followed by DeepSeek R1 (69.76), ChatGPT-3.5 (68.82), and DeepSeek V3 (68.79). Inter-rater reliability for DISCERN was excellent (ICC, 0.931). ChatGPT-4o had the highest mean DISCERN score, although the difference was not statistically significant. In the readability analysis, DeepSeek V3 had the highest FRES score (14.93), suggesting that it was more easily understandable than the other LLMs, while ChatGPT-3.5 received the lowest score (5.29). FKGL scores varied within a narrow range (15.33-15.93) across models, indicating that the material required university-level reading skills.
    CONCLUSION: For AS, AI-generated PEMs were generally complex enough to meet the needs of highly educated patients. The responses were information-dense and complex, requiring a high level of expertise regardless of the recipient's educational level. In the future, improving the clarity and comprehensibility of the language according to personal characteristics (e.g., educational level) and providing evidence-based citations could help make LLMs more useful in clinical settings or for the public.
    Key Points: • This study compared how different AI chatbots explain ankylosing spondylitis to patients. • Although the information quality was high, the language used was too complex for most patients. • ChatGPT-4o gave the most accurate content, while DeepSeek V3 used the easiest words. • Future AI tools should use simpler language and include reliable references to better support patient education.
    Keywords:  Ankylosing spondylitis; Artificial intelligence; DISCERN; Patient education; Readability
    DOI:  https://doi.org/10.1007/s10067-025-07771-8
  19. Pharmaceut Med. 2025 Dec 15.
       BACKGROUND AND OBJECTIVES: Efficient and informed patient recruitment, followed by successful enrolment, is essential for the conduct of clinical trials. As major stakeholders in the research process, pharmaceutical companies have an important role in ensuring that information for patients is available and readable. Several global institutions recommend that patient-facing material be written at a literacy level suitable for 11-12-year-olds. This study assessed the availability and readability of clinical trial sections on UK, Canadian, Australian, and global pharmaceutical company websites.
    METHODS: The 30 largest global pharmaceutical companies (assessed by market capitalisation in April 2025) were selected. Clinical trial content was reviewed for availability and analysed using three validated readability metrics: the Flesch-Kincaid Grade (FKG), Flesch Reading Ease Score (FRES) and Simple Measure of Gobbledygook (SMOG) Index.
    RESULTS: Of 115 websites assessed, 54 were eligible for readability analysis. While 96% of global websites included clinical trial information, 55% of non-global websites lacked such content or contained only external links. FKG scores, which estimate the US school grade level needed for comprehension, averaged 10.9 (± 3.5) for global, 14.2 (± 2.0) for UK, 12.1 (± 2.7) for Canadian and 12.8 (± 1.9) for Australian websites, suggesting readability at the high school to college level. FRES scores showed similar trends: 42.2 (± 8.0) global, 31.2 (± 9.4) UK, 38.8 (± 12.3) Canadian and 35.7 (± 12.3) Australian, indicating college-level complexity. SMOG scores suggested that 13-15 years of education were needed to understand the material.
    CONCLUSIONS: These results indicate that clinical trial information on pharmaceutical company websites is often missing or difficult to read and exceeds recommended literacy levels, which may limit comprehension and engagement. Poor readability disproportionately affects individuals with lower literacy, limited English proficiency or disabilities, creating inequities in trial participation. Applying health literacy and plain-language principles, such as simplifying terminology, shortening sentences and using clear formatting, could improve accessibility and support informed decision-making.
    DOI:  https://doi.org/10.1007/s40290-025-00595-6
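
    The three readability metrics used in entry 19 are closed-form functions of sentence, word, and syllable counts, so they are easy to compute directly. The sketch below applies the standard published formulas; the vowel-group syllable counter is a crude heuristic (and SMOG formally assumes a 30-sentence sample), so the output only approximates dedicated readability tools.

        # Worked sketch of Flesch Reading Ease, Flesch-Kincaid Grade, and SMOG.
        import math
        import re

        def count_syllables(word: str) -> int:
            # crude heuristic: count groups of consecutive vowels
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def readability(text: str) -> dict:
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            n_words = max(1, len(words))
            syllables = sum(count_syllables(w) for w in words)
            polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
            fres = 206.835 - 1.015 * n_words / sentences - 84.6 * syllables / n_words
            fkg = 0.39 * n_words / sentences + 11.8 * syllables / n_words - 15.59
            smog = 1.043 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
            return {"FRES": round(fres, 1), "FKG": round(fkg, 1), "SMOG": round(smog, 1)}

        print(readability("Participants receive the investigational product. "
                          "Randomisation minimises allocation bias."))
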
  20. BMJ Open. 2025 Oct 29. 15(10): e093666
       OBJECTIVE: In China, a large number of health-related short videos are posted on video platforms, including educational videos about irritable bowel syndrome (IBS). This study aimed to evaluate the reliability and quality of IBS-related video content on TikTok, Kwai and BiliBili.
    METHODS: Using 'irritable bowel syndrome' as the Chinese keyword, a new account was registered on each platform. On 1 November 2023, searches were conducted on TikTok, Kwai and BiliBili, and the top 100 recommended videos from each platform were analysed. After those that met the predefined exclusion criteria were removed, 244 short videos were included and evaluated for their characteristics, content, reliability and quality. Information quality was assessed using the Journal of the American Medical Association (JAMA) criteria, Global Quality Scale (GQS) and the modified Designed Information System Containing Evaluations of Reliability and Need (DISCERN) tool. Correlation analysis was conducted to evaluate the relationship between video characteristics and video reliability and quality.
    RESULTS: A total of 244 eligible short videos were included. BiliBili videos were longer than TikTok and Kwai videos (p<0.001), and TikTok videos were more popular than Kwai and BiliBili videos. The proportion of health professionals was the highest on TikTok and the lowest on BiliBili. The proportion of general users was the highest on Kwai and the lowest on TikTok. The median JAMA scores of TikTok, Kwai and BiliBili videos were 3 (IQR 2-3), 3 (IQR 2-3) and 2 (IQR 2-3), respectively. The median GQS scores of TikTok, Kwai and BiliBili videos were 3 (IQR 2-4), 3 (IQR 2-4) and 3 (IQR 3-4), and the median modified DISCERN scores were 3 (IQR 2.75-3), 3 (IQR 3-3), and 3 (IQR 2-4), respectively. Video source was an influencing factor for JAMA scores, whereas video duration and source were influencing factors for GQS scores. The number of days since publication (r=0.19, p=0.003) and duration (r=0.27, p<0.001) were positively correlated with GQS scores, whereas likes (r=0.18, p=0.004), comments (r=0.21, p=0.001) and collections (r=0.21, p=0.001) were positively correlated with modified DISCERN scores.
    CONCLUSION: Short videos of IBS-related health information on TikTok, Kwai and BiliBili were of poor quality; however, videos uploaded by health professionals and science communicators were relatively more reliable and comprehensive. Thus, the public is advised to seek IBS-related information from videos uploaded by health professionals and science communicators.
    Keywords:  Education, Medical; Health informatics; Irritable Bowel Syndrome; MEDICAL EDUCATION & TRAINING; PUBLIC HEALTH; eHealth
    DOI:  https://doi.org/10.1136/bmjopen-2024-093666
  21. Clin Oral Investig. 2025 Dec 20. 30(1): 18
       OBJECTIVES: The aim of this study was to analyze the content of YouTube™ videos related to dental visits for children with autism spectrum disorder (ASD), as well as to evaluate their quality, usefulness, reliability, and accuracy in supporting preparation for the first visit to the dentist.
    MATERIAL AND METHOD: A total of 93 videos were analyzed between April 24 and May 4, 2025. Two evaluators assessed each video using the Global Quality Scale (GQS), modified DISCERN (mDISCERN) and a Veracity Classification. Discrepancies were resolved by a third reviewer. Recorded variables included video type, language, country, duration, views, likes, comments, source, category, and the interaction index, as well as quality, usefulness, reliability, and accuracy classifications. The Benjamini-Hochberg false discovery rate (FDR) method was used to adjust for multiple comparisons.
    RESULTS: Videos created by health professionals showed significantly higher mean GQS scores (3.9 ± 0.6) compared with those based on personal experiences (2.8 ± 0.7), with a mean difference of 1.1 points (95% CI: 0.8-1.4; p < 0.001). Webinars were rated as the most useful by dental professionals, while families valued personal experiences more highly. Tips-oriented videos were consistently considered useful by both groups.
    CONCLUSIONS: YouTube™ serves as a valuable, freely accessible repository of information to help families prepare children with ASD for dental visits.
    CLINICAL RELEVANCE: Promoting high-quality, accurate, and reliable online content can support better preparation for first dental visits of children with ASD. Dental professionals should recommend trustworthy videos and participate in the creation of evidence-based educational materials.
    Keywords:  Autism spectrum disorder; Content analysis; Dental care for children; Health information; MDISCERN; YouTube™
    DOI:  https://doi.org/10.1007/s00784-025-06717-3
  22. Urology. 2025 Dec 17. pii: S0090-4295(25)01394-9. [Epub ahead of print]
       OBJECTIVES: To evaluate the quality, popularity, and educational content of YouTube videos on intravesical botulinum toxin injection (IBI).
    METHODS: A YouTube search was conducted on 10 May 2025 using four keywords related to IBI. Eligible videos were categorized into three groups: academic institutions, health information websites, and physicians or private medical organizations. Their quality and educational value were then evaluated using the Global Quality Score (GQS), the Patient Education Materials Assessment Tool for Audiovisual Materials (PEMAT-A/V), and a 15-item IBI-Specific Checklist (IBI-SC). Group comparisons were conducted using Pearson's chi-square test and the Kruskal-Wallis test, and Pearson correlation was employed to evaluate relationships among the assessment tools.
    RESULTS: Comparison among groups based on the sources of the final set of 45 videos revealed that the physician or private medical organization group performed significantly worse than the other groups in the analysis of PEMAT-A/V Understandability (U), Actionability (A), and IBI-SC scores (p<0.001). GQS categories also differed significantly among videos from different sources, with the academic institutions group performing the best (p<0.001). In the correlation analysis, a strong correlation was observed between PEMAT-A/V U and PEMAT-A/V A (r=0.71, p<0.001), while moderate correlations were found between PEMAT-A/V U and IBI-SC (r=0.58, p<0.001), and between PEMAT-A/V A and IBI-SC (r=0.42, p=0.004).
    CONCLUSIONS: The effectiveness of educational content varied significantly by source, with videos from academic institutions demonstrating the highest quality. Our findings highlight the need for standardized, open-access health information resources tailored to specific educational purposes.
    DOI:  https://doi.org/10.1016/j.urology.2025.12.019
  23. Cureus. 2025 Dec;17(12): e98880
       Introduction: The increasing use of advanced imaging techniques has led to more frequent detection of renal masses, including benign tumours like renal oncocytomas. Although typically asymptomatic, these tumours can complicate diagnosis due to their resemblance to renal cell carcinoma (RCC) on imaging studies. Treatment strategies for renal oncocytomas depend on tumour size, symptoms, and imaging features. As patients increasingly turn to online platforms for health information, YouTube has become a popular source for educational videos. However, the quality and reliability of such videos vary. This study evaluates YouTube videos on renal oncocytoma to assess their role in patient education.
    Materials and methods: A systematic search of YouTube was conducted on September 5, 2025, using the keyword "renal oncocytoma." A total of 60 videos were screened, and 10 met the inclusion criteria. The educational quality and reliability of the selected videos were assessed using the Global Quality Score (GQS) and the DISCERN instrument. Video popularity was assessed with the Video Power Index (VPI). A one-sample t-test was used to compare the average GQS and DISCERN scores with predefined benchmarks. Spearman's rank correlation coefficient (ρ) was used to evaluate associations between these metrics.
    Results: The videos analysed had a mean GQS of 3.7 ± 0.90, significantly higher than the midpoint score of 3 (p = 0.037), indicating moderate educational quality. DISCERN scores revealed that while the videos clearly stated their aims (mean score 4.8 ± 0.98, p = 0.0003), they often fell short in providing detailed, patient-centered information. The mean score for describing treatment options was 2.1 ± 1.81 (p = 0.15), and there was a notable lack of transparency regarding information sources, with a mean score of 1.7 ± 1.23 (p = 0.008). Although the videos provided current information (mean score 4.4 ± 0.94, p = 0.001), they did not adequately address uncertainties (mean score 2.0 ± 1.00, p = 0.011) or support shared decision-making (mean score 2.3 ± 1.65, p = 0.207). The VPI revealed no significant correlation with the quality metrics, with a moderate negative correlation between GQS and VPI (ρ = -0.424, p = 0.299) and a weak negative correlation between DISCERN and VPI (ρ = -0.141, p = 0.757).
    Conclusion: This study highlights the potential of YouTube as an educational tool for renal oncocytoma, but it also underscores significant gaps in the quality and reliability of the information provided. The videos often lack comprehensive discussions on treatment options, transparency, and addressing uncertainties, which are essential for informed patient decision-making. Although YouTube can serve as a starting point for general information, patients should consult healthcare professionals for accurate, personalised advice. Efforts to improve the quality of health-related content on YouTube are needed to better support patient education.
    Keywords:  benign renal mass; renal neoplasm; renal oncocytoma; youtube study; youtube videos
    DOI:  https://doi.org/10.7759/cureus.98880
  24. Sci Rep. 2025 Dec 17. 15(1): 44007
      Scoliosis is a common spinal disorder that affects 2-4% of adolescents worldwide. With the rise of short-video platforms such as TikTok, these platforms have come to play a significant role in health information dissemination. However, the quality and accuracy of scoliosis-related video content on these platforms have not been thoroughly studied. The objective of this study was to assess the accuracy and quality of scoliosis-related videos on TikTok. Using a cross-sectional design and a newly created TikTok account with cleared cache to minimize bias, we retrieved the platform's top 100 scoliosis-related short videos via its default sorting algorithm on August 17, 2025. Two independent reviewers with backgrounds in orthopedic surgery and health-information assessment extracted basic data and evaluated each video's quality and accuracy with the Global Quality Score (GQS, 1-5), modified DISCERN (0-5), and JAMA (0-4) instruments. Bivariate associations used Spearman's rank correlation; multivariable associations used a proportional-odds ordinal logistic model with GQS as the outcome. A total of 95 videos were included in the analysis; 5 videos were excluded due to language mismatch, redundancy, or commercial content. Median video duration was 60 s (IQR 45-87), and the median number of views was 8109 (IQR 2112-33,856). Professional individuals accounted for 63.8% of the uploaded videos, while non-professional individuals contributed 33.0%. The median GQS was 3 for professional videos and 2 for non-professional videos. The video content primarily focused on treatment (68.1%), with less emphasis on diagnosis (16%) and prevention (8.5%). The videos exhibited moderate overall quality, with median scores of 3 for GQS and mDISCERN and 2 for JAMA (IQRs 2-4, 2-3, and 1-2, respectively). Videos uploaded by professional individuals had higher quality and accuracy scores. The number of followers (fans) correlated with mDISCERN and JAMA scores (r = 0.241, P = 0.017 and r = 0.275, P = 0.005, respectively). In the multivariable model, collections (OR = 4.287, 95% CI 1.313-13.989) and duration (OR = 2.664, 95% CI 1.370-5.182) were associated with higher GQS, whereas patient uploader identity (OR = 0.064, 95% CI 0.007-0.621) and prognosis content (OR = 0.052, 95% CI 0.005-0.555) were associated with lower GQS. Overall, the quality and accuracy of scoliosis-related short videos on TikTok are generally low, with lower scores for non-professional uploaders. The main problems are a lack of professional review, misleading information, and an emphasis on treatment rather than prevention. The public is advised to exercise caution when browsing health information on short-video platforms, and medical professionals should provide higher-quality educational content. This study provides a basis for the regulation and optimization of health information on short-video platforms.
    Keywords:  Accuracy; Quality; Scoliosis; Short videos; TikTok
    DOI:  https://doi.org/10.1038/s41598-025-27684-5
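
    For the bivariate and multivariable analyses named in entry 24, Spearman rank correlation and a proportional-odds ordinal logistic model with GQS as the outcome, a hedged sketch follows using scipy and statsmodels on simulated video-level data. The variables, their distributions, and the resulting coefficients are illustrative assumptions only and do not reflect the study's dataset or model specification.

        # Hedged sketch: Spearman correlation and a proportional-odds model on toy data.
        import numpy as np
        import pandas as pd
        from scipy.stats import spearmanr
        from statsmodels.miscmodels.ordinal_model import OrderedModel

        rng = np.random.default_rng(1)
        n = 95
        df = pd.DataFrame({
            "duration": rng.gamma(3.0, 25.0, n),            # video length in seconds (simulated)
            "collections": rng.poisson(40, n).astype(float),
        })
        # simulate an ordinal GQS (1-5) that loosely increases with duration and collections
        latent = 0.02 * df["duration"] + 0.01 * df["collections"] + rng.normal(0, 1, n)
        df["gqs"] = pd.cut(latent, bins=5, labels=False) + 1

        rho, p = spearmanr(df["duration"], df["gqs"])
        print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

        model = OrderedModel(df["gqs"], df[["duration", "collections"]], distr="logit")
        result = model.fit(method="bfgs", disp=False)
        print(np.exp(result.params[["duration", "collections"]]))  # exponentiated coefficients as odds ratios
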
  25. Digit Health. 2025 Jan-Dec; 11: 20552076251406648
       Background: Sudden sensorineural hearing loss (SSNHL) has increasingly become a critical public health concern worldwide, with limited access to health knowledge among Chinese patients. TikTok is considered one of the most popular short-video platforms for health education information in China. However, there remains a lack of scientific investigation and evaluation for the quality of these videos.
    Objective: The study aimed to examine the quality and content coverage of short videos about SSNHL on TikTok, one of the most significant information sources for Chinese users.
    Methods: We retrieved 215 TikTok videos by comprehensive ranking with the Chinese search term "SSNHL" on 1 June 2025. Video sources, audience engagement, and video content were extracted. Two independent researchers evaluated the information in each video using the m-DISCERN, Global Quality Score (GQS), Goobie's coding scheme, Journal of the American Medical Association (JAMA) benchmarks, and Video Information and Quality Index (VIQI). In addition, Spearman correlation analysis was conducted.
    Results: A total of 174 TikTok videos were ultimately included, 157 from healthcare sources and 17 from nonhealthcare sources. The median video lengths were 50 s (healthcare) and 70 s (nonhealthcare). Videos from healthcare sources gained higher overall audience engagement except for comments. The most common video style was medical questions and answers (51.7%), while the most common video background was a medical scenario (90.3%). Video uploaders were predominantly from first-tier cities (45.4%). Videos from healthcare sources had higher median m-DISCERN, GQS, JAMA, and VIQI scores than videos from nonhealthcare sources. GQS and VIQI scores correlated positively with metrics such as likes, shares, and collections (p < .001), but the correlations were weak in most cases.
    Conclusion: Videos from healthcare sources performed better than those from nonhealthcare sources in terms of the quality of SSNHL-related content. However, the overall quality and content coverage from both sources were unsatisfactory. Despite some limited positive correlations between video quality and audience engagement, individuals should remain vigilant when assessing health-related information on TikTok.
    Keywords:  GQS; Goobie's coding scheme; JAMA benchmarks; TikTok; VIQI; information; m-DISCERN; quality; short videos; sudden sensorineural hearing loss
    DOI:  https://doi.org/10.1177/20552076251406648
  26. Sci Rep. 2025 Dec 15.
      Amblyopia is the main cause of monocular vision loss in children. Early recognition and treatment are important to prevent vision loss. As public health awareness increases, short-video platforms like TikTok and Bilibili are increasingly being used to disseminate health information. However, due to the lack of peer review and supervision, short-video platforms tend to disseminate incorrect and incomplete health information. To date, the quality of videos on amblyopia has not been systematically evaluated. To evaluate the quality of videos related to amblyopia, this cross-sectional study used the Chinese term "amblyopia" as the search keyword to collect videos from TikTok and Bilibili. After applying exclusion criteria, 185 videos (94 from TikTok, 91 from Bilibili) were analyzed. Data on video length and characteristics, including engagement metrics (likes, collections, comments, and shares), were collected. The Global Quality Score (GQS), the modified DISCERN, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Video Information and Quality Index (VIQI) were used to evaluate video reliability and quality. Quality and reliability were compared between the two platforms and across video sources. On TikTok, videos were mainly uploaded by specialists (71.3%), while on Bilibili, videos were mainly uploaded by individual users (45%). TikTok videos scored higher in quality (GQS: 2.862 ± 1.033; modified DISCERN: 2.277 ± 0.8848; VIQI: 10.88 ± 2.531) compared with Bilibili (GQS: 2.242 ± 1.089, p < 0.0001; modified DISCERN: 1.846 ± 0.8154, p = 0.001; VIQI: 6.571 ± 1.910, p < 0.0001). Specialist-uploaded videos performed notably better in quality, with GQS, modified DISCERN, JAMA and VIQI scores of 3 (3-4), 3 (2-3), 3 (2-3), and 11 (9-13), respectively. On both platforms, treatment of amblyopia was the most frequently discussed topic, while prevention received the least attention. TikTok videos demonstrated a significantly higher level of audience engagement than Bilibili videos. Correlation analysis revealed strong correlations among the engagement metrics, but engagement metrics showed no correlation with GQS, modified DISCERN, JAMA, or VIQI scores. Overall, both user engagement and video quality were higher on TikTok than on Bilibili, yet both platforms fall short in the quality and reliability of amblyopia-related videos. Specialist-uploaded videos were more reliable, possibly because specialists can provide information of greater value to the audience. Videos on both platforms pay far more attention to the treatment of amblyopia than to its prevention. Proposed interventions include robust platform certification, active involvement of medical specialists in content creation, and enrichment of video content.
    Keywords:  Amblyopia; Bilibili; Global quality score; Journal of the American Medical Association; Modified DISCERN; Public health; TikTok; Video information and quality index
    DOI:  https://doi.org/10.1038/s41598-025-31758-9
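    The between-platform score comparison summarized in entry 26 can be illustrated with a minimal Python sketch. It assumes a two-sided Mann-Whitney U test on per-video GQS ratings; the cited study does not state which test it applied, and the scores below are hypothetical placeholders rather than the study's data.

      # Illustrative only: hypothetical per-video GQS ratings, not data from the cited study.
      from scipy.stats import mannwhitneyu

      gqs_platform_a = [3, 4, 2, 3, 3, 4, 2, 3]  # hypothetical GQS scores, platform A
      gqs_platform_b = [2, 2, 3, 1, 2, 3, 2, 2]  # hypothetical GQS scores, platform B

      # Two-sided Mann-Whitney U test comparing the two score distributions.
      stat, p_value = mannwhitneyu(gqs_platform_a, gqs_platform_b, alternative="two-sided")
      print(f"U = {stat:.1f}, p = {p_value:.4f}")

    Ordinal rating scales such as GQS often violate normality assumptions, which is why a rank-based test is a reasonable default for this kind of comparison.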
  27. Front Public Health. 2025 ;13 1683561
       Background: Short-video platforms have become major sources of health information in China, influencing public awareness and health behavior. However, the quality and dissemination patterns of lung cancer-related content across different platforms remain unclear. This study aimed to evaluate the informational quality, reliability, and engagement patterns of lung cancer short videos on three leading Chinese platforms.
    Methods: We conducted a comprehensive cross-sectional content analysis of 1,288 lung cancer-related videos retrieved from TikTok, Kwai and Rednote. Video quality was systematically evaluated using a multidimensional toolkit, including the Journal of the American Medical Association (JAMA) benchmark criteria, Global Quality Scale (GQS), modified DISCERN (mDISCERN), and the Patient Education Materials Assessment Tool (PEMAT-U/A). We analyzed heterogeneity and correlations of quality and engagement metrics (likes, comments, shares, collections) across platforms, creator types, content themes, and presentation formats.
    Results: Overall information quality was suboptimal (Median JAMA = 2; Median GQS = 3). Significant heterogeneity (p < 0.001) was found, with TikTok demonstrating the highest quality, while Kwai exhibited the lowest quality but high engagement. Videos by physicians and news agencies demonstrated significantly higher reliability, understandability, and actionability than those by non-professional creators (p < 0.001). Disease knowledge videos, particularly those focusing on prevention, definitions, and risk factors, exhibited superior quality compared to personal experience or metastasis-related content. Expert monolog videos were the most common and effective presentation format. Engagement did not align linearly with quality: patient vlogs and metastasis-related videos achieved higher interaction rates despite lower accuracy, indicating a "quality-engagement paradox." Weak-to-moderate positive correlations were found between GQS and engagement, while PEMAT-A was negatively correlated with likes and comments (an illustrative correlation sketch follows this entry).
    Conclusion: Marked disparities in the quality and dissemination of lung cancer-related short videos exist across Chinese platforms. Professional, evidence-based content enhances reliability, whereas emotional and visually driven content drives engagement. Strengthening algorithmic governance, metadata transparency, and expert involvement, alongside audience-centered, evidence-informed communication, may enhance the educational value and public health impact of short-video platforms.
    Keywords:  digital health; health communication; health information quality; lung cancer; short-video; social media
    DOI:  https://doi.org/10.3389/fpubh.2025.1683561
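    The quality-engagement correlations reported in entry 27 can be sketched in the same spirit, assuming a Spearman rank correlation between per-video GQS scores and like counts; the abstract does not name the correlation method used, and the values below are hypothetical.

      # Illustrative only: hypothetical per-video quality scores and engagement counts.
      from scipy.stats import spearmanr

      gqs = [2, 3, 4, 1, 3, 5, 2, 4]                   # hypothetical GQS per video
      likes = [120, 300, 150, 900, 80, 60, 400, 200]   # hypothetical like counts

      # Spearman's rho is rank-based, so it tolerates heavily skewed engagement counts.
      rho, p_value = spearmanr(gqs, likes)
      print(f"rho = {rho:.2f}, p = {p_value:.3f}")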
  28. BMC Med Educ. 2025 Dec 17.
     BACKGROUND: This research aimed to assess the usefulness of existing YouTube videos on endodontic irrigation, published between January 2013 and December 2022, as a learning resource for undergraduate dental students. Additionally, the study evaluated the educational value of these videos in relation to factors such as video duration, authorship source, and view count.
    METHODS: YouTube searches for videos related to endodontic irrigation were conducted using search terms such as "irrigation protocol" and "endodontic irrigation." After screening, ninety-six videos were selected and assessed on factors such as days since upload, video duration, number of views and likes, authorship source, and viewing rate. A set of six parameters was established to evaluate the educational value of the videos, with each element scored 0 or 1. Statistical analyses were performed using the Kruskal-Wallis and Mann-Whitney tests, with a significance level of 5% (a schematic sketch of such an analysis follows this entry).
    RESULTS: On average, the videos received 4,474.76 views and 38.67 likes, with a viewing rate of 290.5% and a mean length of 161.15 s. The mean usefulness score was 3.89, with the most frequently covered elements being demonstration, description, and equipment/apparatus. The majority of videos were from healthcare professionals (n = 73, 76.04%) or commercial sources (n = 22, 22.92%). Twenty-seven videos were classified as highly useful, sixty-one as moderately useful, and eight as of low usefulness. Among the content usefulness categories, the highly useful videos had the shortest length (mean = 158.93 s), while the moderately useful videos had the highest viewing rate. Differences in content usefulness across authorship sources showed minimal statistical significance.
    CONCLUSIONS: Endodontic irrigation videos on YouTube have low educational value overall, and the available data on their effectiveness as a learning tool for undergraduate students are inconclusive.
    Keywords:  Dental education; Endodontics; Online learning; Root Canal irrigants; Root Canal therapy
    DOI:  https://doi.org/10.1186/s12909-025-07647-0
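    The Kruskal-Wallis and Mann-Whitney tests named in entry 28 can be sketched as follows, assuming usefulness scores (0-6) grouped by authorship source; the groups and scores are hypothetical placeholders, not the study's data.

      # Illustrative only: hypothetical usefulness scores (0-6) by authorship source.
      from scipy.stats import kruskal, mannwhitneyu

      healthcare = [4, 5, 3, 4, 6, 3, 4]  # hypothetical scores, healthcare professionals
      commercial = [3, 2, 4, 3, 3, 2]     # hypothetical scores, commercial sources
      other = [2, 3, 2, 4, 1]             # hypothetical scores, other uploaders

      # Omnibus Kruskal-Wallis test across the three groups (alpha = 0.05).
      h_stat, p_overall = kruskal(healthcare, commercial, other)
      print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_overall:.3f}")

      # Pairwise Mann-Whitney follow-up for one pair of groups.
      u_stat, p_pair = mannwhitneyu(healthcare, commercial, alternative="two-sided")
      print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_pair:.3f}")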
  29. Facts Views Vis Obgyn. 2025 Dec 19.
       Background: TikTok is a popular platform for sharing health experiences, including those related to endometriosis. However, the quality and tone of the surgical information shared remain unclear.
    Objectives: To characterise TikTok content regarding perceptions of surgical management for endometriosis and analyse content for information quality and differences between healthcare professionals and patients.
    Methods: A cross-sectional analysis of the top 100 most-viewed TikTok videos under the search term "endometriosis surgery" was conducted on September 22, 2024. Videos were included if in English, referenced "endometriosis," and mentioned "surgery," "operation," or "laparoscopy." Two independent reviewers assessed creator identity, tone, and content. The brief DISCERN tool evaluated information quality.
    Main Outcome Measures: Primary outcomes included the perceived benefits and drawbacks of surgery, tone towards surgical intervention, and thematic content. Secondary outcomes included DISCERN scores and comparison of content across creator identities.
    Results: Of the included videos (2021-2024), 80% were created by patients. A neutral tone towards surgery was the most common (41%). Perceived benefits included therapeutic effects (68%) and diagnostic clarity (61%). Reported drawbacks were postoperative recovery (58%) and symptom persistence (22%). Common themes among patients included barriers to surgery (35%), medical gaslighting (30%), delayed diagnosis/misdiagnosis (25%), and inadequate presurgical counselling (20%). Median DISCERN scores were significantly lower for patient videos (1.00) than for healthcare professionals (1.96; P<0.001).
    Conclusions: TikTok content on endometriosis surgery is largely driven by patient narratives that highlight both hope and frustration. The low quality of information underscores the need for accessible, evidence-based educational content. Our findings represent a cross-sectional snapshot subject to algorithmic ranking and platform dynamics.
    What is New?: This is the first study to systematically evaluate TikTok content focused on surgical management of endometriosis, demonstrating that patient-generated videos overwhelmingly drive the conversation. While patients frequently describe benefits such as diagnostic clarity and symptom relief, they also highlight barriers to surgery, postoperative challenges, recurrent symptoms, and experiences of medical gaslighting. Patient-created videos had significantly lower information quality than provider-generated content, underscoring a critical gap in evidence-based surgical education on social media and an opportunity for clinician engagement.
    Keywords:  Endometriosis; TikTok; experience; social media; surgical resection
    DOI:  https://doi.org/10.52054/FVVO.2025.198
  30. Cognition. 2025 Dec 16. pii: S0010-0277(25)00351-8. [Epub ahead of print]269 106410
      In everyday decision making, people often need to actively search for information to construct value estimates before committing to a final choice. Effective information search relies on an accurate internal representation of uncertainty within the task environment, an ability closely linked to metacognition. However, few studies have directly examined whether metacognitive sensitivity (the ability to distinguish between good and bad options) is related to the quality of the information search process in value-based decision making. Across five experiments (total N = 477), we investigated the relationship between metacognitive sensitivity and information search quality by asking participants to gather information in a multi-armed bandit task and rate confidence in their final decisions (a schematic sketch of such a task follows this entry). Results showed that metacognitive sensitivity reliably predicted information search quality, including the ability to decide which information to search for (sampling quality) and when to terminate the search process (termination quality). When the working memory demands of the task were removed, the relationship between metacognitive sensitivity and termination quality persisted, whereas its association with sampling quality diminished, suggesting that sampling quality correlates with metacognitive sensitivity only when working memory is engaged during information search. Furthermore, direct training in information search did not improve metacognitive sensitivity, indicating that individual differences in metacognitive sensitivity are not merely a consequence of information search quality. These findings highlight the crucial role of metacognitive sensitivity as a predictor of efficient information search in value-based decision making.
    Keywords:  Confidence; Information search; Metacognition; Multi-armed bandit; Value-based decision making
    DOI:  https://doi.org/10.1016/j.cognition.2025.106410
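    The sampling and termination measures described in entry 30 can be illustrated with a toy simulation, assuming a two-option bandit in which an agent alternates samples between options and stops once the running means differ by a fixed threshold; the payoff parameters and stopping rule are hypothetical and much simpler than the study's actual design.

      # Illustrative only: a toy two-option sampling task with an evidence-based stopping rule.
      import random

      def sample_until_confident(mean_a=0.6, mean_b=0.4, sd=0.3, threshold=0.25, max_draws=30):
          """Alternate draws from two options; stop when their running means differ by > threshold."""
          draws_a, draws_b = [], []
          for i in range(max_draws):
              # Naive sampling policy: alternate between the two options.
              if i % 2 == 0:
                  draws_a.append(random.gauss(mean_a, sd))
              else:
                  draws_b.append(random.gauss(mean_b, sd))
              if draws_a and draws_b:
                  diff = abs(sum(draws_a) / len(draws_a) - sum(draws_b) / len(draws_b))
                  if diff > threshold:  # termination rule: enough evidence gathered
                      break
          choice = "A" if sum(draws_a) / len(draws_a) >= sum(draws_b) / len(draws_b) else "B"
          return choice, len(draws_a) + len(draws_b)

      choice, n_samples = sample_until_confident()
      print(f"Chose option {choice} after {n_samples} samples")

    In this framing, which option gets sampled corresponds roughly to sampling quality and when the loop breaks corresponds to termination quality, the two components the study relates to metacognitive sensitivity.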
  31. Front Neurol. 2025 ;16 1683198
       Objective: This study explored latent profiles of Health Information-Seeking Behavior (HISB) among stroke patients and analyzed its influencing factors.
    Methods: In this cross-sectional study, 311 stroke participants from two tertiary care hospitals in Gansu Province, China, were recruited between January and May 2025 using convenience sampling. Data were collected using a general information questionnaire, the Health Information-Seeking Behavior Scale, and the Health Behavior Decision-Making Assessment Scale for Stroke Patients. Latent profile analysis (LPA) was employed to identify distinct HISB profiles (a schematic illustration of this approach follows this entry).
    Results: Three latent profiles were identified: the high-demand low-barrier positive group, the moderate-balanced group, and the low-demand high-barrier negative group. Key predictors of profile membership included age, education level, monthly personal income, and the presence of comorbid chronic diseases.
    Conclusion: The identification of three distinct HISB trait types provides an evidence-based foundation for developing personalized health education and tailored decision support interventions. Healthcare professionals can leverage this classification system to customize communication strategies for patients with different traits, deliver tiered information support, and ultimately empower patients to achieve better health behaviors and health outcomes.
    Keywords:  behavioral decision-making; health information-seeking behavior; influencing factors; latent profile analysis; stroke
    DOI:  https://doi.org/10.3389/fneur.2025.1683198
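    Latent profile analysis, as used in entry 31, is commonly approximated by fitting Gaussian mixture models with different numbers of classes and selecting among them with an information criterion. The sketch below uses scikit-learn's GaussianMixture with BIC on hypothetical questionnaire scores; it is one standard way to emulate LPA, not the study's actual software, model specification, or data.

      # Illustrative only: approximate latent profile analysis with Gaussian mixtures
      # on hypothetical per-patient questionnaire scores (rows = patients, columns = items).
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      scores = rng.normal(loc=3.0, scale=1.0, size=(311, 5))  # hypothetical HISB item scores

      # Fit 1-5 profile solutions and keep the number of profiles with the lowest BIC.
      bics = {}
      for k in range(1, 6):
          gmm = GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(scores)
          bics[k] = gmm.bic(scores)

      best_k = min(bics, key=bics.get)
      labels = GaussianMixture(n_components=best_k, covariance_type="diag",
                               random_state=0).fit_predict(scores)
      print(f"Selected {best_k} profiles; first 10 assignments: {labels[:10]}")

    With real questionnaire data, the chosen profile solution would then be examined against predictors such as age, education, income, and comorbidity, as in the study.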
  32. BMC Bioinformatics. 2025 Dec 17.
      
    Keywords:  Biomedical text mining; Co-occurrence; Fine-tuning; Hypothesis evaluation; Large language models (LLM); Literature-based discovery (LBD); Retrieval-augmented generation (RAG)
    DOI:  https://doi.org/10.1186/s12859-025-06350-7