bims-librar 2025-09-21 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2025–09–21
35 papers selected by
Thomas Krichel, Open Library Society

Evaluation of a Popular Large Language Model in Orthopedic Literature Review: Comparison to Previously Published Reviews.
How can we improve the diversity of archival collections with AI? Opportunities, risks, and solutions.
İstanbul Seririyatı (1919-1952): Medical Periodical Digitalization, Index and Open Access Project.
Gold standard science requires gold standard scholarship.
Investigating patients' perceptions of ChatGPT as a health information resource: A qualitative study.
Too few health library workers: a national benchmarking study of staffing and structure in health libraries.
Search Broadly: The way you search the Internet can reinforce your beliefs-without you realizing it.
ChatGPT can provide satisfactory answers to patient questions regarding biceps tenodesis.
Examination of social media nutrition information related to multiple sclerosis: A cross-sectional social network analysis.
Reply letter to the editor regarding "how reliable are ChatGPT and Google's answers to frequently asked questions about unicondylar knee arthroplasty from a scientific perspective?"
Accuracy of AI chatbots in answering frequently asked questions on cervical cancer.
Improving the readability of trauma patient education materials: a ChatGPT solution demonstrated using materials by the Orthopaedic Trauma Association.
A comparative study of AI chatbots and traditional medical sources for hysterectomy patient education: Assessing professionalism, readability, and patient education quality.
AI-powered insights: Analyzing ChatGPT's responses on myofascial pain syndrome.
Correspondence to "Online Information on Lymphedema: Systematic Review of the Quality of Online Patient Resources".
Assessment of the Artificial Intelligence- Generated Fibromyalgia Information: Beyond the Hype.
Comparative Evaluation of ChatGPT-4o and Grok-3 on Cleft Lip and Palate and Presurgical Infant Orthopedics: A Multidisciplinary Assessment by Orthodontists, Pediatricians, and Plastic Surgeons.
Evaluating of the readability of ChatGPT generated responses on travel health risks.
Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome.
Can Chatbots Provide Accurate and Readable Information for Patients With Temporomandibular Disorders?
Disparities in online patient education materials for rheumatic skin diseases.
Quality and readability of online resources for atopic dermatitis: a cross-sectional analysis.
Assessment of Online Patient Education Materials for Pancreatic Neuroendocrine Tumors.
Patient-Centred Web-Based Information on Head and Neck Squamous Cell Carcinoma: Quality and Readability.
Reliability and readability of online patient information for contact lens wearers.
Are videos uploaded by dental professionals on lip repositioning surgery of higher quality? A youtube video analysis.
Assessment of educational YouTube videos on proximal humeral fracture treatment (YouTube videos on proximal humeral fractures).
Web-Based Video Platforms as Sources of Information on Body Image Dissatisfaction in Adolescents: Content and Quality Analysis of a Cross-Sectional Study.
Quality and Reliability of Transarterial Chemoembolization Videos on TikTok and Bilibili: Cross-Sectional Content Analysis Study.
Evaluating the quality and reliability of YouTube videos about phenylketonuria.
Educational Quality and Reliability of YouTube Content Related to Musculoskeletal Ultrasound.
Short video platforms as sources of health information about HPV vaccine: A content and quality analysis.
Instagram videos provide limited information on complications and return to social life regarding total knee arthroplasty: A multilingual analysis.
How does oncologists' communication affect patients' well-being and online health information seeking? - A randomized experiment.
Online Health Information Seeking: Implications for Self-Management in Hypertension.

Arch Bone Jt Surg. 2025 ;13(8): 460-469

Evaluation of a Popular Large Language Model in Orthopedic Literature Review: Comparison to Previously Published Reviews.

Jie J Yao, Ryan D Lopez, Adam A Rizk, Manan Aggarwal, Surena Namdari.

   Objectives: Large language models (LLMs) may improve the process of conducting systematic literature reviews. Our aim was to evaluate the utility of one popular LLM chatbot, Chat Generative Pre-trained Transformer (ChatGPT), in systematic literature reviews when compared to traditionally conducted reviews.
Methods: We identified five systematic reviews published in the Journal of Bone and Joint Surgery from 2021 to 2022. We retrieved the clinical questions, methodologies, and included studies for each review. We evaluated ChatGPT's performance on three tasks. (1) For each published systematic review's core clinical question, ChatGPT designed a relevant database search strategy. (2) ChatGPT screened the abstracts of those articles identified by that search strategy for inclusion in a review. (3) For one systematic review, ChatGPT reviewed each individual manuscript identified after screening to identify those that fit inclusion criteria. We compared the performance of ChatGPT on each of these three tasks to the previously published systematic reviews.
Results: ChatGPT captured a median of 91% (interquartile range, IQR 84%, 94%) of articles in the published systematic reviews. After screening of these abstracts, ChatGPT was able to capture a median of 75% (IQR 70%, 79%) of articles included in the published systematic reviews. On in-depth screening of manuscripts, ChatGPT captured only 55% of target publications; however, this improved to 100% on review of the manuscripts that ChatGPT identified on this step. Qualitative analysis of ChatGPT's performance highlighted the importance of prompt design and engineering.
Conclusion: Using published reviews as a gold standard, ChatGPT demonstrated ability in replicating fundamental tasks for orthopedic systematic review. Cautious use and supervision of this general purpose LLM, ChatGPT, may aid in the process of systematic literature review. Further study and discussion regarding the role of LLMs in literature review is needed.

Keywords:  ChatGPT; Large language models; Orthopedics; Systematic review

DOI:  https://doi.org/10.22038/ABJS.2025.84896.3874
AI Soc. 2025 ;40(6): 4447-4459

How can we improve the diversity of archival collections with AI? Opportunities, risks, and solutions.

Lise Jaillant, Olivia Mitchell, Eric Ewoh-Opu, Maribel Hidalgo Urbaneja.

  This article is the first study to examine the impact (positive and negative) of Artificial Intelligence on the diversity of archival collections. Representing the diverse audiences they serve is a key objective for libraries and archives. For example, institutions with colonial-era archival documents are experimenting with AI to improve the discoverability of their collections and to enhance access for source communities and other users. Indeed, AI can be used to automatically create metadata, search vast amounts of historical records, and answer questions with natural language. However, these technologies also come with risks-for instance when AI systems are trained on potentially biased data. Very little is known about the impact of these computational tools on diversity in archival collections. Do AI technologies compound or alleviate the lack of diversity in archives? Drawing from interviews with academics, archivists, curators, and other experts across the UK/Europe and the USA, this article sheds light on the lack of collaboration between producers of AI technologies on the one side, and archivists, librarians and other cultural heritage professionals on the other side. We argue that bringing these stakeholders together is essential to improve the diversity of archival collections, using ethical and responsible AI. Finally, we offer recommendations to help professionals in libraries and archives assess the opportunities and risks associated with AI and find solutions to make their collections more representative of diverse audiences.

Keywords:  Archives; Artificial intelligence; Diversity; Ethics

DOI:  https://doi.org/10.1007/s00146-025-02222-z
Noro Psikiyatr Ars. 2025 ;62(3): 286-289

İstanbul Seririyatı (1919-1952): Medical Periodical Digitalization, Index and Open Access Project.

Cem Hakan Başaran, Fatih Artvinli.

   Introduction: İstanbul Seririyatı (1919-1952) was a pioneering and comprehensive medical journal in the field of neuropsychiatry in Türkiye. Published monthly for 33 years, the journal comprises a total of 389 issues and over 10,000 pages. This project aimed to digitize the entire archive of the journal and make it freely accessible. This article provides an overview of the journal "Istanbul Seririyati" and the website www.istanbulseririyati.com, where its archive has been recently made available online, also addressing its historical context and significance.
Methods: The project, which spanned approximately six years, focused on locating all issues of the journal and compiling a complete collection. The primary goal was to obtain the most difficult-to-find Ottoman Turkish issues published between 1919 and 1929, which were collected from various individuals, institutions, libraries, antiquarian booksellers, auctions, and online marketplaces. Once acquired, they were professionally scanned and converted into PDF format. From 1929 onwards, the journal was published in Latin-script Turkish, and Optical Character Recognition (OCR) technology was applied to facilitate text searchability whenever possible. The project was structured in four phases: identifying and gathering all journal issues, scanning and digitalizing them, creating a detailed index for each issue, and establishing an online platform for free and open access to the archive. For each issue, the medical section has been indexed with details including the author, title, and page numbers, and a structured keyword system was developed to enhance searchability within the archive.
Results: The complete archive of İstanbul Seririyatı (www.istanbulseririyati.com) has now been made available online. The website offers advanced search functionalities based on year, issue, topic, author, concept, and keyword, ensuring ease of use for both researchers and enthusiasts. Users can read journal issues online and also download them. The website's blog section features articles exploring İstanbul Seririyatı's historical legacy, examples from various years, and in-depth discussions of its content. Moreover, selected articles from the 1919-1929 Ottoman Turkish issues have been transliterated into modern Turkish, making them more accessible to contemporary readers. It can be said that a serious historical gap in this field has been filled with online access to İstanbul Seririyatı, which sheds light on the birth and development years of neuropsychiatry in Türkiye.
Conclusion: İstanbul Seririyatı serves as a vital resource for tracking discussions and transformations in neuropsychiatry and various other branches of medicine. The journal was organized into two main sections: medical and paramedical. By bringing together physicians from various medical disciplines, particularly neuropsychiatry, İstanbul Seririyatı served as a platform that functioned like a school of thought, allowing young doctors to publish their first works and research, ultimately shaping the future of the profession. The digitalization of such rare collections ensures accessibility to valuable resources while preserving cultural heritage and securely transmitting it to future generations. It is hoped that this initiative will benefit not only today's researchers but also future generations, as İstanbul Seririyatı is now accessible to the neuropsychiatry community and anyone interested in the accumulation and legacy of medical knowledge.

Keywords:  Mazhar Osman; Neuropsychiatry; history of medicine; history of neurology; history of psychiatry; Şişli müsamereleri

DOI:  https://doi.org/10.29399/npa.29035
Science. 2025 Sep 18. 389(6766): 1165

Gold standard science requires gold standard scholarship.

H Holden Thorp.

Scientists often casually refer to research and "library work" as separate endeavors. Research involves the execution of experiments in the laboratory whereas library work means finding references to relevant studies in the literature and analyzing them-often as a precursor to writing a paper. Treating careful scholarship as somehow less important than the acquisition of data can adversely affect the reliability of the scientific record and consequently, the course of science. In today's tense environment around science and politics, meticulous scholarship has never been more important.

DOI: https://doi.org/10.1126/science.aec2360
Nutr Health. 2025 Sep 15. 2601060251376091

Investigating patients' perceptions of ChatGPT as a health information resource: A qualitative study.

Miznah Hizam AlShammary, Norah Mohammed Alyahya, Eman M Alanazi, Abdullah Aldaeej, Aljouharah Mohammed Alanazi, Walaa Hassan, Tamer Farag, Sager Mohammed Alanazi, Hamad Mohammed Al Otaibi, Salem Albagmi, Wejdan M Arif, Amal Mubarak Bakhshwain, Afnan Fahd Almuhanna, Fahad Alanezi.

  BackgroundThe rapid adoption of artificial intelligence-powered tools like ChatGPT has introduced new avenues for patients to access health information independently. Understanding how patients perceive and engage with such tools is essential to evaluating their trustworthiness, usability, and potential impact on health decision-making.AimThe purpose of this study is to investigate the facilitators and barriers of using ChatGPT as a health information resource for patients' health management.MethodsA qualitative research design was adopted in this study. The participants included outpatients at a public hospital. Participants interacted with ChatGPT (version 3.5) for at least 15 min daily over 2 weeks to explore health-related topics before participating in semi-structured interviews. A total of 28 outpatients participated in the interviews.ResultsThe findings from this study have indicated both positive and negative aspects of ChatGPT as a health information resource. Among the 28 participants, the most frequently reported facilitators included improved health literacy (reported by 26 participants, 92.9%), effectiveness and efficiency (24 participants, 85.7%), cost-effectiveness (23 participants, 82.1%), accessibility (17 participants, 60.7%), empowerment (13 participants, 46.4%), and anonymity (11 participants, 39.3%). Reported barriers included lack of personalized information (15 participants, 53.6%), limited reliability (9 participants, 32.1%), restricted diagnostic capability (6 participants, 21.4%), lack of human interaction (14 participants, 50%), privacy concerns (4 participants, 14.3%), legal and ethical issues (9 participants, 32.1%), and lack of emotional support (3 participants, 10.7%).ConclusionAlthough ChatGPT has significant benefits of being used as a health information resource, to arrive at specific conclusions, there is a need to extend these kinds of studies across the regions to assess the impact of ChatGPT on different populations for promoting health literacy.

Keywords:  ChatGPT; artificial intelligence; awareness; health information; health literacy; public

DOI:  https://doi.org/10.1177/02601060251376091
Aust Health Rev. 2025 Sep 18.

Too few health library workers: a national benchmarking study of staffing and structure in health libraries.

Alice Anderson, Caroline Ondracek.

ObjectiveThis research presents a benchmarking study of staffing levels and reporting structures in libraries that support evidence-based health care, and deliver education and research support services within the Australian health system.MethodsBenchmarking data were collected through a two-phase approach. First, a set of questions was distributed via email to health libraries across Australia, using a national health libraries e-list and professional networks. Second, an international literature review was conducted to examine workforce composition and organisational structures in health libraries over the past 10years.ResultsThis study reveals that Australian health libraries operate with staffing levels approximately 34% below the country's national guidelines. The recommended ratio of 1 health library staff member per 1250 institutional full-time equivalent is proposed to guide workforce planning. Reporting structures vary widely, with libraries most commonly reporting to corporate divisions. However, reporting to clinical, education or research-aligned portfolios was associated with stronger advocacy and strategic alignment.ConclusionsAustralian health libraries play a critical role in supporting clinical decision-making, research and education. Despite their importance, health libraries are increasingly under-resourced, threatening equitable access to evidence and information services. Strategic investment and targeted funding are needed to address the workforce shortfall. Reporting structures should be aligned with clinical or research functions to enhance visibility and support.

DOI: https://doi.org/10.1071/AH25200
Sci Am. 2025 Oct 01. 333(3): 20

Search Broadly: The way you search the Internet can reinforce your beliefs-without you realizing it.

Simon Makin.

DOI: https://doi.org/10.1038/scientificamerican102025-5gMojea7p79ihag0ZZZJpg
JSES Int. 2025 Jul;9(4): 1378-1384

ChatGPT can provide satisfactory answers to patient questions regarding biceps tenodesis.

Darcy I Ottomanelli, Paxton G Sweeney, Stephen G Silver, Rocco Bassora, Eitan M Kohan, Oscar Vazquez.

   Background: Shoulder pain often arises from pathology affecting the long head of the biceps brachii tendon, with biceps tenodesis recognized as a widely-accepted treatment. Research indicates a significant correlation between preoperative expectations and postoperative satisfaction, emphasizing the pivotal role of patient education. Although advancements in artificial intelligence (AI) technology, such as ChatGPT, hold promise for providing comprehensive health-related information, the accuracy can vary. This study aims to assess the quality and accuracy of ChatGPT's responses to patient inquiries about biceps tenodesis, offering insight on AI's contribution to enhancing patients' understanding and expectation management.
Methods: A list of the 10 most frequently asked patient questions regarding biceps tenodesis were identified from the websites of various orthopedic institutions. Each question was individually entered into the most current version of ChatGPT at the time of data collection (v3.5), and the responses were recorded. The responses were then reviewed by four board-certified orthopedic surgeons and graded based on two evidence-based rating systems: the ChatGPT response rating system and the AI response metric. Additionally, the reading level required to fully comprehend the responses was calculated using the Flesch-Kincaid Grade Level assessment.
Results: Of the 10 responses evaluated, 1 was deemed excellent, requiring no clarification, and 7 were satisfactory, needing minimal clarification according to the ChatGPT response rating system. Two out of 10 responses were rated by surgeons as excellent, clear, comprehensive, and aligned with current literature. Seven responses were deemed good, being mostly clear and complete and in line with current literature. On average, a Flesch-Kincaid score indicated that 18.5 years of education were required to fully comprehend the responses, corresponding to the level of a college graduate.
Discussion/Conclusion: Most of the ChatGPT responses can provide satisfactory but limited information to answer patients' questions about biceps tenodesis. The reading level required to comprehend the response was too advanced for the average patient, leading to potential misunderstanding and misinterpretation of the response. Recognizing the limitations of ChatGPT to accurately and comprehensively answer patient questions can help guide surgeons when approaching discussions with patients regarding biceps tenodesis.

Keywords:  Artificial intelligence; Biceps tenodesis; ChatGPT; Frequently asked questions; Patient education; Patient satisfaction

DOI:  https://doi.org/10.1016/j.jseint.2025.04.013
Public Health Nutr. 2025 Sep 18. 1-27

Examination of social media nutrition information related to multiple sclerosis: A cross-sectional social network analysis.

Yasmine Probst, Emiliana Saffioti, Sarah Manche, Melissa Eaton.

   OBJECTIVE: Multiple sclerosis (MS) is a chronic neurodegenerative condition with increasing global prevalence. People living with multiple sclerosis (plwMS) have reported limited guidance relating to nutrition information. Paired with varied health literacy levels, this makes plwMS susceptible to nutrition misinformation.
DESIGN: A cross-sectional online social network analysis (SNA) examining nutrition information for MS.
SETTING: A systematic SNA using Twitter/X and YouTube platforms using NodeXL to summarise metrics. Quality was assessed using the QUEST tool. Content analysis of YouTube videos was synthesised into themes for misinformation.
PARTICIPANTS: Online publicly available social media user posts and video content.
RESULTS: Twitter/X SNA revealed keywords were used most by an account representing 72.8% of the user network with common diet mentions including Wahls (57 times), paleo (15 times) and ketogenic (11 times). 'Favourite count' metrics were strongly correlated with 'repost count' (r=0.83, p=0.000). Videos which endorsed a diet were more likely to have a lower QUEST score. User engagement metrics were higher for lower quality videos. The quality of online nutrition information relating to MS was moderate (61%). Physicians were the most likely source of nutrition information endorsing a diet for MS. The content analysis identified a knowledge gap for both medical professionals and plwMS.
CONCLUSIONS: Nutrition misinformation for MS occurs on social media and information quality is variable. Audiences need to be cautioned about users with large followings and evaluate the credibility of all information. This study reiterates the importance of evidence-based information for the MS community.

Keywords:  Multiple Sclerosis; nutrition information; online; social media

DOI:  https://doi.org/10.1017/S1368980025100943
J Orthop Surg (Hong Kong). 2025 Sep-Dec;33(3):33(3): 10225536251368567

Reply letter to the editor regarding "how reliable are ChatGPT and Google's answers to frequently asked questions about unicondylar knee arthroplasty from a scientific perspective?"

Ali Aydilek, Ömer Levent Karadamar.

DOI: https://doi.org/10.1177/10225536251368567
Front Artif Intell. 2025 ;8 1655303

Accuracy of AI chatbots in answering frequently asked questions on cervical cancer.

Jielin Fan, Wenhong Xiao, Zhipeng Yan, Qiang Ouyang.

   Objective: To compare the accuracy of Deepseek and ChatGPT in answering frequently asked questions (FAQs) about cervical cancer.
Methods: To compile a list of FAQs concerning cervical cancer, a comprehensive search was conducted on social media and community platforms. The answer keys for all the selected questions were created on the basis of the guidelines of the National Comprehensive Cancer Network (NCCN), the International Federation of Gynecology and Obstetrics (FIGO), and the World Health Organization (WHO) for cervical cancer. The answers given by Deepseek-R1 and ChatGPT O1 were scored according to the Global Quality Score (GQS).
Results: A total of 74 FAQs covered a diverse range of topics related to cervical cancer, including diagnosis (n = 16), risk factors and epidemiology (n = 19), treatment (n = 20), and prevention (n = 19). When all the answers provided by DeepSeek to the FAQs about cervical cancer according to the GQS were evaluated, 68 answers were rated as score five, 4 answers were rated as score four, and 2 answers were rated as score three. For ChatGPT's responses to the same set of FAQs, 67 answers were classified as score five, 6 answers were classified as score four, and 1 answer was classified as score three. There was no statistically significant difference between the two groups (p > 0.05).
Conclusion: Both DeepSeek and ChatGPT demonstrated accurate and satisfactory responses to FAQs about cervical cancer when evaluated according to the GQS. However, in regard to treatment issues, a cautious attitude should be maintained. Compared to ChatGPT, DeepSeek stands out for its free availability, which makes it more accessible in resource-limited scenarios to the public.

Keywords:  ChatGPT; DeepSeek; artificial intelligence; cervical cancer; frequently asked questions

DOI:  https://doi.org/10.3389/frai.2025.1655303
OTA Int. 2025 Dec;8(4): e417

Improving the readability of trauma patient education materials: a ChatGPT solution demonstrated using materials by the Orthopaedic Trauma Association.

Oscar Covarrubias, Diane Ghanem, Christopher Murdock, Christopher Domes, Babar Shafiq.

   Introduction: ChatGPT is an artificial intelligence language model capable of understanding, contextualizing, and generating human-like text. The purpose of this study was to assess the ability of ChatGPT to rewrite orthopaedic trauma patient education materials at the recommended sixth grade level.
Methods: The academic grade level of each of the 41 Orthopaedic Trauma Association (OTA/AO) online patient education articles was evaluated using the Flesh-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE). Each article was then provided to ChatGPT along with instructions to simplify the readability of the text to a sixth grade level. The FKGL and FRE of the ChatGPT revised articles were calculated and compared with the original articles. Two orthopaedic trauma surgeons assessed the content of the revised articles and categorized them as "accurate," "refinable," or "insufficient" based on the preservation of information from the original articles.
Results: ChatGPT significantly reduced the FKGL (8.2 ± 1.1‒5.7 ± 0.5, P < 0.001) and increased the FRE (65.5 ± 6.6‒76.4 ± 5.7, P < 0.001) of the OTA/AO patient education articles. Twenty-nine (70.7%) revised articles were accurate without modifications. Three (7.3%) articles required minor modifications, and 9 (22%) articles required substantial edits.
Conclusion: ChatGPT can be used to simplify and enhance the readability of patient education materials. The average readability of the OTA/AO educational articles was changed from an eighth grade to a fifth grade level. However, nearly a third of the ChatGPT revised articles required revisions due to content omissions thus highlighting the importance of expert review.
Level of Evidence: NA.

Keywords:  ChatGPT; artificial intelligence; large language models; patient education; readability; trauma education

DOI:  https://doi.org/10.1097/OI9.0000000000000417
Medicine (Baltimore). 2025 Sep 12. 104(37): e44403

A comparative study of AI chatbots and traditional medical sources for hysterectomy patient education: Assessing professionalism, readability, and patient education quality.

Guanghua Zhou, Yang Wang, Xiangming Che.

  This study compared professionalism, readability, and patient education quality between artificial intelligence (AI)-generated responses (ChatGPT and Gemini) and the American Society of Anesthesiologists (ASA) website for 8 frequently asked questions. To compare the differences in professionalism, readability, and patient education quality between AI (ChatGPT and Gemini) and the ASA website when answering 8 common hysterectomy questions, and to assess whether AI-generated content can serve as a reliable source of patient education for hysterectomy. Blinded experts evaluated professionalism, while 6 readability indices and the patient education materials assessment tool were used to assess content quality. Statistical comparisons were performed with P <.05 considered significant. ChatGPT and Gemini demonstrated significantly higher professionalism scores than the ASA website (P <.05); however, their readability was lower (P <.05). There were no significant differences in professionalism or readability between the ChatGPT and Gemini (P >.05). Although AI responses align with clinical guidelines, their low readability poses a usability concern. AI-driven content provides professional and accurate patient education on hysterectomy. However, further refinements are required to improve accessibility without compromising quality.

Keywords:  AI chatbots; ChatGPT; PEMAT score; gemini; generative artificial intelligence; healthcare communication; hysterectomy; medical information quality; patient education; professionalism in AI; readability assessment

DOI:  https://doi.org/10.1097/MD.0000000000044403
J Bodyw Mov Ther. 2025 Oct;pii: S1360-8592(25)00196-2. [Epub ahead of print]44 558-563

AI-powered insights: Analyzing ChatGPT's responses on myofascial pain syndrome.

Sidar Burcu Ateş Demiroglu, Özge Özpolat Bulut, Fatih Bağcier.

  This study examines the quality, completeness, accuracy, and readability of responses generated by ChatGPT on Myofascial Pain Syndrome (MPS), a common chronic pain condition characterized by muscle pain and tenderness. Given the increasing reliance on AI chatbots for health information, the study aims to evaluate the suitability of ChatGPT in providing accessible and reliable information on MPS. Using Google Trends data, we identified the most frequently searched keywords related to MPS and entered them into the GPT-4 version of ChatGPT. The responses were evaluated with the Enhanced Quality Information Profile (EQIP) scale, Likert scales, and Flesch-Kincaid readability metrics. Results indicated that while ChatGPT's responses generally scored well in accuracy, they displayed variability in readability, suggesting a range of accessibility levels for different audience segments. The study identified the Philippines, Thailand, and the United States as the top three countries searching for MPS-related information. Despite promising results in information accessibility, ChatGPT's responses lack the depth required for comprehensive patient care and cannot substitute for professional medical consultation. Enhancements in quality control, along with the use of reliable medical sources, are recommended to improve the chatbot's capacity to provide accurate and comprehensible health information. This study underscores the importance of integrating human oversight in AI systems to better serve the public's health information needs.

Keywords:  AI chatbot; ChatGPT; Myofascial pain syndrome; Patient information

DOI:  https://doi.org/10.1016/j.jbmt.2025.05.043
J Cancer Educ. 2025 Sep 16.

Correspondence to "Online Information on Lymphedema: Systematic Review of the Quality of Online Patient Resources".

Marco Marcasciano, Giuseppe Antonio D'Amico, Martina Astolfi, Diego Ribuffo, Federico Lo Torto.

  Lymphedema represents a chronic and debilitating disorder affecting hundreds of millions of people worldwide, with increasing trends due to population ageing and rising cancer incidence rates. The quality of online information patients can have access to seems to be critical in guiding them and their choices. The Ensuring Quality Information for Patients tool helps us in highlighting the lack of comprehensive coverage of all the relevant topics and procedural benefits for lymphedema disease. In this regard, surgeons and specialists still have a central role in actively leading their patients in navigating this complex field, warning them of possible misinformation and supporting them during counseling.

Keywords:  EQIP; LVA; Lymph-node transfer; Lymphedema

DOI:  https://doi.org/10.1007/s13187-025-02734-8
Arch Rheumatol. 2025 Sep 01. 40(3): 358-364

Assessment of the Artificial Intelligence- Generated Fibromyalgia Information: Beyond the Hype.

Mert Zure, Ahmet Kıvanç Menekşeoğlu.

Background/Aims: Individuals increasingly turn to artificial intelligence (AI) chatbots for health-related information; however, the accuracy and usability of their responses remain uncertain. This study assessed the quality, comprehensiveness, and readability of responses from 6 AI chatbots-ChatGPT-3.5, ChatGPT-4o (OpenAI), Copilot AI (Microsoft), Perplexity AI (Perplexity.AI), Gemini AI (Google), and ChatSonic AI (Writesonic)-to the most commonly searched fibromyalgia-related queries. Materials and Methods: The top 10 most frequently searched fibromyalgia-related questions from the past 2 years were retrieved from the Google Trends database. Each chatbot was queried separately, and a total of 60 responses (10 per chatbot) were assessed both qualitatively and quantitatively by 2 reviewers, focusing on content quality, accuracy, readability, and alignment with evidence-based guidelines. Results: ChatGPT-3.5 had the lowest Ensuring Quality Information for Patients score (20.6 ± 4.5), indicating very low quality information, while Gemini achieved the highest (40.5 ± 5), which was still classified as low quality. Understandability was moderate for Copilot, Gemini, and Perplexity (67.2) but lowest for ChatGPT-3.5 (43.2 ± 10.2). Actionability was weak and the misinformation assessment revealed a moderate level across all chatbots. Readability scores indicated university-level complexity, with ChatGPT-4o having the lowest Reading Ease score (11.3 ± 11.2) and Copilot the highest (30.3 ± 13.2). Conclusion: While AI chatbots provide accessible health information, their accuracy and depth vary. Gemini, Copilot, and Perplexity AI showed better quality, but citation inconsistencies, readability challenges, and misinformation risks highlight the need for refinement beyond the hype. Clinicians should guide fibromyalgia patients in critically assessing AI-generated health content. Future research should explore improvements in AI chatbot applicability for medical inquiries.

DOI: https://doi.org/10.5152/ArchRheumatol.2025.11149
Cleft Palate Craniofac J. 2025 Sep 16. 10556656251378591

Comparative Evaluation of ChatGPT-4o and Grok-3 on Cleft Lip and Palate and Presurgical Infant Orthopedics: A Multidisciplinary Assessment by Orthodontists, Pediatricians, and Plastic Surgeons.

Esra Ekizer, Kevser Kurt Demirsoy, Süleyman Kutalmış Büyük, Semih Canpolat, Ahmet Bilirer.

  Objective: This study aimed to evaluate and compare the accuracy, clarity, and clinical applicability of 2 state-of-the-art large language models (LLMs), Chat Generative Pretrained Transformer (ChatGPT)-4o and Grok-3, in generating health information related to cleft lip and palate (CLP) and presurgical infant orthopedics (PSIO). To ensure a multidisciplinary perspective, experts from orthodontics, pediatrics, and plastic surgery independently evaluated the responses. Methods: Six structured questions addressing general and presurgical aspects of CLP were submitted to both ChatGPT-4o and Grok-3. Forty-five blinded specialists (15 from each specialty) assessed the 12 generated responses using 2 validated instruments: the DISCERN tool and the Global Quality Scale (GQS). We conducted interspecialty comparisons to explore variations in model evaluation. Results: We observed no statistically significant differences between ChatGPT-4o and Grok-3 in DISCERN or GQS scores (P > .05). However, pediatricians consistently assigned higher ratings than orthodontists and plastic surgeons in terms of reliability, clarity, and treatment-related content. Patient-directed questions received higher overall scores than those aimed at healthcare professionals. Grok-3 performed slightly better on questions about PSIO, whereas ChatGPT-4o provided more comprehensive and structured answers. Conclusion: Both LLMs demonstrated notable potential in producing readable, informative responses about CLP and PSIO. While they may aid in patient communication and support clinical education, professional oversight remains critical to ensure medical accuracy. The inclusion of Grok-3 in this orthodontic evaluation provides valuable insights and sets the stage for future research on artificial intelligence integration in interdisciplinary cleft care.

Keywords:  ChatGPT-4o; Grok-3; artificial intelligence; cleft lip and palate; presurgical infant orthopedics

DOI:  https://doi.org/10.1177/10556656251378591
Int Marit Health. 2025 Sep 16.

Evaluating of the readability of ChatGPT generated responses on travel health risks.

Zeynep Demirtaş Özyılmaz.

   BACKGROUND: Impressive advances in artificial intelligence have given travellers a new source of health information. AI-powered tools, such as ChatGPT, allow users to obtain health information in a fast and accessible way. The aim of this study was to assess the readability of ChatGPT responses to questions about health risks when travelling.
MATERIAL AND METHODS: Ten Questions about the health risks when travelling in the "Questions and Answers" section of the World Health Organisation's website have been asked 10 times to ChatGPT. A total of 100 answers was obtained and analyzed for readability.
RESULTS: The mean ± SD of Flesch Reading Ease was 35.82 ± 6.46, Flesch-Kincaid grade level was 13.25 ± 1.45, Simple Measure of Gobbledygook was 12.34 ± 1.29, Gunning Fog Index was 13.77 ± 1.34, Coleman-Liau Index was 14.52 ± 1.09, Automated Readability Index was 14.93 ± 1.81.
CONCLUSIONS: The readability of the answers produced by ChatGPT was 'difficult' and a college level education is required to understand the text. Lack of understanding of information can reduce the likelihood of travellers making good health decisions. To improve the understandability of ChatGPT responses, it may be useful to generate responses at a significantly lower reading level.

Keywords:  ChatGPT; readability; travel health risks

DOI:  https://doi.org/10.5603/imh.104400
Indian J Gastroenterol. 2025 Sep 20.

Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome.

Anand Kumar Raghavendran, Balaji Musunuri, Siddheesh Rajpurohit, Ganesh Pai C, Shiran Shetty, Pretty Kumari, Rakshand Shetty, Athish Shetty, Ganesh Bhat.

   BACKGROUND: Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psycho-social burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs)-ChatGPT-4 and Gemini-1-for their performance in addressing IBS-related patient queries.
METHODS: Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]) and empathy was rated on a 4-point Likert scale by three reviewers.
RESULTS: Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in "symptoms and diagnosis" and "treatment", while mixed responses were most frequent in "general understanding" and "lifestyle". There was no significant difference in comprehensiveness (p = 0.67). Readability analysis showed both LLMs generated difficult-to-read content: Gemini's FRE score was 35.83 ± 3.31 vs. ChatGPT's 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT's responses were more empathetic, with all responses rated moderately empathetic; Gemini was mostly rated minimally empathetic (66.7%).
CONCLUSION: While ChatGPT and Gemini provided extensive information, their limitations-such as complex language and occasional inaccuracies-must be addressed. Future improvements should focus on enhancing readability, contextual relevance and accuracy to better meet the diverse needs of patients and clinicians.

Keywords:  Artificial intelligence; Empathy; Health literacy; IBS; Patient education

DOI:  https://doi.org/10.1007/s12664-025-01872-7
J Oral Maxillofac Surg. 2025 Sep 03. pii: S0278-2391(25)00736-0. [Epub ahead of print]

Can Chatbots Provide Accurate and Readable Information for Patients With Temporomandibular Disorders?

Luís Eduardo Charles Pagotto, Dennys Ramon de Melo Fernandes Almeida, Thiago de Santana Santos, Everton Freitas de Morais.

BACKGROUND: Temporomandibular disorders (TMDs) are common musculoskeletal and neuromuscular conditions that impair jaw function and quality of life. Patients often lack access to reliable health information. Large language models (LLMs) have introduced chatbots as potential educational tools, yet concerns remain regarding accuracy, readability, empathy, and citation integrity.
PURPOSE: This study evaluated whether LLM-based chatbots can provide clinically accurate, empathic, and readable responses to patient-friendly questions about TMDs and whether their cited references are authentic.
STUDY DESIGN, SETTING, SAMPLE: This cross-sectional in silico study was conducted in March 2025. Twenty-three standardized TMD-related questions were used as prompts for each chatbot.
PREDICTOR/EXPOSURE/INDEPENDENT VARIABLE: The predictor variable was the chatbot platform, reflecting distinct LLM architectures: GPT-4 (transformer-based autoregressive model, OpenAI), Gemini Pro (multimodal transformer, Google), and DeepSeek-V3 (mixture-of-experts transformer, DeepSeek).
MAIN OUTCOME VARIABLES: Accuracy was defined as the proportion of responses judged clinically correct by two board-certified oral medicine specialists. Empathy was assessed by expert scoring of tone. Readability was determined with Flesch-Kincaid Reading Ease and Grade Level. Citation reliability was assessed by verifying whether references were authentic and retrievable in PubMed or other authoritative databases.
COVARIATES: No formal covariates were included; exploratory correlations between variables were performed.
ANALYSES: Descriptive statistics, 1-way Analysis of Variance with Tukey's post hoc tests, Pearson correlation, and χ2 tests were performed. Statistical significance was set at P < .05.
RESULTS: No statistically significant differences were observed in accuracy (P = .2) or empathy (P = .2). The mixture-of-experts transformer provided the most readable content (Flesch-Kincaid Reading Ease = 28.47; Flesch-Kincaid Grade Level = 12.19; P < .001). The transformer-based autoregressive model produced the highest proportion of hallucinated references (47.2%), compared with the multimodal transformer (18.8%) and the mixture-of-experts transformer (10.1%) (P < .001). A weak positive correlation was found between accuracy and readability (r = 0.27; P = .03), with no correlation between accuracy and empathy.
CONCLUSIONS AND RELEVANCE: While all LLM-based chatbots delivered generally accurate and empathetic responses, the mixture-of-experts transformer outperformed others in readability and citation reliability. The high rate of hallucinated references in the transformer-based autoregressive model underscores the need for human oversight in clinical applications.

DOI: https://doi.org/10.1016/j.joms.2025.08.012
Br J Dermatol. 2025 Sep 15. pii: ljaf371. [Epub ahead of print]

Disparities in online patient education materials for rheumatic skin diseases.

Jennifer Foster, Priya Sarlashkar, Grace Lu, Noelle Teske, Benjamin Chong, Heidi T Jacobe.

DOI: https://doi.org/10.1093/bjd/ljaf371
Clin Exp Dermatol. 2025 Sep 18. pii: llaf419. [Epub ahead of print]

Quality and readability of online resources for atopic dermatitis: a cross-sectional analysis.

Christopher Shenouda, Eric McMullen, Grace Xiong, Sana Gupta, Dea Metko, Andy Cui, Joony Hong, Shireen Dumont, Ilya Mukovozov.

DOI: https://doi.org/10.1093/ced/llaf419
Am Surg. 2025 Sep 17. 31348251381623

Assessment of Online Patient Education Materials for Pancreatic Neuroendocrine Tumors.

Brendan Dolan, Miguel Tzita, Miguel Tobon, Najeeb Al Hallak, Asfar Azmi, Lauren Hamel, Eliza W Beal.

  BackgroundIncidence of Pancreatic Neuroendocrine Tumors (PNET) has increased in recent decades. In navigating health diagnoses like pNETs, patients are increasingly turning to the internet for information. This study aims to provide a comprehensive overview of Patient Education Materials (PEMs) specific to pNETs using 6 primary criteria for evaluation: Quality, Understandability, Actionability, Readability, Comprehensiveness/Adherence to clinical guidelines, and Accountability.Methods36 unique web pages were selected using 9 different web browser/search engine combinations. Quality was evaluated using the DISCERN instrument, understandability and actionability with the PEMAT-P tool, readability with the Flesch-Kincaid Reading Ease algorithm, and comprehensiveness/adherence to clinical guidelines and accountability with author generated criteria. Scores were categorized based on affiliation to either a foundation, academic, or commercial publishing source, and by search position.ResultsOf the 36 web pages evaluated, 8 were published by foundations, 23 by academic sources and 5 by commercial sources. The mean understandability score for all sources using PEMAT-P was 75.45% (SD 10.89%), and actionability was 19.44% (SD 25.25%). The mean Flesch-Kincaid Reading Ease Score for all sources was 46.11 (SD 12.71), equivalent to a college reading level. Additionally, significant differences were found between the accountability scores for foundation (mean 1.75, SD 1.75), academic (mean 0.87, SD 1.49), and commercial (mean 3.2, SD 0.82) categories.DiscussionThis study reveals many shortcomings of online PEMs for PNETs, including average reading grade level and PEMAT-P actionability scores well below recommended standards. Academic web pages also demonstrated the lowest accountability scores to a statistically significant degree, indicating a need for that category of sources to increase transparency on author information and sources.

Keywords:  online health resources; pancreatic neuroendocrine tumor

DOI:  https://doi.org/10.1177/00031348251381623
Oral Dis. 2025 Sep 18.

Patient-Centred Web-Based Information on Head and Neck Squamous Cell Carcinoma: Quality and Readability.

Briana Jansen, Stella Mullane, Bryan Tan, Bobby Joseph, Mohammed Junaid, Ramesh Balasubramaniam, Agnieszka Frydrych, Omar Kujan.

   INTRODUCTION: The internet is a widely used source of health information for patients with head and neck cancer. However, the quality and readability of online content remain inconsistent. This study evaluated the usefulness of web-based resources by assessing their quality and readability.
METHODS: Searches were conducted using Google, Bing, and Yahoo! with nine common anatomical terms related to head and neck cancer. The first 50 results from each search engine were screened, and eligible websites were evaluated for quality using the DISCERN instrument by three independent reviewers. Readability was assessed using the Flesch-Kincaid Reading Grade Level (FKRGL) and the Flesch Reading Ease Score (FRES). Descriptive and inferential statistics were applied.
RESULTS: A total of 285 websites met the inclusion criteria. Of these, 46% were rated as poor quality (DISCERN score = 1). The median FKRGL was 8.6, and the median FRES was 55.7, both indicating reading levels above recommended thresholds for patient education materials.
CONCLUSIONS: Online information for patients with head and neck squamous cell carcinoma is often of low quality and too complex for the average reader. Improved, accessible, and reliable web-based resources are needed to support patient understanding and informed healthcare decisions.

Keywords:  head and neck cancer; readability; search engines; web‐based information

DOI:  https://doi.org/10.1111/odi.70098
Cont Lens Anterior Eye. 2025 Sep 13. pii: S1367-0484(25)00147-X. [Epub ahead of print] 102513

Reliability and readability of online patient information for contact lens wearers.

Genis Cardona, Carla Vega.

   PURPOSE: This study aimed to assess the reliability and readability of online patient information regarding contact lens (CL) wear and maintenance, given that many users may employ these resources to supplement or replace professional advice.
METHODS: Ten frequently asked questions (FAQs) concerning CL wear and maintenance were formulated based on clinical experience and literature search. Each FAQ was used to query Google, and the first 20 eligible websites were analysed, yielding a final sample of 200 websites. Reliability was assessed using the short version of the Ensuring Quality Information for Patients (EQIP) tool, while readability was evaluated through the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL) tests. Websites were classified by country of origin and source type. Non-parametric group contrast and variable correlation analyses were conducted.
RESULTS: The median EQIP score was 68.0 % (range 29.0 %-90.0 %), with 30.0 % of websites providing high-quality content (≥75 %). Websites from encyclopaedias and medical centres/hospitals scored higher in reliability compared to commercial and practitioner sources (p < 0.05). Readability was generally poor, with mean FRES and FKGL values of 55.8 ± 11.3 and 9.9 ± 2.3, respectively, exceeding recommended reading levels. Unexplained technical jargon was found in 59.5 % of websites. Encyclopaedias demonstrated better readability scores than news centres (p = 0.036). A weak but significant inverse correlation was found between EQIP and FRES scores (rho = -0.215; p = 0.002), indicating that higher reliability was associated with slightly better readability.
CONCLUSION: Overall, online patient information regarding CL wear and maintenance evidenced moderately high reliability but insufficient readability. Contact lens wearers may find this information difficult to understand, leading to poor compliance and potential ocular complications. Given the critical role of online resources in patient education, eye care professionals should guide patients towards reliable, comprehensible websites and consider modern communication strategies to enhance compliance and safety in CL wear.

Keywords:  Contact lens; Health education; Online health information; Readability; Reliability

DOI:  https://doi.org/10.1016/j.clae.2025.102513
PLoS One. 2025 ;20(9): e0327194

Are videos uploaded by dental professionals on lip repositioning surgery of higher quality? A youtube video analysis.

Ebru Ece Sarıbas, Muzeyyen Kandemır.

Lip repositioning surgery is a minimally invasive procedure used in the treatment of gummy smile. With the increasing demand for aesthetic dental procedures, platforms like YouTube™ have become popular sources for visual health information. This study aimed to evaluate the quality, reliability, and educational value of YouTube™ videos related to lip repositioning surgery and to identify factors influencing video quality. This research was conducted on YouTube™ using the term "lip repositioning" on February 20, 2025. The first 150 videos sorted by relevance were screened. According to the inclusion criteria, 53 videos were recorded. Data such as video duration, upload source, view count, comments, likes/dislikes, and country of origin were recorded. Viewer engagement was analyzed through interaction index and viewing rate. Content was evaluated using the Video Content Quality (VCQ) score, Global Quality Scale (GQS) and DISCERN tool. Statistical analyses were performed with a significance level of P < 0.05. Most videos were uploaded by dentists (64.2%), and 71.7% were educational. The mean VCQ score was 9.98 ± 3.95, indicating low-to-moderate content quality. Videos uploaded by professionals had higher quality scores (p < 0.001), while 56.6% were of poor content quality. Although YouTube™ is a widely used source for health information, videos on lip repositioning surgery lack sufficient educational value and reliability. Professional content creation should be encouraged.

DOI: https://doi.org/10.1371/journal.pone.0327194
JSES Int. 2025 Jul;9(4): 1061-1068

Assessment of educational YouTube videos on proximal humeral fracture treatment (YouTube videos on proximal humeral fractures).

Taewoong Chae, Brandon S Chai, Adrian L Huang.

   Background: Social media platforms have become principal sources of information for patients to gather information before clinic visits. YouTube is a popular platform for educational videos, and since proximal humeral fractures (PHFs) are a common orthopedic trauma injury, this study aimed to assess the characteristics of YouTube videos on PHF and PHF treatment.
Methods: The terms "Proximal Humeral Fracture" and "Proximal Humeral Fracture Treatment" were used to gather the videos included in this study. Terms were searched programmatically using YouTube's search Application Program Interface. Top 50 videos from each search term were recorded and combined for a total of 100 videos. Duplicate videos were removed, and the remaining videos were rank ordered by the frequency and order of appearance from the initial search. Any non-English videos or videos irrelevant to the topic were excluded. The first 50 rank-order videos were included. Data collected were categorized into general parameters (eg, number of views, video length), source parameters (eg publisher affiliation, number of subscribers), and video content (eg, topic discussed, media type). Data were analyzed by 4 themes of basic information, information for health-care professionals, treatment, and rehabilitation. Each theme was further categorized by relevant subthemes.
Results: Publisher affiliation of the PHF videos was most commonly commercial (56%). Health-care professionals or students were the more common target audience (62%) than patients (36%). The predominant media type used in the videos was lecture-style presentation (52%), followed by demonstration (32%), and interviews (18%). Sixty-two percent of the videos discussed basic information on PHF, such as epidemiology or mechanism of injury. Treatment and rehabilitation were the most popular themes, both discussed in 80% of the videos. Among the subthemes, imaging and operative were the most popular subthemes discussed, discussed in 50% and 58% of the videos, respectively.
Conclusion: As YouTube is one of the most popular platforms on the Internet, this study assessed the YouTube videos regarding PHFs and their treatment. This study found the PHF videos to cover diverse topics and to be relevant to both patients and health-care professionals. Hence, they can serve as a valuable resource for patients to supplement information they receive from their care provider. However, as YouTube is a largely unregulated platform, there is a need to advocate for content creation from credible sources such as health-care facilities or providers.

Keywords:  Orthopedic surgery; Patient education; Proximal humeral fracture; Shoulder surgery; Social media; YouTube

DOI:  https://doi.org/10.1016/j.jseint.2025.04.006
JMIR Form Res. 2025 Sep 02. 9 e71652

Web-Based Video Platforms as Sources of Information on Body Image Dissatisfaction in Adolescents: Content and Quality Analysis of a Cross-Sectional Study.

Li Liu, Jianning Yang, Fengmei Tan, Huan Luo, Yanhua Chen, Xiaolei Zhao.

   Background: Body image dissatisfaction among children and adolescents is a significant public health concern and is associated with numerous physical and mental problems. Social media platforms, including TikTok, BiliBili, and YouTube, have become popular sources of health information. However, the quality and reliability of content related to body image dissatisfaction have not been comprehensively evaluated.
Objective: The primary goal of this study was to examine the quality and reliability of videos related to body image dissatisfaction on TikTok, BiliBili, and YouTube.
Methods: The keywords "body image dissatisfaction" were searched on YouTube, TikTok, and BiliBili in November 2024. Videos were collected based on platform-specific sort filters, including the filter of "Most liked" on TikTok and the filter of "Most viewed" on BiliBili and YouTube. The top 100 videos on each platform were reviewed and screened in the study. After excluding videos that were (1) not in English or Chinese, (2) duplicates, (3) irrelevant, (4) no audio or visual, (5) contained advertisements, and (6) with a Global Quality Scale (GQS) score of 1, the final sample consisted of 64 videos, which formed the basis of our research and subsequent findings. Two reviewers (LL and JNY) screened, selected, extracted data, and evaluated all videos using the GQS, the Modified DISCERN (mDISCERN) scores, and the Modified Journal of the American Medical Association (mJAMA) benchmark criteria. Statistical analysis was performed using SPSS (version 28.0; IBM Corp).
Results: In total, 64 videos were analyzed in the study, including 20 from TikTok, 13 from BiliBili, and 31 from YouTube. The median duration of the involved videos was 3.01 (IQR 1.00-5.94) minutes on TikTok, 3.52 (IQR 2.36-5.63) minutes on BiliBili, and 4.86 (IQR 3.10-6.93) minutes on YouTube. Compared with the other 2 platforms, BiliBili videos received higher likes and more comments. The majority of the videos (n=40, 62%) were uploaded by self-media. The quality of the videos on YouTube shows the highest overall scores. Videos uploaded by professional authors had significantly higher GQS, mDISCERN, and mJAMA scores compared to those uploaded by nonprofessionals. There was no significant correlation between video quality and the number of views or likes. However, the number of views and likes were significantly positively correlated. Furthermore, a significant correlation was found between the mJAMA, mDISCERN, and GQS scores.
Conclusions: Web-based video platforms have become an important source for adolescents to access health information. However, the lack of a significant correlation between video quality and the number of likes and comments poses a challenge for users seeking reliable health information. It is suggested that the quality of the videos on health information would be taken into consideration in the recommendation algorithm on web-based video platforms.

Keywords:  BiliBili; GQS; Global Quality Scale; Modified DISCERN; Modified Journal of the American Medical Association; TikTok; YouTube; adolescents; body image; mDISCERN; mJAMA; quality analysis; video

DOI:  https://doi.org/10.2196/71652
JMIR Form Res. 2025 Sep 17. 9 e73855

Quality and Reliability of Transarterial Chemoembolization Videos on TikTok and Bilibili: Cross-Sectional Content Analysis Study.

Yushuo Niu, Guilan Song, Zheyu Niu, Sijian Xiao, Cuicui Li, Na Han, Hao Wan, Xiaohong Hou.

   Background: Transarterial chemoembolization (TACE) is a widely used treatment for advanced, unresectable hepatocellular carcinoma, often requiring multiple sessions for optimal efficacy. TikTok and Bilibili have gained widespread popularity as easily accessible sources of health information.
Objective: This study aims to assess the quality of the information in Chinese short videos on TACE shared on TikTok and Bilibili.
Methods: In November 2024, the top 100 TACE-related Chinese-language short videos on TikTok and Bilibili (a total of 200 videos) were assessed and reviewed. Initially, basic information about the videos was recorded and analyzed. Subsequently, Global Quality Score and the DISCERN tool were used to evaluate the information quality and reliability of the videos on both platforms. Finally, multifactorial analysis was used to identify potential factors influencing the quality of the videos.
Results: TikTok is more popular than Bilibili, despite its videos being shorter in length (P<.001). The quality of short videos on TACE found on both platforms was of low quality, with average Global Quality Score scores of 2.31 (SD 0.81) on TikTok and 2.48 (SD 0.80) on Bilibili, as well as DISCERN scores of 1.86 (SD 0.40) on TikTok and 2.00 (SD 0.44) on Bilibili. The number of saves (β=.184, P=.008; β=.176, P=.01) and days (β=.214, P=.002; β=.168, P=.01) since publication were identified as closely related variables to video quality and reliability. Furthermore, the duration of the video was closely related to its reliability (β=.213, P=.002).
Conclusions: This study indicates that the quality of TACE-related health information in the top 100 short videos on both Bilibili and TikTok platforms is suboptimal. Patients should exercise caution when relying on health-related information from these platforms. Social media companies should establish review teams with basic medical knowledge. It is essential for the platforms to enhance their recommendation algorithms and implement measures for video quality assessment. Health care professionals should be aware of the limitations of these videos and work to improve their quality.

Keywords:  TACE; health education; hepatocellular carcinoma; quality analysis; short videos; transarterial chemoembolization

DOI:  https://doi.org/10.2196/73855
Medicine (Baltimore). 2025 Sep 12. 104(37): e44309

Evaluating the quality and reliability of YouTube videos about phenylketonuria.

Bahar Kulu, Sibel Burçak Şahin Uyar, Batuhan Kulu, Volkan Hanci.

  Phenylketonuria (PKU) is a congenital metabolic disorder characterized by defective phenylalanine metabolism, leading to neurotoxicity when untreated. Early detection through newborn screening and timely dietary intervention can ensure normal intellectual development. YouTube is a widely used source for health-related content, but the quality of information remains variable. The objective of this study is to evaluate the quality and reliability of YouTube videos related to PKU using validated scoring tools. This cross-sectional study was conducted on December 30, 2024. A YouTube search using the keyword "phenylketonuria" yielded 150 videos, of which 104 met the inclusion criteria. Video parameters including number of views, likes, duration, and content type were collected. Reliability was assessed using the Journal of the American Medical Association Benchmark Criteria and the modified DISCERN questionnaire. Quality and accuracy were evaluated using the Global Quality Score. Statistical analyses were performed to determine the relationships between video characteristics and evaluation scores. Of the analyzed videos, 45% were animated and 68% were uploaded by healthcare professionals. The median number of views was 1064 (range: 12-806,000) and the median view ratio was 0.68 (range: 0-245.36). Significant associations were found between the year of upload and view ratio (P = .012), and between continent and both view ratio (P = .045) and daily views (P = .003). Likes-per-view differed significantly by country (P < .001). According to the Global Quality Score, 56% of videos were of medium quality, with the highest scores observed in videos from professional organizations and academic sources. Journal of the American Medical Association assessment showed that 58.3% of videos contained sufficient information. Modified DISCERN questionnaire revealed that 46.6% of videos had poor reliability. The majority of PKU-related YouTube videos analyzed were of low to moderate quality and reliability. Given the impact of such content on patient decision-making, healthcare authorities should take an active role in producing and promoting high-quality digital health content.

Keywords:  DISCERN; Global Quality Score; JAMA Benchmark Criteria; YouTube video analysis; phenylketonuria

DOI:  https://doi.org/10.1097/MD.0000000000044309
Arch Rheumatol. 2025 Sep 01. 40(3): 365-375

Educational Quality and Reliability of YouTube Content Related to Musculoskeletal Ultrasound.

Selcuk Akkaya, Gonca Saglam Akkaya.

Background/Aims: YouTube's growing popularity as an educational resource for musculoskeletal ultrasound (MSKUS) raises questions about its potential to supplement medical education. This study evaluates MSKUS-related YouTube content comprehensively to determine its potential as a supplementary tool in medical education. Materials and Methods: A cross-sectional analysis was performed on 151 YouTube videos related to MSKUS. Video characteristics and viewer interaction metrics were recorded. Video popularity was quantified using the Video Power Index. The Global Quality Score (GQS), the Quality Criteria for Consumer Health Information (DISCERN), and the Medical Quality Video Evaluation Tool (MQ-VET) were employed to assess the educational value and quality of the videos. Video reliability was evaluated using the Journal of the American Medical Association (JAMA) Benchmark Criteria. Results: The most frequent MSKUS topic covered was shoulder ultrasound (29.8%), primarily focusing on anatomical landmarks (38.7%). Educational quality assessment indicated that 40.4% of videos were classified as low quality by the GQS. DISCERN rated 43.7% of videos as "very poor" quality, whereas MQ-VET scored 25.8% as average quality. The JAMA criteria indicated that 69.5% of the videos provided only partially sufficient information. No videos cited clinical guidelines, 24.5% provided references, and 18.5% included captions. Academic sources demonstrated significantly higher quality (DISCERN: P = .018; JAMA: P = .015; MQ-VET: P = .009). Videos with captions and references/citations demonstrated significantly higher GQS, DISCERN, JAMA, and MQ-VET scores (all P < .001). Diagnostic videos had higher GQS (median 3 vs. 2; P = .021) and JAMA scores (median 2.5 vs. 2; P = .032) compared to injection videos. Conclusion: This study highlights the inconsistent quality of YouTubebased MSKUS educational content. While academic and well-referenced videos are of high quality, unvetted content often lacks accuracy, making uncurated YouTube videos unreliable for clinical learning. It is recommended that educators guide learners toward content from academic institutions or highly engaged videos with cited guidelines/sources. Standardized guidelines are crucial for integrating trustworthy YouTube MSKUS content into medical curricula.

DOI: https://doi.org/10.5152/ArchRheumatol.2025.25038
Digit Health. 2025 Jan-Dec;11:11 20552076251379340

Short video platforms as sources of health information about HPV vaccine: A content and quality analysis.

Luyang Su, Jingrun Yao, Xiaohang Ai, Yanan Ren, Ren Xu, Xiaoqian Wu, Shixia Zhao, Weilan Liu, Liyun Song, Zeqing Du.

   Background: Numerous HPV vaccine-related videos are available on Kwai, Bilibili, and TikTok; however, their quality and professionalism vary considerably. This study evaluates the quality and reliability of HPV vaccine-related videos on these platforms to offer Chinese-speaking users a reference for informed vaccination decisions.
Method: A keyword search for the top 100 relevant videos on "HPV vaccine" was performed on Kwai, Bilibili, and TikTok, yielding a total of 238 eligible videos. A comparative analysis was conducted on video characteristics, uploader profiles, content categories, uploader attitudes, and public responses. The Global Quality Score (GQS) and the modified DISCERN (mDISCERN) instrument were applied to assess video quality and reliability.
Result: TikTok had the highest median number of likes (1744.5) and shares (1338), while Bilibili led in median comments (179.25) and favorites (527). Public support for the HPV vaccine was highest on Kwai (63.2%), whereas TikTok showed notable opposition (15.8%). Interestingly, only Bilibili lacked a neutral stance. The proportion of physician uploaders was highest on TikTok (61.8%), whereas Bilibili had the largest share of self-media contributors (66.3%). Among professional uploaders, 92.3% supported the HPV vaccine, and their videos received 55.4% public approval-significantly higher than the 34.6% for individual users (p = 0.021). Significant differences in mDISCERN scores were observed across all three platforms (all pairwise comparisons, p < 0.001).
Conclusion: Videos uploaded by professionals tend to have higher engagement and greater informational reliability, making them more effective in promoting public support for vaccination. TikTok videos scored highest on both GQS and mDISCERN metrics and had the largest proportion of professional uploaders, indicating superior overall quality and reliability. These findings should be interpreted within the context of the mainland-Chinese short-video ecosystem and may not be generalizable to non-Chinese-speaking populations.

Keywords:  Bilibili; HPV vaccine; Kwai; TikTok; information quality; short videos; social media

DOI:  https://doi.org/10.1177/20552076251379340
Technol Health Care. 2025 Sep 18. 9287329251367431

Instagram videos provide limited information on complications and return to social life regarding total knee arthroplasty: A multilingual analysis.

Yavuz Sahbat, Mustafa Fatih Dasci, Aziz Emre Nokay, Alicia Maria Ramos Tellez, Luigi Zanna, Abdulaziz Hariri, Serkan Surucu, Mustafa Citak.

  IntroductionThe purpose of this study was to examine the content quality and potential shortcomings of arthroplasty training videos on Instagram.Materials and MethodsA search on Instagram was performed from November 1, 2023, to April 30, 2024. The hashtags Replacement, Total knee replacement and Knee arthroplasty were translated into 6 different languages and searched on Instagram by 6 observers who are native speakers of those languages. The videos were scored using the DISCERN score and Global Quality Score (GQS). The extent to which the videos addressed the processes about which patients need to be informed was also examined.ResultA total of 126 videos were analyzed in this study. The median DISCERN and GQS scores were 3.0 [1.0-5.0] and 3.0 [2.0-5.0], respectively. The most frequently mentioned subheading was arthroplasty procedure and prosthesis technology (74%), followed by treatment options (66%). The least mentioned subheading was complications (19%), followed by return to social life (44%).ConclusionsThe main finding of this study was that knee arthroplasty videos posted on Instagram were lacking in data. Video content largely describes surgical techniques but is insufficient to inform patients about postoperative processes. The video content quality was found to be moderately good according to both video quality scores, and these quality scores were moderately correlated with the mention of subheadings.

Keywords:  Arthritis; Instagram; Internet; content quality; replacement; total knee arthroplasty

DOI:  https://doi.org/10.1177/09287329251367431
Patient Educ Couns. 2025 Sep 13. pii: S0738-3991(25)00719-0. [Epub ahead of print]141 109352

How does oncologists' communication affect patients' well-being and online health information seeking? - A randomized experiment.

Tanja Henkel, Chamoetal Zeidler, Annemiek J Linn, Julia C M van Weert, Ellen M A Smets, Marij A Hillen.

   OBJECTIVE: Patients with cancer increasingly rely on online information about their disease. However, the impact of clinicians' responses to patients presenting this information remains unclear. This randomized experiment tested the effects of oncologists' communication approaches on patients' trust, satisfaction, and intentions to seek and discuss online information. Additionally, we explored moderating effects of patients' psychological characteristics.
METHODS: In an online vignette experiment, we manipulated clinicians' communication approaches (patient-centered vs. clinician-centered) in hypothetical oncology consultations. (Former) cancer patients (N = 270, 62 ± 13 years, 55 % female) were randomly assigned to one out of eight conditions. We performed 1-way ANOVA's, independent samples t-tests and multiple regressions.
RESULTS: Participants exposed to a patient-centered approach reported higher satisfaction with the consultation (d =0.62, p < .001), stronger trust in the clinician (d =0.49, p < .001), and stronger intentions to seek (d =0.40, p < .001) and discuss online information (d =0.69, p < .001) compared to participants exposed to a clinician-centered approach. Moderation analyses indicated that the effect of communication approach on intention to discuss online information depended on participants' trait anxiety (b =-0.43, p = .017) and uncertainty intolerance (b =-0.35, p = .041). Uncertainty intolerance further moderated patient satisfaction with the consultation (b =-0.33, p = .049). Participants' monitoring coping style moderated the effect of communication approach on online information seeking (b =0.23, p = .036).
CONCLUSION: Clinicians' patient-centered responses to online information seeking may positively affect patient satisfaction with the consultation, trust in the consultation, and online information seeking behavior. We provide initial evidence that these effects do not apply equally to every patient: levels of trait anxiety, uncertainty intolerance and monitoring coping style influence the relationship between the applied communication approach and patient outcomes.
PRACTICE IMPLICATIONS: Clinicians are advised to emphasize collaborative information exchange and guide patients to trustworthy online sources.

Keywords:  Experiment; Oncology; Online health information; Patient-Centred Communication; Patient-provider-interaction; Vignettes

DOI:  https://doi.org/10.1016/j.pec.2025.109352
Risk Manag Healthc Policy. 2025 ;18 2951-2965

Online Health Information Seeking: Implications for Self-Management in Hypertension.

Shiya Liu, Sufang Huang, Yaru Xiao, Jingjing Huang.

   Background: Hypertension, a major cardiovascular disease risk factor, is a global public health challenge. Self-management is key, and with information and communication technology prevalence, online health information seeking behavior (OHISB) has become a common trend to boost patients' self-management.
Purpose: This study aims to explore hypertensive patients' OHISB and their impact on self-management practices, providing a basis for further improving patients' OHISB and self-management behaviors.
Patients and Methods: This study selected 312 hypertensive patients from the Cardiology Department of a Wuhan tertiary hospital (March-April 2025), using a general information questionnaire, revised version of the Online Health seeking behavior Scale (OHB-S) and the Hypertension Patients Self-Management Behavior Rating Scale (HPSMBRS) for surveys. SPSS 26.0 did descriptive analysis of enumeration/measurement data; t-tests/ANOVA analyzed group differences; multiple linear regression examined OHISB-influencing factors; Pearson correlation and hierarchical regression explored relationships between OHISB and self-management.
Results: The total scores of OHISB and self-management were (55.20±14.29) and (96.54±16.62) respectively in the patients. The total scores of OHISB and self-management were significantly positively correlated (r=0.634, P<0.05). The results of hierarchical regression analysis show that OHISB is an important influencing factor of self-management and can independently explain 21.2% of the variation in patients' self-management.
Conclusion: Both the OHISB and self-management behaviors of hypertensive patients are at a relatively low level. OHISB is an important influencing factor of self-management. Hypertensive patients with a higher level of OHISB have a higher level of self-management. In the future, information sources should be carefully controlled, and a variety of online health information channels should be combined to provide targeted online hypertension health education, thereby enhancing the self-management capabilities of hypertension patients.

Keywords:  hypertension; online health information seeking behavior; self-management

DOI:  https://doi.org/10.2147/RMHP.S539905