bims-librar 2025-05-11 papers

bims-librar

Biomed News

on Biomedical librarianship

Issue of 2025–05–11
forty-one papers selected by
Thomas Krichel, Open Library Society

The structure and experience of interim roles in academic health sciences libraries.
Moving to a digital only library.
Academic Health Sciences Library Develops Novel Online Teaching and Learning Resource for Dermatology.
Development and validation of LGBTQIA+ search filters: report on process and pilot filter for queer women.
Physical therapy students' perceptions of embedded medical librarians within evidence-based practice courses: a mixed-methods pilot study.
Integrating PICO principles into generative artificial intelligence prompt engineering to enhance information retrieval for medical librarians.
Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features.
Navigating the Scholarly Literature: A Practical Guide to Searching Effectively (Without Too Much Stress).
uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering.
EBI Search: providing discovery tools for biological metadata in 2025.
Insights from search summary tables for evidence and gap maps: a case study on peer support interventions.
A decade of Does: celebrating the 125th anniversary of MLA through an annual meeting conversation with past Janet Doe lecturers.
Medical education research quality (MERSQ) checklist development: Are searches of BEME and non-BEME reviews standard?: A mixed method study.
Evaluating Accuracy and Readability of Responses to Midlife Health Questions: A Comparative Analysis of Six Large Language Model Chatbots.
Can urticaria images found on the internet be a source of health information?
Performance of AI-Chatbots to Common Temporomandibular Joint Disorders (TMDs) Patient Queries: Accuracy, Completeness, Reliability and Readability.
Are Large Language Model-Based Chatbots Effective in Providing Reliable Medical Advice for Achilles Tendinopathy? An International Multispecialist Evaluation.
Artificial Intelligence Performs Well for Patient-Level Education of Benign Anorectal Conditions.
The actual performance of large language models in providing liver cirrhosis-related information: A comparative study.
Is ChatGPT a More Academic Source Than Google Searches for Patients Questions About Hip Arthroscopy? An Analysis of the Most Frequently Asked Questions.
The Quality and Readability of Online Resources for Women With Hair Loss: A Cross-Sectional Study.
The Effectiveness of a Custom AI Chatbot for Type 2 Diabetes Mellitus Health Literacy: Development and Evaluation Study.
Parental education in pediatric dysphagia: A comparative analysis of three large language models.
Analyzing the Quality and Readability of Online Resources Available for Androgenetic Alopecia: A Cross-Sectional Analysis.
Usability and accessibility of total shoulder arthroplasty and rotator cuff repair online patient educational materials for persons with disabilities.
Navigating the Whipple procedure patient educational materials: A readability and understandability analysis.
Evaluating Large Language Models in Addressing Patient Questions on Endodontic Pain: A Comparative Analysis of accessible chatbots.
Common Disease-Difficult Understanding: Readability Analysis of Superficial Skin Fungal Infections On-Line Materials in European Languages.
Assessing the Readability, Quality, and Visual Accessibility of Patient Education Websites for Laser Refractive Surgery.
A comparative evaluation of laparoscopic TAPP videos on YouTube and WebSurg: surgical steps and educational value.
Is YouTube™ a useful resource of information about bichectomy? A cross-sectional study.
Bridging the Gap in Patient Guidance: Quality Analysis of Kegel Exercise Videos on YouTube.
Rabies in Social Media Videos: Comparison of Instagram and YouTube.
Hip fractures in Chinese TikTok (Douyin) short videos: an analysis of information quality, content and user comment attitudes.
The content quality and educational significance of early childhood caries on short video platforms.
Author Reply to "Letter to the Editor Regarding 'YouTube is an Inconsistent Source of Information on Orthobiologics: Implications for Content Quality, Reliability, Comprehensiveness, and Patient Decision-Making'".
Understanding Breast Cancer (BRCA) Mutations Through TikTok.
Online Health Information-Seeking Behaviors Among the Chongqing Population: Cross-Sectional Questionnaire Study.
Appropriate trust in online health information is associated with information platform, commercial status, and misinformation in patients with high cardiovascular risk.
Linking internet health information seeking to psychological distress among Chinese adults with a family cancer history: The attenuating role of patient-clinician communication of internet searches.
Non-clinician involvement in interprofessional health sciences education: educator experiences and attitudes.

J Med Libr Assoc. 2025 Apr 18. 113(2): 148-157

The structure and experience of interim roles in academic health sciences libraries.

John W Cyrus, Roy E Brown, Emily J Hurst, Rasha Alsaadawi, Roy T Sabo.

   Objective: Interim leadership roles are commonly used in academic libraries to ensure continuity and oversight within the organization. Interim roles can be rewarding but fraught with challenges, including the assumption of responsibilities in unstable environments, unclear expectations, and poor organizational preparedness. This article presents findings from a survey of librarian's experiences serving in interim leadership positions.
Methods: A survey was designed to capture perceptions of the structure of the leadership position and the experience of the interim leaders. It was distributed via social media and through health sciences library listservs. Responses were analyzed using descriptive statistics and exploratory one-way ANOVA to test for response differences between respondent sub-groups.
Results: Fifty-four complete responses were collected. Respondents were predominantly White (89%) and female (77%). Seventy percent of respondents had worked in health sciences libraries for 11-25 years. Respondents indicated that expectations, expected duration, and transition plan for the role were unclear. Policies and procedures related to the interim role were lacking. Respondents agreed that full authority and acceptance were given as part of the role. There were statistically significant differences in responses relating to authority, retention, and acceptance by gender and race.
Conclusions: Results show that interim leaders were given adequate authority and support, but that organizations were not necessarily prepared for the interim leader, lacking policies, procedures, and clear expectations related to the position. Libraries can better prepare for the future by creating permanent structures and policies to facilitate the transition into and out of interim leadership.

Keywords:  Leadership; Library management; Surveys and Questionnaire; academic health sciences libraries; interim

DOI:  https://doi.org/10.5195/jmla.2025.1924
Br J Occup Ther. 2023 Sep;86(9): 597-598

Moving to a digital only library.

Lorna Rutherford.

DOI: https://doi.org/10.1177/03080226231188597
Med Ref Serv Q. 2025 May 08. 1-14

Academic Health Sciences Library Develops Novel Online Teaching and Learning Resource for Dermatology.

Bryan E Hull, Carmin I Smoot, Adriene Pavek, Annabelle Huntsman, Shreya Sreekantaswamy, Julia Curtis.

  Primary care providers and medical students often receive limited dermatologic education, leading to delayed diagnosis and treatment for patients with cutaneous conditions. Additionally, dermatology education has historically focused on light skin, neglecting skin of color, which exacerbates diagnostic delays and treatment disparities. The University of Utah's Eccles Health Sciences Library and Department of Dermatology, along with Oregon Health & Science University, developed Utah Dermatology Education Resources & Modules (UtahDERM) to address these educational gaps. UtahDERM features a custom-built slide-viewer platform with clinical dermatology images, diagnoses, clinical characteristics, and textbook references, along with a quick reference tool for core dermatology diagnoses.

Keywords:  Academic health sciences libraries; curriculum support; dermatology; medical education; open educational repositories

DOI:  https://doi.org/10.1080/02763869.2025.2498117
J Med Libr Assoc. 2025 Apr 18. 113(2): 123-132

Development and validation of LGBTQIA+ search filters: report on process and pilot filter for queer women.

Hannah M Schilperoort, Andy Hickner, Jane Morgan-Daniel, Robin M N Parker.

   Introduction: A search filter for studies involving lesbian, gay, bisexual, transgender, queer, intersex, asexual, and additional sexual minority and gender identities (LGBTQIA+) populations has been developed and validated; however, the filter contained very small gold standard sets for some populations, and terminology, controlled vocabulary, and database functionality has subsequently evolved. We therefore sought to update and re-test the search filters for these selected subgroups using larger gold standard sets. We report on the development and validation of two versions of a sensitivity-maximizing search filter for queer women, including but not limited to lesbians and women who have sex with women (WSW).
Methods: We developed a PubMed search filter for queer women using the relative recall approach and incorporating input from queer women. We tested different search combinations against the gold standard set; combinations were tested until a search with 100% sensitivity was identified.
Results: We developed and tested variations of the search and now present two versions of the strategy with 99% and 100% sensitivity. The strategies included additional terms to improve sensitivity and proximity searching to improve recall and precision.
Conclusions: The queer women search filters balance sensitivity and precision to facilitate comprehensive retrieval of studies involving queer women. The filters will require ongoing updates to adapt to evolving language and search platform functionalities. Strengths of the study include the involvement of the population of interest at each stage of the project. Future research will include development and testing of search filters for other LGBTQIA+ subgroups such as bisexual and transgender people.

Keywords:  LGBTQIA+; WSW; bisexual women; lesbians; queer women; relative recall; search filter validation; search hedge validation; systematic reviews as topic; women who have sex with women

DOI:  https://doi.org/10.5195/jmla.2025.2002
J Med Libr Assoc. 2025 Apr 18. 113(2): 143-147

Physical therapy students' perceptions of embedded medical librarians within evidence-based practice courses: a mixed-methods pilot study.

Lori Bolgla, Malorie Novak, Lachelle Smith.

   Objective: Previous work within academic medical centers has indicated the potential value of embedded medical librarian programs within health sciences professional degree programs. This study sought to determine the perceived benefit that an embedded medical librarian (EML) provided to an evidence-based practice (EBP) course within an entry-level physical therapy degree program.
Methods: Learners completed an anonymous survey at the end of an EBP course about the impact of the EML on the course and their own EML utilization. Frequency and percentages were calculated for quantitative data; qualitative data were analyzed using an iterative process for code development.
Results: Forty (98%) learners completed the survey. Seventy-five point six percent of learners utilized the EML 1-2 times per class session and 31.7% outside of class sessions. Learners overwhelmingly "agreed" (53.7%) or "strongly agreed" (39.0%) that they would consult the EML for literature searches required in future courses. Seventy point seven percent "strongly agreed" that the EML improved their ability to conduct a literature search. All learners either "agreed" (43.9%) or "strongly agreed" (56.1%) that the EML added value to the course. Ninety point two percent considered the EML as an integral part of the course. Themes from the qualitative analysis agreed that the EML added value to the course and facilitated skills that would be useful throughout the curriculum.
Conclusion: Learners believe that having an EML improves their ability to conduct a literature search. Providing learners with EML access during their education experience facilitates development of this skill. Early and continued instruction throughout the entry-level DPT curriculum in informatics ensures program compliance with accreditation standards.

Keywords:  Academic Health Sciences Libraries; Health Informatics; Physical Therapy; evidence based medicine

DOI:  https://doi.org/10.5195/jmla.2025.1977
J Med Libr Assoc. 2025 Apr 18. 113(2): 184-188

Integrating PICO principles into generative artificial intelligence prompt engineering to enhance information retrieval for medical librarians.

Kyle Robinson, Karen Bontekoe, Joanne Muellenbach.

  Prompt engineering, an emergent discipline at the intersection of Generative Artificial Intelligence (GAI), library science, and user experience design, presents an opportunity to enhance the quality and precision of information retrieval. An innovative approach applies the widely understood PICO framework, traditionally used in evidence-based medicine, to the art of prompt engineering. This approach is illustrated using the "Task, Context, Example, Persona, Format, Tone" (TCEPFT) prompt framework as an example. TCEPFT lends itself to a systematic methodology by incorporating elements of task specificity, contextual relevance, pertinent examples, personalization, formatting, and tonal appropriateness in a prompt design tailored to the desired outcome. Frameworks like TCEPFT offer substantial opportunities for librarians and information professionals to streamline prompt engineering and refine iterative processes. This practice can help information professionals produce consistent and high-quality outputs. Library professionals must embrace a renewed curiosity and develop expertise in prompt engineering to stay ahead in the digital information landscape and maintain their position at the forefront of the sector.

Keywords:  Generative Artificial Intelligence; Information Retrieval; PICO; Prompt Engineering

DOI:  https://doi.org/10.5195/jmla.2025.2022
medRxiv. 2025 Apr 28. pii: 2025.04.23.25326300. [Epub ahead of print]

Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features.

Joe D Menke, Shufan Ming, Shruthan Radhakrishna, Halil Kilicoglu, Neil R Smalheiser.

Objective: Searching for biomedical articles by publication type or study design is essential for tasks like evidence synthesis. Prior work has relied solely on PubMed information or a limited set of types (e.g., randomized controlled trials). This study builds on our previous work by leveraging full-text features, alternative text representations, and advanced optimization techniques.
Methods: Using a dataset of PubMed articles published between 1987 and 2023 with human-curated indexing terms, we fine-tuned BERT-based en-coders (PubMedBERT, BioLinkBERT, SPECTER, SPECTER2, SPECTER2-Clf) to investigate whether text representations based on different pre-training objectives could benefit the task. We incorporated textual and verbalized metadata features, full-text extraction (rule-based, extractive, and abstractive summarization), and additional topical information about the articles. To improve calibration and mitigate label noise, we used asymmetric loss and label smoothing. We also explored contrastive learning approaches (SimCSE, ADNCE, HeroCon, WeighCon). Models were evaluated using precision, recall, F1 score (both micro- and macro-), and area under ROC curve (AUC).
Results: Fine-tuning SPECTER2-base with adding the MeSH term "Animals", asymmetric loss with label smoothing, and WeighCon contrastive loss improved performance significantly over the previous best architecture (micro-F1: 0.664 → 0.679 [+2.2%]; macro-F1: 0.663 → 0.690 [+4.1%]; p < 0.0001). Asymmetric loss and using SPECTER2-base instead of PubMedBERT contributed most to this gain. Full-text features boosted performance by 2.4% (micro-F1) and 1.8% (macro-F1) over the baseline (micro-F1: 0.616 → 0.631; macro-F1: 0.556 → 0.566; p < 0.0001). Topical label splitting and contrastive learning provided minor, non-significant improvements.
Conclusion: Full-text features, enhanced document representations, and fine-tuning optimizations improve publication type and study design indexing. Future work should refine label accuracy, better distill relevant article information, and expand label sets to meet needs of the research community. Data, code, and models are available at https://github.com/ScienceNLPLab/MultiTagger-v2 .

DOI: https://doi.org/10.1101/2025.04.23.25326300
Curr Protoc. 2025 May;5(5): e70138

Navigating the Scholarly Literature: A Practical Guide to Searching Effectively (Without Too Much Stress).

Kathryn P Kohn.

  Effectively searching the scholarly literature is a fundamental academic skill. However, the process can be overwhelming due to the vast amount of available research and the complexity of academic databases. This overview article provides a practical guide to navigating the literature with confidence, outlining key strategies for identifying relevant sources, refining search queries, and troubleshooting common challenges. © 2025 The Author(s). Current Protocols published by Wiley Periodicals LLC.

Keywords:  database searching; information sources; research methodology; scholarly literature; search strategies

DOI:  https://doi.org/10.1002/cpz1.70138
Data Brief. 2025 Jun;60 111535

uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering.

Liri Fang, Malik Oyewale Salami, Griffin M Weber, Vetle I Torvik.

  There has been a recent push to make public, aggregate, and increase coverage of bibliographic citation data. Here we describe uCite, a citation dataset containing 564 million PubMed citation pairs aggregated from the following nine sources: PubMed Central, iCite, OpenCitations, Dimensions, Microsoft Academic Graph, Aminer, Semantic Scholar, Lens, and OpCitance. Of these, 51 million (9%) were labeled unreliable, as determined by patterns of source discrepancies explained by ambiguous metadata, crosswalk, and typographical errors, citing future publications, and multi-paper documents. Each source contributes to improved coverage and reliability, but varies dramatically in precision and recall, estimates of which are contrasted with the Web of Science and Scopus herein.

Keywords:  Bibliographic database; Citation data; Citation errors

DOI:  https://doi.org/10.1016/j.dib.2025.111535
Nucleic Acids Res. 2025 May 05. pii: gkaf359. [Epub ahead of print]

EBI Search: providing discovery tools for biological metadata in 2025.

Matthew Pearce, Prasad Basutkar, Renato Caminha Juaçaba Neto, Vijay Venkatesh Subramoniam, Kelsey Neis, Iva Tutis, Henning Hermjakob.

The data resources provided by the European Bioinformatics Institute (EMBL-EBI) cover major areas of biological and biomedical research, giving free and open access to users ranging from expert to casual level. The EBI Search engine provides a unified metadata search engine across these resources. It provides a full-text search engine across over 6.5 billion data items, accessed through a user-friendly website and an OpenAPI-compliant programmatic interface. Here, we discuss recent developments and improvements in the service.

DOI: https://doi.org/10.1093/nar/gkaf359
J Med Libr Assoc. 2025 Apr 18. 113(2): 177-183

Insights from search summary tables for evidence and gap maps: a case study on peer support interventions.

Alison C Bethel, Naomi Shaw, Rebecca Abbot, Morwenna Rogers, Anna Price, Rob Anderson, Sian de Bell, Jo Thompson Coon.

   Background: Evidence and Gap Maps (EGMs) are a visual representation of the available evidence relevant to a specific research question or topic area. They are produced using similar methods to systematic reviews, however, there is little guidance on which databases to search and how many. Information Specialists need to make decisions on which resources to search, often for a range of study designs within a broad topic area to ensure comprehensiveness.
Case Presentation: This case study presents two search summary tables (SSTs) from an evidence and gap map on peer support interventions. The first search summary table presents the findings of the search for systematic reviews and the second for randomised controlled trials. Different databases and different searches were undertaken for the two different study types.
Conclusion: The two SSTs indicated that MEDLINE and PsycINFO were key databases required for the identification of both systematic reviews and randomised controlled trials of peer support interventions, with the addition of CINAHL for systematic reviews, and CENTRAL for randomised controlled trials. For both study types, forward citation searching found additional included studies although it was more lucrative for identifying additional randomised controlled trials. Search summary tables are a simple way to share the effectiveness of the search methods chosen for a specific evidence synthesis project. The more SSTs we have, the more data we will have to inform evidence-based decisions on our search methods.

Keywords:  Evidence synthesis; evidence and gap maps; information retrieval; search summary table

DOI:  https://doi.org/10.5195/jmla.2025.1831
J Med Libr Assoc. 2025 Apr 18. 113(2): 116-122

A decade of Does: celebrating the 125th anniversary of MLA through an annual meeting conversation with past Janet Doe lecturers.

Gerald Perry, Mary Joan M J Tooey.

  At the Medical Library Association (MLA) 2024 Annual Meeting in Portland, Oregon, the Janet Doe Lectureship Series plenary session featured a panel of past Doe lecturers from the last decade. Reflecting on their lectures they were challenged to imagine how the Association's Core Values could guide and inform decision making in response to current and emerging challenges to the profession and in the environment. Panelists' reflections included themes of inclusivity, collaboration, leadership, technology, space planning, and the role of medical librarians in addressing issues of mis- and disinformation, bias, equity, and open access, today and in the future. Common themes included the centrality of collaboration as a necessary component of health sciences librarianship, and the ongoing criticality of the profession's commitment to ethical practices. The panelists shared insights on how MLA's Core Values can guide the profession and association through the challenges and opportunities of the evolving healthcare and information landscape, including the rise and the rapid evolution of advanced technologies.

Keywords:  Diversity, Equity, Inclusion; Ethics, Professional; Health Equity; Janet Doe Lecture; Leadership; Libraries; Medical Library Association, History; Open Access Publishing; Technology

DOI:  https://doi.org/10.5195/jmla.2025.2150
Medicine (Baltimore). 2025 May 02. 104(18): e42316

Medical education research quality (MERSQ) checklist development: Are searches of BEME and non-BEME reviews standard?: A mixed method study.

Maryam Alizadeh, Rahem Rahmati, Fatemeh Zarimeidani, Fatemeh Hasani, Arshin Ghaedi, Aida Bazrgar, Reza Hosseini Doalame, Hojat Vahedi, Hamidreza Hekmat, Negar Omidi.

  Even though there has been a lot of research in medical education, the quality of it has not increased similarly. This study aimed to provide a valid and reliable user-friendly tool for evaluating search strategies in medical education systematic reviews. This mixed study was conducted in 2019 to 2021, including 3 phases: systematic search, developing a medical education research quality (MERSQ) checklist, and evaluation of the search quality of best evidence in medical education collaboration (BEME) and non-BEME reviews. Three hundred nineteen items were retrieved from the systematic search of PubMed, Embase, Scopus, Psychinfo, ERIC, and Google Scholar. Following ensuring acceptable criteria, 30 items were included in comprehensiveness or reproducibility guarantees. The results showed that the instrument had an the intra-class correlation coefficient of 0.922 (P = .002), the reproducibility guarantee had 0.903 (P = .003), and the comprehensiveness guarantee had 0.926 (P = .006). We also calculated inter-rater reliability and internal consistency using Cronbach alpha of 0.827 (P < .001) and an instrument the intra-class correlation coefficient of 0.978. Using MERSQ, the overall search quality (41.75 vs 31.25, P = .009), reproducibility (22 vs 14.50, P = .004), and comprehensive score (18.75 vs 15.75, P = .880) of BEME studies were higher than non-BEME ones. Moreover, we found only 30% of studies completed searching documents. The search strategy query concerning the selection of synonym terms received the lowest score among studies. This study led to the development of a valid and reliable checklist for evaluating the search quality of medical education systematic reviews. Utilizing the MERSQ checklist, we found that BEME studies had higher quality than non-BEME ones, making the results from BEME studies more reliable.

Keywords:  best evidence medical education; medical education; search quality; search strategy; systematic review

DOI:  https://doi.org/10.1097/MD.0000000000042316
J Midlife Health. 2025 Jan-Mar;16(1):16(1): 45-50

Evaluating Accuracy and Readability of Responses to Midlife Health Questions: A Comparative Analysis of Six Large Language Model Chatbots.

Himel Mondal, Devendra Nath Tiu, Shaikat Mondal, Rajib Dutta, Avijit Naskar, Indrashis Podder.

   Background: The use of large language model (LLM) chatbots in health-related queries is growing due to their convenience and accessibility. However, concerns about the accuracy and readability of their information persist. Many individuals, including patients and healthy adults, may rely on chatbots for midlife health queries instead of consulting a doctor. In this context, we evaluated the accuracy and readability of responses from six LLM chatbots to midlife health questions for men and women.
Methods: Twenty questions on midlife health were asked to six different LLM chatbots - ChatGPT, Claude, Copilot, Gemini, Meta artificial intelligence (AI), and Perplexity. Each chatbot's responses were collected and evaluated for accuracy, relevancy, fluency, and coherence by three independent expert physicians. An overall score was also calculated by taking the average of four criteria. In addition, readability was analyzed using the Flesch-Kincaid Grade Level, to determine how easily the information could be understood by the general population.
Results: In terms of fluency, Perplexity scored the highest (4.3 ± 1.78), coherence was highest for Meta AI (4.26 ± 0.16), accuracy of responses was highest for Meta AI, and relevancy score was highest for Meta AI (4.35 ± 0.24). Overall, Meta AI scored the highest (4.28 ± 0.16), followed by ChatGPT (4.22 ± 0.21), whereas Copilot had the lowest score (3.72 ± 0.19) (P < 0.0001). Perplexity showed the highest score of 41.24 ± 10.57 in readability and lowest in grade level (11.11 ± 1.93), meaning its text is the easiest to read and requires a lower level of education.
Conclusion: LLM chatbots can answer midlife-related health questions with variable capabilities. Meta AI was found to be highest scoring chatbot for addressing men's and women's midlife health questions, whereas Perplexity offers high readability for accessible information. Hence, LLM chatbots can be used as educational tools for midlife health by selecting appropriate chatbots according to its capability.

Keywords:  Artificial intelligence; chatbots; health education; large language models; midlife health; patient education; patient queries

DOI:  https://doi.org/10.4103/jmh.jmh_182_24
Allergol Immunopathol (Madr). 2025 ;53(3): 80-87

Can urticaria images found on the internet be a source of health information?

Merve Erkoç, Makbule Seda Bayrak Durmaz.

   BACKGROUND: In parallel with technological developments, patients increasingly benefit from information and communication technologies.
OBJECTIVE: The aim was to evaluate urticaria images that are available on the internet in two different languages.
MATERIALS AND METHODS: The terms "urticaria" and "ürtiker" were used as search terms on Google Images. One hundred images were saved for each term, and each image was opened via its link. Two specialists in immunology and allergy jointly assessed the uploader information, pixel resolution, characteristics of the urticarial lesions, and image quality of the photos.
RESULTS: A total of 178 images were included, with 87 from the "urticaria" search term and 91 from the "ürtiker" search term-71.3% images had isolated urticaria, 1.7% had isolated angioedema, 0.6% had both urticaria and angioedema, and 26.4% had neither urticaria nor angioedema; 131 photographs depicting urticaria and/or angioedema were analyzed. The majority of urticarial plaques were erythematous (84%), with extremities (32.1%) being the most commonly affected area. Images in the preview on Google Images appeared more blurred and of lower resolution than the images after opening the link (n:99 vs. n:26, p < 0.001 and n:55 vs. n:10, p < 0.001, respectively). The quality of the images was found to be better after opening the link compared to the preview (n:34 vs. n:107; p < 0.001).
CONCLUSION: Our study found that approximately one-quarter of urticaria images on Google Images did not match true urticarial lesions and were of suboptimal quality in both Turkish and universally accessible English.

Keywords:  Google Images; angioedema; internet; photography; urticaria

DOI:  https://doi.org/10.15586/aei.v53i3.1351
Orthod Craniofac Res. 2025 May 07.

Performance of AI-Chatbots to Common Temporomandibular Joint Disorders (TMDs) Patient Queries: Accuracy, Completeness, Reliability and Readability.

Mohamed G Hassan, Ahmed A Abdelaziz, Hams H Abdelrahman, Mostafa M Y Mohamed, Mohamed T Ellabban.

  TMDs are a common group of conditions affecting the temporomandibular joint (TMJ) often resulting from factors like injury, stress or teeth grinding. This study aimed to evaluate the accuracy, completeness, reliability and readability of the responses generated by ChatGPT-3.5, -4o and Google Gemini to TMD-related inquiries. Forty-five questions covering various aspects of TMDs were created by two experts and submitted by one author to ChatGPT-3.5, ChatGPT-4 and Google Gemini on the same day. The responses were evaluated for accuracy, completeness and reliability using modified Likert scales. Readability was analysed with six validated indices via a specialised tool. Additional features, such as the inclusion of graphical elements, references and safeguard mechanisms, were also documented and analysed. The Pearson Chi-Square and One-Way ANOVA tests were used for data analysis. Google Gemini achieved the highest accuracy, providing 100% correct responses, followed by ChatGPT-3.5 (95.6%) and ChatGPT-4o (93.3%). ChatGPT-4o provided the most complete responses (91.1%), followed by ChatGPT-03 (64.4%) and Google Gemini (42.2%). The majority of responses were reliable, with ChatGPT-4o at 93.3% 'Absolutely Reliable', compared to 46.7% for ChatGPT-3.5 and 48.9% for Google Gemini. Both ChatGPT-4o and Google Gemini included references in responses, 22.2% and 13.3%, respectively, while ChatGPT-3.5 included none. Google Gemini was the only model that included multimedia (6.7%). Readability scores were highest for ChatGPT-3.5, suggesting its responses were more complex than those of Google Gemini and ChatGPT-4o. Both ChatGPT-4o and Google Gemini demonstrated accuracy and reliability in addressing TMD-related questions, with their responses being clear, easy to understand and complemented by safeguard statements encouraging specialist consultation. However, both platforms lacked evidence-based references. Only Google Gemini incorporated multimedia elements into its answers.

Keywords:  ChatGPT; Google Gemini; artificial intelligence; chatbot; education; temporomandibular joint; temporomandibular joint disorders

DOI:  https://doi.org/10.1111/ocr.12939
Orthop J Sports Med. 2025 Apr;13(4): 23259671251332596

Are Large Language Model-Based Chatbots Effective in Providing Reliable Medical Advice for Achilles Tendinopathy? An International Multispecialist Evaluation.

Zuru Liang, Ming Wang, Nasef Mohamed Nasef Abdelatif, Marut Arunakul, Carlo Angelo V Borbon, Keen Wai Chong, Man Wai Chow, Yinghui Hua, David Oji, Ximena Ahumada, Kwai Ming Siu, Ken Jin Tan, Yasuhito Tanaka, Akira Taniguchi, Patrick Shu-Hang Yung, Samuel Ka-Kin Ling.

   Background: Large language model (LLM)-based chatbots have shown potential in providing health information and patient education. However, the reliability of these chatbots in offering medical advice for specific conditions like Achilles tendinopathy remains uncertain. Mixed outcomes in the field of orthopaedics highlight the need for further examination of these chatbots' reliability.
Hypothesis: Three leading LLM-based chatbots can provide accurate and complete responses to inquiries related to Achilles tendinopathy.
Study Design: Cross-sectional study.
Methods: Eighteen questions derived from the Dutch clinical guideline on Achilles tendinopathy were posed to 3 leading LLM-based chatbots: ChatGPT 4.0, Claude 2, and Gemini. The responses were incorporated into an online survey assessed by orthopaedic surgeons specializing in Achilles tendinopathy. Responses were evaluated using a 4-point scoring system, where 1 indicates unsatisfactory and 4 indicates excellent. The total scores for the 18 responses were aggregated for each rater and compared across the chatbots. The intraclass correlation coefficient was calculated to assess consistency among the raters' evaluations.
Results: Thirteen specialists from 9 diverse countries and regions participated. Analysis showed no significant difference in the mean total scores among the chatbots: ChatGPT (59.7 ± 5.5), Claude 2 (53.4 ± 9.7), and Gemini (53.6 ± 8.4). The proportions of unsatisfactory responses (score 1) were low and comparable across chatbots: 0.9% for ChatGPT 4.0, 3.4% for Claude 2, and 3.4% for Gemini. In terms of excellent responses (score 4), ChatGPT 4.0 outperformed the others, with 43.6% of the responses rated as excellent, significantly higher than Claude 2 at 27.4% and Gemini at 25.2% (P < .001 for both comparisons). Intraclass correlation coefficients indicated poor reliability for ChatGPT 4.0 (0.420) and moderate reliability for Claude 2 (0.522) and Gemini (0.575).
Conclusion: While LLM-based chatbots such as ChatGPT 4.0 can deliver high-quality responses to queries regarding Achilles tendinopathy, the inconsistency among specialist evaluations and the absence of standardized assessment criteria significantly challenge our ability to draw definitive conclusions. These issues underscore the need for a cautious and standardized approach when considering the integration of LLM-based chatbots into clinical settings.

Keywords:  AI; Achilles tendinopathy; chatbot; large language model

DOI:  https://doi.org/10.1177/23259671251332596
Am Surg. 2025 May 06. 31348251337156

Artificial Intelligence Performs Well for Patient-Level Education of Benign Anorectal Conditions.

Makenna Marty, Seija Maniskas, Jonathan Zuo, Gabriel Akopian, Adam Truong.

  BackgroundChatGPT, the most widely recognized AI chatbot, has been increasingly utilized by patients for health care education, but its performance and knowledge base have not been evaluated for colorectal surgery topics. We hypothesize ChatGPT can provide accurate patient-level information for benign anorectal diseases.MethodsWe performed a single-institution prospective study evaluating OpenAI's GPT-4 chatbot against verified online medical literature using the modified EQIP (mEQIP) for hemorrhoids, anal fissures, and pruritus ani. Scoring was performed by three independent educated reviewers.ResultsChatGPT had a median overall score of 22/36 (61%) across all topics. It performed similarly for all topics: 24 for hemorrhoids, 22 for fissures, and 20 for pruritus (P = 0.15). ChatGPT was strongest within the mEQIP Content domain for all topics (P = 0.001). It performed weakest within the mEQIP Identification domain. It primarily lost points for recommending non-evidence-based treatments, lack of citations and visual aids, and generating broken source links.DiscussionChatGPT can provide accurate, reliable information at patient understanding level and is comparable to other validated online information for benign anorectal pathologies. It could improve on mEQIP performance by improving source documentation and visual aid capability, but it remains a promising patient resource with an intuitive user interface.

Keywords:  AI; ChatGPT; EQIP; anorectal; education

DOI:  https://doi.org/10.1177/00031348251337156
Int J Med Inform. 2025 May 05. pii: S1386-5056(25)00178-9. [Epub ahead of print]201 105961

The actual performance of large language models in providing liver cirrhosis-related information: A comparative study.

Yanqiu Li, Zhuojun Li, Jinze Li, Long Liu, Yao Liu, Bingbing Zhu, Ke Shi, Yu Lu, Yongqi Li, Xuanwei Zeng, Ying Feng, Xianbo Wang.

   OBJECTIVE: With the increasing prevalence of large language models (LLMs) in the medical field, patients are increasingly turning to advanced online resources for information related to liver cirrhosis due to its long-term management processes. Therefore, a comprehensive evaluation of real-world performance of LLMs in these specialized medical areas is necessary.
METHODS: This study evaluates the performance of four mainstream LLMs (ChatGPT-4o, Claude-3.5 Sonnet, Gemini-1.5 Pro, and Llama-3.1) in answering 39 questions related to liver cirrhosis. The information quality, readability and accuracy were assessed using Ensuring Quality Information for Patients tool, Flesch-Kincaid metrics and consensus scoring. The simplification and their self-correction ability of LLMs were also assessed.
RESULTS: Significant performance differences were observed among the models. Gemini scored highest in providing high-quality information. While the readability of all four LLMs was generally low, requiring a college-level reading comprehension ability, they exhibited strong capabilities in simplifying complex information. ChatGPT performed best in terms of accuracy, with a "Good" rating of 80%, higher than Claude (72%), Gemini (49%), and Llama (64%). All models received high scores for comprehensiveness. Each of the four LLMs demonstrated some degree of self-correction ability, improving the accuracy of initial answers with simple prompts. ChatGPT's and Llama's accuracy improved by 100%, Claude's by 50% and Gemini's by 67%.
CONCLUSION: LLMs demonstrate excellent performance in generating health information related to liver cirrhosis, yet they exhibit differences in answer quality, readability and accuracy. Future research should enhance their value in healthcare, ultimately achieving reliable, accessible and patient-centered medical information dissemination.

Keywords:  Accuracy Assessment; Large Language Model; Liver Cirrhosis; Quality Assessment; Readability Assessment

DOI:  https://doi.org/10.1016/j.ijmedinf.2025.105961
J ISAKOS. 2025 May 03. pii: S2059-7754(25)00509-7. [Epub ahead of print] 100892

Is ChatGPT a More Academic Source Than Google Searches for Patients Questions About Hip Arthroscopy? An Analysis of the Most Frequently Asked Questions.

Necati Bahadir Eravsar, Mahmud Aydin, Atahan Eryilmaz, Cihangir Turemis, Serkan Surucu, Andrew E Jimenez.

   OBJECTIVES: The purpose of this study was to compare the reliability and accuracy of responses provided to patients about hip arthroscopy by Chat Generative Pre-Trained Transformer (ChatGPT), an artificial intelligence (AI) and large language model (LLM) online program, with those obtained through a contemporary Google search for frequently asked questions (FAQs) regarding hip arthroscopy.
METHODS: "Hip arthroscopy" (HA) was entered into Google Search and ChatGPT, and the 15 most common FAQs and the answers to the FAQs were determined. In Google Search, the FAQs were obtained from the "People also ask" section. ChatGPT was queried to provide the 15 most common FAQs and subsequent answers. The Rothwell system groups the questions under 10 subheadings: Responses of ChatGPT and Google search engines were compared.
RESULTS: Timeline of recovery (23.3%) and technical details (20%) were the most common categories of questions. ChatGPT produced significantly more data in the technical details category (33.3% vs. 6.6%; p-value = 0.0455) than in the other categories. The most frequently asked questions were academic for both Google web search (46.6%) and ChatGPT (93.3%). ChatGPT provided significantly more academic references than Google web searches (93.3% vs. 46.6%). Conversely, Google web search cited more medical practice references (20% vs. 0%), single surgeon websites (26% vs. 0%), and government websites (6% vs. 0%) more frequently than ChatGPT.
CONCLUSION: ChatGPT performed similarly to Google searches for information about hip arthroscopy. Compared to Google, ChatGPT provided significantly more academic sources for its answers to patient questions.
LEVEL OF EVIDENCE: Level IV.

Keywords:  Artificial intelligence; ChatGPT; chatbot; google; hip arthroscopy; patient information

DOI:  https://doi.org/10.1016/j.jisako.2025.100892
J Cutan Med Surg. 2025 May 04. 12034754251338821

The Quality and Readability of Online Resources for Women With Hair Loss: A Cross-Sectional Study.

Eric McMullen, Shahnawaz Towheed, Grace Xiong, Sana Gupta, Chris Shenouda, Gresha Shah, Karanvir Singh, Ilya Mukovozov, Cathryn Sibbald, Jeffrey Donovan.



Keywords:  Toronto; hair; hair loss; hair specialist; online resources; readability

DOI:  https://doi.org/10.1177/12034754251338821
J Med Internet Res. 2025 May 05. 27 e70131

The Effectiveness of a Custom AI Chatbot for Type 2 Diabetes Mellitus Health Literacy: Development and Evaluation Study.

Anthony Kelly, Eoin Noctor, Laura Ryan, Pepijn van de Ven.

   BACKGROUND: People living with chronic diseases are increasingly seeking health information online. For individuals with diabetes, traditional educational materials often lack reliability and fail to engage or empower them effectively. Innovative approaches such as retrieval-augmented generation (RAG) powered by large language models have the potential to enhance health literacy by delivering interactive, medically accurate, and user-focused resources based on trusted sources.
OBJECTIVE: This study aimed to evaluate the effectiveness of a custom RAG-based artificial intelligence chatbot designed to improve health literacy on type 2 diabetes mellitus (T2DM) by sourcing information from validated reference documents and attributing sources.
METHODS: A T2DM chatbot was developed using a fixed prompt and reference documents. Two evaluations were performed: (1) a curated set of 44 questions assessed by specialists for appropriateness (appropriate, partly appropriate, or inappropriate) and source attribution (matched, partly matched, unmatched, or general knowledge) and (2) a simulated consultation of 16 queries reflecting a typical patient's concerns.
RESULTS: Of the 44 evaluated questions, 32 (73%) responses cited reference documents, and 12 (27%) were attributed to general knowledge. Among the 32 sourced responses, 30 (94%) were deemed fully appropriate, with the remaining 2 (6%) being deemed partly appropriate. Of the 12 general knowledge responses, 1 (8%) was inappropriate. In the 16-question simulated consultation, all responses (100%) were fully appropriate and sourced from the reference documents.
CONCLUSIONS: A RAG-based large language model chatbot can deliver contextually appropriate, empathetic, and clinically credible responses to T2DM queries. By consistently citing trusted sources and notifying users when relying on general knowledge, this approach enhances transparency and trust. The findings have relevance for health educators, highlighting that patient-centric reference documents-structured to address frequent patient questions-are particularly effective. Moreover, instances in which the chatbot signals that it has drawn on general knowledge can provide opportunities for health educators to refine and expand their materials, ensuring that more future queries are answered from trusted sources. The findings suggest that such chatbots may support patient education, promote self-management, and be readily adapted to other health contexts.

Keywords:  AI; ChatGPT; LLM; RAG; T2DM; artificial intelligence; chatbot; clinical credibility; conversational agent; diabetes; health literacy; large language model; retrieval-augmented generation; trust; type 2 diabetes mellitus

DOI:  https://doi.org/10.2196/70131
J Pediatr Gastroenterol Nutr. 2025 May 08.

Parental education in pediatric dysphagia: A comparative analysis of three large language models.

Bülent Alyanak, Burak Tayyip Dede, Fatih Bağcıer, Mazlum Serdar Akaltun.

   OBJECTIVES: This study evaluates the effectiveness of three widely used large language models (LLMs)-ChatGPT-4, Copilot, and Gemini-in providing accurate, reliable, and understandable answers to frequently asked questions about pediatric dysphagia.
METHODS: Twenty-five questions, selected based on Google Trends data, were presented to ChatGPT-4, Copilot, and Gemini, and the responses were evaluated using a 5-point Likert scale for accuracy, the Ensuring Quality Information for Patients (EQIP) and DISCERN scales for information quality and reliability, and the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores for readability. The performance of ChatGPT-4, Copilot, and Gemini was assessed by presenting the same set of questions at three different time points: August, September, and October 2024. Statistical analyses included analysis of variance, Kruskal-Wallis tests, and post hoc comparisons, with p values below 0.05 considered significant.
RESULTS: ChatGPT-4 achieved the highest mean accuracy score (4.1 ± 0.7) compared to Copilot (3.1 ± 0.7) and Gemini (3.8 ± 0.8), with significant differences observed in quality ratings (p < 0.001 and p < 0.05, respectively). EQIP and DISCERN scores further confirmed the superior performance of ChatGPT-4. In terms of readability, Gemini achieved the highest scores (FRE = 48.7 ± 9.9 and FKGL = 10.1 ± 1.6).
CONCLUSIONS: While ChatGPT-4 generally provided more accurate and reliable information, Gemini produced more readable content. However, variability in overall information quality indicates that, although LLMs hold potential as tools for pediatric dysphagia education, further improvements are necessary to ensure consistent delivery of reliable and accessible information.

Keywords:  ChatGPT; Copilot; Gemini; artificial intelligence; dysphagia

DOI:  https://doi.org/10.1002/jpn3.70069
J Cutan Med Surg. 2025 May 04. 12034754251338826

Analyzing the Quality and Readability of Online Resources Available for Androgenetic Alopecia: A Cross-Sectional Analysis.

Grace Xiong, Sana Gupta, Eric McMullen, Jeffrey Donovan, Shireen Dumont, Cathryn Sibbald, Ilya Mukovozov.



Keywords:  Google; androgenetic alopecia; androgenic alopecia; female pattern hair loss; male pattern hair loss; online resources; readability

DOI:  https://doi.org/10.1177/12034754251338826
JSES Rev Rep Tech. 2025 May;5(2): 140-145

Usability and accessibility of total shoulder arthroplasty and rotator cuff repair online patient educational materials for persons with disabilities.

S Shamtej Singh Rana, Jacob Sina Ghahremani, Ronald Anthony Navarro.

   Background: Online patient educational materials (OPEMs) enable patients to better understand their care. However, not every patient has the same ability to utilize Internet-based information, due to cognitive/medical limitations. The Web Content Accessibility Guidelines (WCAG) are a set of defined and evaluable rules and regulations for the development of websites that are accessible to persons with disabilities. The purpose of this study was to assess the accessibility, usability, and WCAG compliance of OPEMs pertaining to total shoulder arthroplasty (TSA) and rotator cuff repair (RCR).
Methods: Google searches of the terms "total shoulder replacement" and "rotator cuff repair" were conducted to include 25 eligible OPEMs each. The open-source WCAG conformity tool Site Analyzer was used to collect search engine optimization (SEO), content, design, performance, accessibility, and overall scores for each of the OPEMs. Another open-source WCAG conformity tool, Silktide Compliance Checker, was used to count the number of compliance errors at each WCAG compliance level (A, AA, AAA) and text contrast ratio for each OPEM.
Results: The mean conformity scores for TSA were as follows: 73.0 ± 12.8% SEO score, 73.6 ± 12.0% content score, 72.0 ± 10.8% design score, 64.7 ± 16.4 % performance score, 74.2 ± 11.9% accessibility score, and 75.2 ± 6.7 % overall score. The mean conformity scores for RCR were as follows: 74.2 ± 19.3% SEO score, 67.5 ± 18.6% content score, 66.7 ± 12.9% design score, 61.4 ± 16.4% performance score, 74.1 ± 14.0% accessibility score, and 72.7 ± 11.7% overall score. The mean number of TSA compliance errors was 12.36 ± 10.55 at WCAG 2.2 A, 17.32 ± 13.49 at WCAG 2.2 AA, and 62.04 ± 27.22 at WCAG 2.2 AAA. The mean number of RCR compliance errors was 20.48 ± 20.80 at WCAG 2.2 A, 26.48 ± 22.85 at WCAG 2.2 AA, and 58.12 ± 35.72 at WCAG 2.2 AAA. The mean contrast ratio was 14.23 ± 5.19:1 for TSA and 13.26 ± 5.66:1 for RCR.
Conclusion: OPEMs for both TSA and RCR, on average, contain high levels of WCAG compliance errors and have low SEO, content, design, performance, and accessibility as determined by WCAG. However, these OPEMs successfully ensure high contrast ratios, which are necessary for visually impaired users. There is no significant difference between scores, compliance errors, or contrast ratios between the two search terms.

Keywords:  Accessibility; Health equity; Patient education materials; Rotator cuff repair; Total shoulder arthroplasty; Usability; Web access guidelines

DOI:  https://doi.org/10.1016/j.xrrt.2024.09.009
Am J Surg. 2025 Apr 23. pii: S0002-9610(25)00175-8. [Epub ahead of print]245 116353

Navigating the Whipple procedure patient educational materials: A readability and understandability analysis.

Gabriela Rangel Brandao, Cristina Ponce, Manuel Castillo-Angeles, Tara Sotsky Kent.

   OBJECTIVE: Pancreatoduodenectomy (PD) is a complex surgical procedure that is challenging to understand. We aimed to assess the readability, understandability, and actionability of online patient-directed materials for PD.
METHODS: We evaluated online patient-focused educational materials about PD. Through the Leapfrog ratings for "Pancreatic Resection for Cancer" we classified high-volume (HV) and low-volume (LV) hospitals. Readability was measured using multiple tools. Understandability and actionability were measured using the Patient Education Materials Assessment Tool for Printable materials (PEMAT-P). As an external control source of comparison, we analyzed the patient materials from three patient-focused organizations.
RESULTS: Out of 550 HV hospitals, 10 % had any online patient educational material about PD. Readability was at a median grade level of 12 (IQR 4). All exceeded the recommended sixth-grade readability level. Websites solely focused on PD had significantly higher understandability and actionability scores compared to those where the procedure was only a section within a broader topic such as pancreatic cancer. There was no difference in readability, understandability, or actionability among HV, LV and control groups.
CONCLUSIONS: Online patient materials for PD were scarce, lengthy and difficult to comprehend. However, their understandability is crucial for providing patient education. Simplification of patient materials with clear guidance, visuals, and simple language, and strategies-focused research are needed.

Keywords:  Actionability; Educational materials; Pancreatic cancer; Pancreatoduodenectomy; Readability; Understandability; Whipple

DOI:  https://doi.org/10.1016/j.amjsurg.2025.116353
J Endod. 2025 May 05. pii: S0099-2399(25)00212-2. [Epub ahead of print]

Evaluating Large Language Models in Addressing Patient Questions on Endodontic Pain: A Comparative Analysis of accessible chatbots.

Sanaa Aljamani, Yazan Hassona, Hoda A Fansa, Heba Saadeh, K D Jamani.

   BACKGROUND: Patients increasingly use large language models (LLMs) for health-related information, but their reliability and usefulness remain controversial. Continuous assessment is essential to evaluate their role in patient education.
AIMS: This study evaluates the performance of ChatGPT 3.5 and Gemini in answering patient inquiries about endodontic pain.
METHODS: A total of 62 frequently asked questions on endodontic pain were categorized into etiology, symptoms, management, and incidence. Responses from ChatGPT 3.5 and Gemini were assessed using standardized tools, including the Global Quality Scale (GQS), CLEAR reliability tool, and readability indices (Flesch-Kincaid and SMOG).
RESULTS: Compared to Gemini, ChatGPT 3.5 responses scored significantly higher in terms of overall quality (GQS: 4.67-4.9 vs. 2.5-4, p < 0.001) and reliability (CLEAR: 23.5-23.6 vs. 19.35-22.7, p < 0.05). However, it required a higher reading level (SMOG: 14-17.6) compared to Gemini (8.7-11.3, p < 0.001). Gemini's responses were more readable (6th-7th grade level) but lacked depth and completeness.
CONCLUSION: While ChatGPT 3.5 outperformed Gemini in quality and reliability, its complex language reduced accessibility. In contrast, Gemini's simpler language enhanced readability but sacrificed comprehensiveness. These findings highlight the need for professional oversight in integrating AI-driven tools into healthcare communication to ensure accurate, accessible, and empathetic patient education.

Keywords:  ChatGPT 3.5; Endodontic pain; Gemini; Large Language Models (LLMs); Patient Education

DOI:  https://doi.org/10.1016/j.joen.2025.04.015
Mycoses. 2025 May;68(5): e70057

Common Disease-Difficult Understanding: Readability Analysis of Superficial Skin Fungal Infections On-Line Materials in European Languages.

Tomasz Skrzypczak, Anna Skrzypczak, Andrzej Jaworek, Jacek C Szepietowski.

   BACKGROUND: Studies analysing the readability of online materials about dermatomycoses were very limited.
OBJECTIVES: This study evaluated the readability of online materials related to superficial skin fungal infections in English, German, French, Italian, Spanish and Polish.
METHODS: The terms 'dermatomycosis', 'dermatophytosis' and 'trichophytosis' translated into included languages were searched using the Google search engine. The first 50 records in each language were screened for suitability. Articles that were accessible, relevant to dermatological fungal infections and aimed at patient education were included. The LIX score was utilised to assess readability.
RESULTS: In general, 167 articles out of 900 screened (19%) were analysed. The overall mean LIX score was 56 ± 7, which classified articles as very difficult to comprehend. The most readable were articles retrieved with the search term 'trichophytosis' with a mean LIX score of 49 ± 3, followed by 'dermatophytosis' with 54 ± 8 and 'dermatomycosis' with 58 ± 7 (p < 0.001). The most readable articles were in English (48 ± 7) and Spanish (50 ± 5), followed by German (54 ± 4), French (55 ± 6), Italian (59 ± 5) and Polish (63 ± 4) (p < 0.001). The increase in the number of analysed articles was correlated with a higher average LIX score (p = 0.036, R2 = 0.708).
CONCLUSIONS: Low availability and readability of online patient materials related to superficial skin fungal infections could hinder patient understanding, leading to improper antifungal use, increased recurrence rates and the risk of antifungal resistance. The dermatologists should take action to ensure adequate online materials in Internet-based society.

Keywords:  education; online‐materials; readability; skin fungal infections

DOI:  https://doi.org/10.1111/myc.70057
Ophthalmic Epidemiol. 2025 May 04. 1-7

Assessing the Readability, Quality, and Visual Accessibility of Patient Education Websites for Laser Refractive Surgery.

David Mothy, Aneesh P Reddy, Charlene W Cai, Hassaam S Choudhry, Mohammad H Dastjerdi.

   PURPOSE: To assess the usability of patient education websites for refractive surgery through an analysis of readability, accountability, subjective quality, and visual accessibility.
METHODS: 50 patient education websites for five refractive surgery modalities were gathered from an incognito Google search and categorized by authorship category: institutional, medical organization, or private practice. Each website was assessed for readability, accountability using the Journal of the American Medical Association (JAMA) benchmark, subjective quality using the DISCERN instrument, and visual accessibility was assessed using the Web Accessibility Evaluation Tool (WAVE).
RESULTS: The mean reading grade across all websites was 11.02, exceeding the American Medical Association's recommended 6th-grade level (p < .001). Institutional websites were the most readable (10.32, p = 0.005) while private practice sites were the least (11.74, p = 0.015). The average JAMA score was 1.52 with no website meeting all four accountability criteria. Websites from medical organizations had significantly higher JAMA scores (1.94, p = 0.049). The average DISCERN score was 51.97 with no differences between authorship categories. Websites had an average of 87.84 visual accessibility violations.
CONCLUSIONS: Available patient education websites for refractive surgery may suffer from poor readability, quality, and visual accessibility which may limit their usability.

Keywords:  DISCERN; LASIK; readability; refractive surgery; visual accessibility

DOI:  https://doi.org/10.1080/09286586.2025.2500014
Hernia. 2025 May 03. 29(1): 155

A comparative evaluation of laparoscopic TAPP videos on YouTube and WebSurg: surgical steps and educational value.

Alisina Bulut, Muhammed İkbal Akın.

   PURPOSE: Laparoscopic Transabdominal Preperitoneal (TAPP) videos on YouTube and WebSurg have an important place in surgical training. However, there are differences between these platforms in terms of training quality and compliance with standard surgical steps. The aim of this study was to compare laparoscopic TAPP videos on YouTube (limited to individually uploaded content) and WebSurg in terms of surgical technique and educational quality.
METHODS: Twelve videos meeting specific criteria were selected from both platforms. The 2 groups were compared using the 9 Commandments, which assessed compliance with surgical steps, and the Procedure Presentation Score (PPS), which assessed video quality/educational content. Data on video characteristics, such as view count, publication date, and duration, were also collected.
RESULTS: Although YouTube videos reached more viewers, WebSurg videos had higher compliance with the 9 Commandments (WebSurg median score 8/9 vs. YouTube median score 5/9, p < 0.01). In addition, WebSurg videos had higher PPS scores (median: 8) than YouTube videos (median: 5) (p = 0.02).
CONCLUSION: When utilizing online video platforms for surgical training, institutional training platforms such as WebSurg should be preferred. When the videos included here were compared with YouTube in the light of the defined criteria; it was seen that YouTube videos were not of sufficient quality.

Keywords:  Hernia repair; Inguinal hernia; Laparoscopic TAPP; WebSurg; YouTube

DOI:  https://doi.org/10.1007/s10029-025-03341-8
Ann Chir Plast Esthet. 2025 May 07. pii: S0294-1260(25)00036-6. [Epub ahead of print]

Is YouTube™ a useful resource of information about bichectomy? A cross-sectional study.

H Ɪ Durmuş, B Ege, S Bayazıt, M Koparal.

   OBJECTIVES: This study sought to evaluate the content and quality value of bichectomy-related YouTube™ videos.
MATERIAL AND METHODS: Four keywords (bichectomy, buccal fat removal, buccal lipectomy, Bichat's fat pad) related to bichectomy were searched on YouTube™. The top 100 videos were analyzed according to the number of views, taking into account video exclusion criteria for each keyword. The top 100 videos with the most views out of the 400 total videos included were used in our study. Video source, runtime, viewers, likes, dislikes, reliability, popularity, visibility, and usefulness were recorded and analyzed for each video. Among these videos, the top 10 videos with the best values according to all these criteria were also evaluated.
RESULTS: Of the 100 videos included, 60 were from health professionals, 21 from health sites, 14 from individual sources, and the remaining 5 from other sources. In the statistical analysis, based on their sources, reliability values were the only factor among videos that significantly differed (P<0.05). Nonetheless, no significant difference was determined in the values of views, likes, quality, and usefulness (P>0.05).
CONCLUSION: After bichectomy-related content and video quality analysis, YouTube™ videos did not meet the criteria for being an adequate and trustworthy source of information for patients. Particularly the videos that are uploaded by health professionals and organizations ought to be of better quality with a higher level of information content.

Keywords:  Bichat's fat pad; Bichectomie; Bichectomy; Buccal lipectomy; Coussinet adipeux de Bichat; Internet; Lipectomie buccale; Patient; Video; Vidéo; YouTube™

DOI:  https://doi.org/10.1016/j.anplas.2025.03.006
Neurourol Urodyn. 2025 May 04.

Bridging the Gap in Patient Guidance: Quality Analysis of Kegel Exercise Videos on YouTube.

Nazım Furkan Gunay, Sedat Cakmak, Mucahit Gelmis, Caglar Dizdaroglu.

   OBJECTIVE: This study aimed to evaluate the content and quality of YouTube videos on Kegel exercises, focusing on their reliability for patients managing conditions such as premature ejaculation, urinary incontinence, and post-pelvic surgery recovery.
METHODS: A cross-sectional analysis was conducted using the keyword "Kegel exercises" on YouTube. The top 100 videos were screened, and 60 met the inclusion criteria. Videos were evaluated using the Kegel Video Evaluation Score (KVES) and the Global Quality Scale (GQS). Video metrics such as duration, views, likes, and comments were recorded, and statistical analyses were performed to assess correlations and inter-rater reliability.
RESULTS: The mean KVES and GQS scores were 17.26 ± 4.38 and 3.36 ± 1.09, respectively, with 60% of videos categorized as high quality. Nonphysician health professionals uploaded the highest-quality videos for instructional content. A strong correlation between KVES and GQS was observed (Spearman's ρ = 0.924, p < 0.001), validating the evaluation tool. Video duration positively impacted content comprehensiveness, while views and likes did not correlate with quality.
CONCLUSIONS: In contrast to previous studies highlighting low-quality medical content on YouTube, this study shows that high-quality Kegel exercise videos are prevalent, suggesting their potential as reliable educational resources. Clinician involvement is essential in guiding patients to trustworthy content and fostering collaboration with content creators to improve digital health information.

Keywords:  Urinary incontinence; kegel exercises; pelvic floor muscle training

DOI:  https://doi.org/10.1002/nau.70070
Niger J Clin Pract. 2025 Jan 01. 28(1): 27-32

Rabies in Social Media Videos: Comparison of Instagram and YouTube.

M F Baran, N İ Işik.

OBJECTIVE: This study aims to elucidate the informational content related to post-exposure patient education for this disease, emphasizing the significance of social media platforms as sources of information. The goal is to uncover and compare the information available on various social media platforms.
METHODOLOGY: Searches were conducted on Instagram and YouTube using the search terms "Rabies," "Rabies disease," and "Rabies vaccine." A total of 274 videos were examined, with 150 from YouTube and 124 from Instagram. The content of the videos was assessed based on 10 criteria determined by researchers according to the National Rabies Prophylaxis Guidelines, and a scoring system was applied.
RESULTS: Instagram videos had more exclusion criteria. When examined based on uploader characteristics, the number of healthcare professionals on Instagram was higher than on YouTube. For questions related to "What is rabies," "What are the symptoms in animals," and "How should pre-exposure prophylaxis be," Instagram videos received higher scores. Videos uploaded by healthcare professionals received higher scores in questions related to "What is rabies," "How does it spread to humans," "How should wound care be," "Pre-exposure prophylaxis," "Post-exposure prophylaxis," and total score compared to videos uploaded by other independent users.
CONCLUSION: A significant portion of the videos uploaded by various users on social media about rabies were found to be unrelated and lacking in informative content. It was observed that videos on Instagram were more informative compared to YouTube. Health professionals were found to provide more informative and directive content in videos related to rabies.

DOI: https://doi.org/10.4103/njcp.njcp_70_24
Front Public Health. 2025 ;13 1563188

Hip fractures in Chinese TikTok (Douyin) short videos: an analysis of information quality, content and user comment attitudes.

Zhuoxin Li, Yashi Lin, Kairou Zhang, Ran Li, Mei Ju, Yanhua Chen, Jing Fu, Ruiyu Huang, Ling Zhu, Junjun Sun, Yanxia Guo, Min Gao, Yue Hu, Gang Liu, Baolu Zhang.

   Background: Hip fracture presents a major healthcare challenge globally. While numerous Douyin videos address hip fracture, their information quality and factors affecting user comment attitudes remain uncertain.
Objective: This study aims to analyze the content, information quality, and user comment attitudes of videos depicting hip fractures on Chinese TikTok (Douyin).
Methods: The search term "hip fracture" was used on Douyin, which resulted in 170 samples being included. Video information quality was assessed using the GQS and PEMAT scales. Video content was analyzed using DivoMiner. User comments were extracted using Gooseeker, and user comment attitudes were interpreted as positive, neutral, or negative using the Weiciyun website. Data analysis was performed using IBM SPSS version 29.0, including non-parametric tests for continuous variables and chi-square tests for categorical variables. The identified factors were then included in a multivariate logistic regression analysis to examine their impact on user comment attitudes.
Results: Health professionals were the primary source of videos (136/138, 98.6%). The overall information quality of the videos was moderate (median 3, IQR 2.00-4.00). Douyin videos were relatively high in understandability (median 72.70%, IQR 63.60-81.80%) but low in actionability (median 33.33%, IQR 0-66.67%). Most videos focused on treatment (139/170, 81.8%). Regarding user comment attitudes, the majority of videos were received with positive comments (113/170, 66.5%), followed by negative comments (39/170, 22.9%) and neutral comments (18/170, 10.6%). The multivariate logistic regression analysis revealed three factors influencing positive attitudes: the GQS score (OR 13.824, 95% CI 6.033-31.676), understandability (OR 2.281, 95% CI 1.542-5.163) and not mentioning risk factors in videos (OR 0.291, 95%CI 0.091-0.931).
Conclusion: The majority of hip fracture videos on Douyin were created by health professionals and had intermediate information quality, with user comment attitudes remaining positive. However, these videos often lacked actionability and had insufficient mention of prevention and rehabilitation content. Videos with higher information quality that addressed hip fracture risk factors received more positive user comments. This study suggests that publishers of hip fracture-related videos should improve actionability while simultaneously paying attention to both prevention and rehabilitation content to enhance the educational value of these videos.

Keywords:  Douyin; hip fractures; information quality; short video; user comment attitudes

DOI:  https://doi.org/10.3389/fpubh.2025.1563188
BMC Public Health. 2025 May 09. 25(1): 1713

The content quality and educational significance of early childhood caries on short video platforms.

Ming-Na Huang, Hong Lu, Ming-Yue Huang, Cai-Yu Li, Yue-Mei Zheng, Dan Wang, Shi-Jun Tang.

   BACKGROUND: Early clinical screening and prevention can reduce the incidence and severity of early childhood caries (ECC). With the development of social media, TikTok and Douyin were used as important tools for ECC popularization and early screening. The purpose of this study was to evaluate the educational impact from the integrity, accuracy and quality of ECC-related short videos on TikTok and Douyin.
METHODS: We searched for short videos related to ECC on the mobile application TikTok and Douyin on April 15, 2024. The search keywords were as follows: "Early childhood caries" on TikTok in both English and Japanese, and Chinese search on Douyin. The first 100 short videos were selected as samples for each group. we applied an instrument called DISCERN, which consisted of 3 sections and a total of 16 questions to evaluate the quality of each short video, and used a checklist to rate the content of videos. The accuracy of the content was evaluated based on the Children's Caries Risk Assessment and Management Guidelines.
RESULTS: A total of 115 short videos were assessed for the useful information quality of ECC, including 78 Chinese, 26 English, and 11 Japanese. The score for the content quality of short videos showed that each of the three groups assigned the highest scores to the sections on symptoms and treatment, with Chinese short videos achieving the top ratings. The DISCERN scores for useful short videos in each group were 33.10 ± 3.49 in Chinese, 29.54 ± 2.37 in English, and 28.27 ± 2.61 in Japanese, respectively. Compared with English and Japanese videos, Chinese videos had the highest DISCERN score with significant differences (p < 0.05). Meanwhile, in Chinese short videos, healthcare professionals or organizations uploaded videos with higher DISCERN scores, which were more comprehensive and extensive than those uploaded by private users.
CONCLUSIONS: It is necessary for more healthcare professionals and institutions to join in to improve the quality of content on short video platforms and solve more health problems for patients through short videos.

Keywords:  Early childhood caries; Public health; Quality evaluation; Short video

DOI:  https://doi.org/10.1186/s12889-025-22962-3
Arthroscopy. 2025 May 07. pii: S0749-8063(25)00345-7. [Epub ahead of print]

Author Reply to "Letter to the Editor Regarding 'YouTube is an Inconsistent Source of Information on Orthobiologics: Implications for Content Quality, Reliability, Comprehensiveness, and Patient Decision-Making'".

Jared P Sachs, Alexander C Weissman, Kyle R Wagner, Kaitlyn M Joyce, Trice Pickens, Andrew S Bi, Brian J Cole.

DOI: https://doi.org/10.1016/j.arthro.2025.04.048
Cureus. 2025 Apr;17(4): e81883

Understanding Breast Cancer (BRCA) Mutations Through TikTok.

Tamara Lalovic, Alexandar Lalovic, Priyanka Raju.

  TikTok is one of the biggest social media platforms where people look for support, seek advice, and educate themselves through videos created by other users. In today's world, people frequently turn to the internet for easy access to health and medical information; it is no surprise that TikTok has helped facilitate the creation of online support groups for those who are overcoming various illnesses and diseases. Breast cancer (BRCA) is worldwide the most common cancer in women, and while BRCA discussions on TikTok have been researched previously, discussions surrounding the BRCA susceptibility gene mutations have not. This study aims to explore content on TikTok related to BRCA mutations and assess its impact on public awareness and physician involvement. One hundred videos were watched on TikTok after searching "BRCA", BRCA1", or "BRCA2". These videos were categorized into groups based on content creator and type of content. The content included discussion regarding preventative measures with BRCA mutations, BRCA and BRCA mutation, ovarian cancer and BRCA mutation, and various treatment options. The videos were also evaluated based on the number of views each video received. This study found that healthcare workers participated in BRCA-related TikTok videos at only 2.3% of all evaluated videos; 97.4% were created by laypersons. Of those videos created by non-medical personnel, 17.3% were educational. The majority (49.3%) were centered around preventative measures and treatments undergone after a BRCA mutation diagnosis. The treatment modality most discussed was double mastectomy at 68%, with 41.3% of those videos being preventative double mastectomies without a current cancer diagnosis. The video with the most views was with regard to motherhood and BRCA prevention at 2.3 million views; 10.4% of the videos discussed difficulties with BRCA in motherhood. Overall, this study highlights the importance of physician involvement in social media platforms. It also showcases how medical providers can use TikTok to better understand patients' needs and discussions outside of the office with regard to their diagnosis.

Keywords:  brca 1/2; brca gene mutation; health-related social media; healthcare social media; social media platform

DOI:  https://doi.org/10.7759/cureus.81883
JMIR Form Res. 2025 May 05. 9 e56028

Online Health Information-Seeking Behaviors Among the Chongqing Population: Cross-Sectional Questionnaire Study.

Honghui Rong, Lu Lu, Miao He, Tian Guo, Xian Li, Qingliu Tao, Yixin Li, Chuanfen Zheng, Ling Zhang, Fengju Li, Dali Yi, Enyu Lei, Ting Luo, Qinghua Yang, Ji-An Chen.

   Background: With the rapid development of the internet and its widespread use, online health information-seeking (OHIS) has become a popular and important research topic. Various benefits of OHIS are well recognized. However, OHIS seems to be a mixed blessing. Research on OHIS has been reported in Western countries and in high-income regions in eastern China. Studies on the population in the western region of China, such as Chongqing, are still limited.
Objective: The aim of the study was to identify the prevalence, common topics, and common methods of health information-seeking and the factors influencing these behaviors among the Chongqing population.
Methods: This cross-sectional questionnaire study was conducted from September to October 2021. A web-based questionnaire was sent to users aged 15 years and older in Chongqing using a Chinese web-based survey hosting site (N=14,466). Data on demographics, web-based health information resources, and health topics were collected. Factors that may influence health literacy were assessed using the chi-square test and multivariate logistic regression models.
Results: A total of 67.1% (9704/14,466) of the participants displayed OHIS behaviors. Participants who were younger, had a higher educational level, and worked as medical staff or teachers were more likely to engage in OHIS, while those living in rural areas, ethnic minorities, and farmers were less likely to seek health information on the web (P<.01). Among the Chongqing population, the most common topic searched on the internet was health behavior and literacy (87.4%, 8483/9704), and the most popular method of seeking health information on the web was through WeChat (77.0%, 7468/9704).
Conclusions: OHIS is prevalent in Chongqing. Further research could be performed based on the influencing factors identified herein and high-priority, effective ways of improving the OHIS behaviors of the Chongqing population.

Keywords:  China; Chongqing; Internet; health behavior; online health information seeking

DOI:  https://doi.org/10.2196/56028
Digit Health. 2025 Jan-Dec;11:11 20552076251334438

Appropriate trust in online health information is associated with information platform, commercial status, and misinformation in patients with high cardiovascular risk.

Woei Xian Lim, Hooi Min Lim, Yew Kong Lee, Carmen Jia Wen Chuah, Adina Abdullah, Chirk Jenn Ng, Adam G Dunn.

   Background: The quality of online health information (OHI) on cardiovascular health is highly variable. Trusting poor quality OHI can lead to poorer health decisions. This study examined information characteristics associated with appropriate trust in OHI among patients with high cardiovascular risk.
Methods: This is a secondary analysis from a cohort study of 270 participants with high cardiovascular risk from a primary care clinic in Malaysia. Participants recorded OHI entries and their trust levels over 2 months using a digital diary. Overall, 1194 OHI entries were included and categorised by platform, commercial status, content focus, and presence of misinformation, and assessed for quality using the DISCERN tool. Appropriate trust was determined by trust-quality matching (trusting high quality or distrusting low quality OHI). The association between information characteristics and appropriate trust was analysed using multiple logistic regression.
Results: Most entries were from websites (62%) and non-commercial sources (88.2%). Misinformation was found in 23.3% (278 of 1194) of entries; 30.8% (367 of 1194) were of good or excellent quality; 51.5% (615 of 1194) were appropriately trusted. Information from websites (vs social media) (AOR 4.31, 95% CI 3.14-5.91, P < .001), non-commercial source (vs commercial) (AOR 1.59, 95% CI 1.01-2.50, P = .047), and absence of misinformation (vs presence of misinformation) (AOR 2.11, 95% CI 1.40-3.20, P < .001) were associated with higher appropriate trust.
Conclusions: OHI from websites, non-commercial sources, and information without misinformation has higher appropriate trust among patients with high cardiovascular risk. This study highlighted the need for good-quality OHI and dissemination through reliable sources.

Keywords:  Online health information; cardiovascular; misinformation; quality; trust

DOI:  https://doi.org/10.1177/20552076251334438
Patient Educ Couns. 2025 Apr 24. pii: S0738-3991(25)00166-1. [Epub ahead of print]137 108799

Linking internet health information seeking to psychological distress among Chinese adults with a family cancer history: The attenuating role of patient-clinician communication of internet searches.

Yuyuan Kylie Lai, Jizhou Francis Ye, Changhao Yan, Xinshu Zhao.

   OBJECTIVE: The psychological distress faced by individuals with a family cancer history (FCH) has emerged as a significant concern. This study examined the relationship between Internet health information seeking (IHIS) and psychological distress, focusing on the mediating roles of cancer information overload and cancer worry and the moderating role of patient-clinician communication of Internet searches.
METHODS: In 2023, a nationally representative online survey was conducted. This research encompassed a cohort of 580 Chinese adults with FCH and no personal cancer history. Moderated mediation analysis was employed.
RESULTS: IHIS was not directly associated with psychological distress. However, it increased cancer information overload and cancer worry, which in turn exacerbated psychological distress. In addition, among people who discussed their online searches with healthcare providers, the positive association between IHIS and cancer information overload became nonsignificant.
CONCLUSION: The current study elucidates a mediation mechanism of cancer information overload and cancer worry in understanding the association between IHIS and psychological distress among individuals with FCH. Patient-clinician communication serves a prophylactic role by attenuating these adverse effects.
PRACTICAL IMPLICATIONS: Encouraging patient-clinician dialogues about Internet health information may be promising in curbing cancer information overload. Healthcare providers should proactively engage in such discussions to support patients' psychological health.

Keywords:  Cancer information overload; Cancer worry; Family cancer history; Internet health information seeking; Patient–clinician communication; Psychological distress

DOI:  https://doi.org/10.1016/j.pec.2025.108799
J Med Libr Assoc. 2025 Apr 18. 113(2): 133-142

Non-clinician involvement in interprofessional health sciences education: educator experiences and attitudes.

Rachel R Helbing, Robert C Hausmann.

   Objective: The objective of this study was to assess educator views on the knowledge, skills, and abilities needed by IPE facilitators and to explore their attitudes toward and experiences with non-clinician facilitators of IPE activities, particularly health sciences librarians.
Methods: This qualitative study utilized a novel questionnaire that included both multiple-choice and free-text questions. The latter were grounded in critical incident technique (CIT), a methodology that uses direct observations of human behavior to solve practical problems. The questionnaire was distributed electronically to the study's population of health sciences administrators, faculty, and staff in Texas who were involved with IPE. Multiple-choice data were analyzed via descriptive statistics, while free-text data were coded and analyzed via inductive thematic analysis principles.
Results: There were 48 responses out of 131 individuals contacted directly for a response rate of 36.64%. Educators recognized a wide range of characteristics needed by IPE facilitators but viewed interpersonal skills as most important. While many reported experience with non-clinician facilitators of IPE activities, fewer had experience working with health sciences librarians in these roles. Educator attitudes toward non-clinician facilitators of IPE, including librarians, were largely positive.
Conclusions: The findings of this study indicated that educators view interpersonal skills and the ability to elicit engagement as more important skills for IPE facilitators than a relevant clinical background. With proper facilitator training, non-clinicians could build upon their existing skillsets and increase their involvement with IPE, creating a larger pool of potential facilitators. A greater availability of skilled facilitators could increase the incidence of IPE, potentially resulting in more collaborative care and improved patient outcomes.

Keywords:  Interprofessional education; collaborative practice; critical incident technique; facilitation; inductive thematic analysis; qualitative research

DOI:  https://doi.org/10.5195/jmla.2025.1763