bims-librar Biomed News
on Biomedical librarianship
Issue of 2025-06-15
twenty-six papers selected by
Thomas Krichel, Open Library Society



  1. Health Info Libr J. 2025 Jun 11.
      This study explores the application of user experience (UX) strategies to enhance the learning environment within the Leeds General Infirmary Library, part of the Leeds Teaching Hospitals NHS Trust. Despite the growing importance of UX in digital services and academic libraries, its adoption in health libraries has been limited. This paper details the implementation of three UX techniques (graffiti walls, observations and behavioural mapping, and love and breakup letters) adapted from Andy Priestner's toolkit. The findings highlight user preferences and behaviours, leading to practical, low-cost improvements in the library space. The study underscores the value of UX methodologies in optimising library services to better meet user needs, even in resource-limited settings.
    Keywords:  communication; libraries (health care); library space utilisation; participant observation; stakeholders
    DOI:  https://doi.org/10.1111/hir.12574
  2. Fam Pract. 2025 Jun 04. pii: cmaf037. [Epub ahead of print]42(4):
      Primary care researchers and clinicians are facing an ever-growing evidence base, more options to access research evidence, and increasingly limited time. Incorporating search filters into primary care systematic reviews can significantly improve the efficiency and confidence of the search process. Search filters, or hedges, are predeveloped search strategies that combine controlled vocabulary and free text terms using Boolean operators (words like "AND," "OR"). Search filters help to manage the diverse terminology in the literature, such as the various synonyms for primary care, and can be tailored to the specific needs of the review, whether it aims to be exhaustive or more focussed. Resources such as specialized librarians, databases such as PubMed, and repositories such as the InterTASC Information Specialists Sub-Group provide access to these valuable tools. However, as primary care terminology continues to evolve, regular updates to these filters are necessary to maintain their relevance and effectiveness. This method brief presents search filters and highlights their value for finding research literature in primary care.
    Keywords:  databases as topic; information science; medical subject headings; methods; systematic review
    DOI:  https://doi.org/10.1093/fampra/cmaf037
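      As an illustration of the Boolean filter approach described in item 2, the following minimal Python sketch combines a hypothetical primary care hedge (placeholder terms, not the validated InterTASC or other published filters) with a topic search and runs it against PubMed through the NCBI E-utilities ESearch endpoint:

        import requests

        # Hypothetical, simplified primary care hedge: free-text synonyms OR'd
        # together, then AND'ed with the review topic (not a validated filter).
        primary_care_filter = (
            '("primary care"[tiab] OR "general practice"[tiab] '
            'OR "family practice"[tiab] OR "family medicine"[tiab])'
        )
        topic = '"type 2 diabetes"[tiab]'
        query = f"{primary_care_filter} AND {topic}"

        # ESearch returns the PubMed IDs that match the Boolean query.
        resp = requests.get(
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
            params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 20},
            timeout=30,
        )
        ids = resp.json()["esearchresult"]["idlist"]
        print(len(ids), "records, first IDs:", ids[:5])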
  3. Health Info Libr J. 2025 Jun 09.
      This article examines how direct engagement with credible resources significantly enhances students' practical skills in clinical scenarios. The program prepares students to utilize authoritative resources, fostering confidence in real-world clinical settings, especially in resource-limited environments. Data from student evaluations indicate a marked improvement in perceived difficulty of EBM cases and overall performance scores following the implementation of librarian-led learning sessions. This approach not only meets accreditation standards but also equips future healthcare professionals with essential skills for effective patient care. This underscores the crucial role of librarians in enhancing students' abilities to appraise and apply evidence-based knowledge.
    Keywords:  evidence‐based medicine (EBM); health science; librarians; literature searching; medical education; medical students; rural health services
    DOI:  https://doi.org/10.1111/hir.12575
  4. JAMA Netw Open. 2025 Jun 02. 8(6): e2515160
       Importance: Systematic reviews are the criterion standard for evidence synthesis in the life sciences, yet their reliability and integrity are threatened by citation contamination from fabricated publications produced by paper mills. Despite growing awareness, the extent and implications of this issue remain unclear.
    Objectives: To analyze the prevalence, characteristics, affected subject areas, and citation patterns of retracted paper mill articles cited in systematic reviews.
    Design, Setting, and Participants: This cross-sectional study analyzed systematic reviews published between 2013 and 2024, indexed in Web of Science (WoS). References were matched against the Retraction Watch dataset, and full texts were reviewed to identify retracted paper mill articles incorporated into the evidence synthesis.
    Main Outcomes and Measures: The study assessed (1) contamination prevalence, defined as the proportion of systematic reviews incorporating retracted paper mill articles into the evidence synthesis; (2) geographic distribution of citing authors according to institutional affiliations; (3) citation timing and trends, including the time lag between incorporation and article retraction; (4) affected research areas, categorized by WoS subject classifications; and (5) citation patterns, including highly contaminated reviews (≥3 incorporations of retracted articles).
    Results: Of the total of 200 000 systematic reviews, 299 incorporated at least 1 retracted paper mill article into the evidence synthesis (contamination rate, 0.15%). Among them, 256 (85.6%) included a single retracted article, and 43 (14.4%) included multiple such articles. Of 1802 author affiliations associated with the contaminated reviews, 660 (36.6%) were from institutions in China. Of 385 total citations, 124 (32.2%) occurred after retraction, including 13 occurring more than 500 days after the retraction date. Oncology was the most affected field (48 of 299 [16.1%]). Five reviews each included 5 or more retracted articles, all published in journals under questionable publishers.
    Conclusions and Relevance: In this cross-sectional study of life sciences systematic reviews, contamination remained low but increased over time, posing a risk to research integrity. Continued citation of retracted articles, even after retraction, highlights the need for rigorous screening practices. Correcting contaminated reviews and developing automated detection tools are essential to preserving the credibility of systematic reviews.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2025.15160
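      The reference-screening step described in item 4 can be pictured with a minimal Python sketch that checks cited DOIs against a local copy of the Retraction Watch dataset; the file name and column names used here are assumptions about the CSV layout, not a documented schema:

        import csv

        def load_retracted_dois(path="retraction_watch.csv"):
            # Build a DOI -> retraction date lookup from a local export
            # (column names "OriginalPaperDOI" and "RetractionDate" are assumed).
            retracted = {}
            with open(path, newline="", encoding="utf-8") as fh:
                for row in csv.DictReader(fh):
                    doi = row.get("OriginalPaperDOI", "").strip().lower()
                    if doi:
                        retracted[doi] = row.get("RetractionDate", "")
            return retracted

        def flag_contaminated(cited_dois, retracted):
            # Return the cited DOIs that appear in the retraction lookup.
            return {d: retracted[d.lower()] for d in cited_dois if d.lower() in retracted}

        review_references = ["10.1000/example.123", "10.1000/example.456"]  # cited DOIs
        print(flag_contaminated(review_references, load_retracted_dois()))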
  5. Med Ref Serv Q. 2025 Jun 13. 1-19
      Osteopathic practitioners and researchers face a scarcity of readily accessible scientific literature that bridges evidence-based research with clinical practice. Additionally, there is an absence of libraries specifically dedicated to osteopathic manipulative medicine. Created to fill this gap, Osteoevidence is an online bibliographic database dedicated to advancing osteopathic manipulative medicine by providing streamlined access to scientific literature. Designed in collaboration with osteopaths, this free and user-centric platform indexes 7,391 peer-reviewed reviews, guidelines, and clinical trials from leading research repositories. It integrates a search interface with customizable sorting and lateral filtering tailored for osteopathic contexts. Since its launch in 2022, Osteoevidence has aimed to support clinicians, students, and researchers worldwide. This paper examines its development and functionality, its role in osteopathic research and practice, and its support for information services in clinical and academic settings, including those offered by specialized medical librarians.
    Keywords:  Osteoevidence; Osteopathic manipulative treatment; Osteopathic medicine; Osteopathy; database; evidence-based osteopathy
    DOI:  https://doi.org/10.1080/02763869.2025.2510448
  6. Ann Bot. 2025 Jun 07. pii: mcaf062. [Epub ahead of print]
       BACKGROUND: Wikidata is a multilingual, linked open knowledge base to which anyone can contribute and which contains a wealth of botany-related information. Wikidata reveals interactions between entities and connects botany-related information from multiple institutions and other sources, benefiting the botanical community in numerous ways. The aim of this article is to give an overview of Wikidata from a botany perspective and to issue a call to action to the botanical community to collectively improve the quantity and quality of information in Wikidata related to botany, botanists, and botanical collections. Here, we use a broad definition of botany to include the study of many different taxa and specialisations.
    SCOPE: Wikidata contains botany-related data and identifiers for botanists and botanical collectors, botanical taxa, natural history institutions and collections, botany-related publications, geographical locations, research expeditions, as well as genes, genetic variants, chemical compounds, diseases, and more. As an open, collaborative, and community-curated knowledge base, Wikidata enables different communities to add and link data related to botany and empowers the querying and reuse of this data via digital tools such as the Wikidata Query Service, Bionomia, Scholia, TL-2, and Expeditia.
    CONCLUSIONS: Collaboration is key in botany and Wikidata, and the sharing and enriching of botany-related Linked Open Data benefits us all. Several resources, including ethical and legal guidelines, are available for botanists to edit, use, reuse, roundtrip, and teach Wikidata. We call on all botanists to be active participants in Wikidata, improving the quality, quantity, and linking of botany-related data. Our individual and collective actions can help harness the power of Linked Open Data to answer important queries in the field, improve accessibility of herbaria, increase visibility of botanists and their scientific contributions, integrate Wikidata into the classroom, support the Madrid Declaration strategic actions, achieve our collective goals, and ultimately make botany-related information more FAIR and equitable.
    Keywords:  Bionomia; Linked Open Data (LOD); Wikidata; biodiversity data; botanists; collaboration; digital outreach; digital tools; herbaria; identifier; knowledge graph; open science
    DOI:  https://doi.org/10.1093/aob/mcaf062
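      Item 6 mentions the Wikidata Query Service; the following minimal Python sketch sends a SPARQL query for botanists and their author abbreviations to the public endpoint. The property and item identifiers are quoted from memory (P31/Q5 instance-of human, P106 occupation, Q2374149 botanist, P428 botanist author abbreviation) and should be verified against Wikidata before use:

        import requests

        sparql = """
        SELECT ?person ?personLabel ?abbrev WHERE {
          ?person wdt:P31 wd:Q5 ;
                  wdt:P106 wd:Q2374149 .
          OPTIONAL { ?person wdt:P428 ?abbrev . }
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
        }
        LIMIT 10
        """

        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": sparql, "format": "json"},
            headers={"User-Agent": "botany-wikidata-demo/0.1 (example@example.org)"},
            timeout=60,
        )
        for row in resp.json()["results"]["bindings"]:
            print(row["personLabel"]["value"], row.get("abbrev", {}).get("value", ""))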
  7. Assist Technol. 2025 Jun 11. 1-7
      Alternative and augmentative communication (AAC) systems enable interaction by persons with speech impairments, yet access to these devices is limited. In 2019, the Government of Canada introduced "The Accessible Canada Act" to reduce barriers. Availability of information online about AAC systems can reduce barriers for many Canadians who have difficulty attending in-person appointments. While Ontario's Assistive Devices Program has been reviewed, other government-funded and charitable organizations across Canada have not been assessed for readability and accessibility. This research aims to evaluate the websites of organizations across Canada that provide AAC technology access, either through equipment loans or financial assistance programs. Forty-three eligible organizations were identified. Web Content Accessibility Guidelines (WCAG) scores (A, AA, and AAA) and four readability scores (Flesch Kincaid Reading Ease, Flesch Kincaid Grade Level, Gunning Fog, and age range) were determined for each website. Thirteen of 43 sites scored below the recommended standard of 75 for WCAG score, and Flesch Kincaid Reading Ease scores indicated that 86% were more difficult to read than standard recommendations for web content. To enhance equity in AAC device access, online information and forms from government programs and charitable organizations must be easily understood and barrier-free.
    Keywords:  Accessibility; WCAG; augmentative and alternative communication (AAC) technology; equipment loan; funding assistance; readability
    DOI:  https://doi.org/10.1080/10400435.2025.2499621
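      The readability formulas named in item 7 can be computed directly; this minimal Python sketch implements Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog with a crude syllable heuristic (not the exact tooling used in the study):

        import re

        def count_syllables(word):
            # Crude vowel-group heuristic; production tools use lookup dictionaries.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def readability(text):
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            n = max(1, len(words))
            syll = sum(count_syllables(w) for w in words)
            complex_words = sum(1 for w in words if count_syllables(w) >= 3)
            return {
                "flesch_reading_ease": 206.835 - 1.015 * (n / sentences) - 84.6 * (syll / n),
                "flesch_kincaid_grade": 0.39 * (n / sentences) + 11.8 * (syll / n) - 15.59,
                "gunning_fog": 0.4 * ((n / sentences) + 100 * (complex_words / n)),
            }

        print(readability("Apply online. Staff will review your request within ten business days."))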
  8. BMC Med Inform Decis Mak. 2025 Jun 10. 25(Suppl 1): 211
       BACKGROUND: Use of the FAIR principles (Findable, Accessible, Interoperable and Reusable) allows the rapidly growing number of biomedical datasets to be optimally (re)used. An important aspect of the FAIR principles is metadata. The FAIR Data Point (FDP) specifications and reference implementation have been designed as an example of how to publish metadata according to the FAIR principles. Metadata can be added to a FAIR Data Point with the FDP's web interface or through its API. However, these methods are either limited in scalability or usable only by users with a background in programming. We aim to address these limitations with a new tool for populating FDPs with metadata: the FAIR Data Point Populator.
    RESULTS: The FAIR Data Point Populator consists of a GitHub workflow together with Excel templates that have tooltips, validation and documentation. The Excel templates are targeted towards non-technical users and can be used collaboratively in online spreadsheet software. A more technical user then uses the GitHub workflow to read multiple entries in the Excel sheets and transform them into machine-readable metadata. This metadata is then automatically uploaded to a connected FAIR Data Point. We applied the FAIR Data Point Populator to the metadata of two datasets and a patient registry. We were then able to run a query on the FAIR Data Point Index in order to retrieve one of the datasets.
    CONCLUSION: The FAIR Data Point Populator addresses the limitations of the other metadata publication methods by allowing the bulk creation of metadata entries while remaining accessible for users without a background in programming. Additionally, it allows efficient collaboration. As a result of this, the barrier of entry for FAIRification is lower, which allows the creation of FAIR data by more people.
    Keywords:  Data sharing; FAIR; FAIR data point; Metadata; RDF; Semantic web
    DOI:  https://doi.org/10.1186/s12911-025-03022-7
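      Item 8 describes transforming spreadsheet rows into machine-readable metadata; this minimal rdflib sketch shows the general shape of such a transform for one hypothetical row (it is not the FAIR Data Point Populator itself, and the upload step to an FDP is omitted):

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCTERMS, RDF

        DCAT = Namespace("http://www.w3.org/ns/dcat#")

        # One "row" as it might come out of a spreadsheet template (field names hypothetical).
        row = {
            "uri": "https://example.org/dataset/patient-registry",
            "title": "Example patient registry",
            "description": "Demonstration metadata record.",
            "license": "https://creativecommons.org/licenses/by/4.0/",
        }

        g = Graph()
        ds = URIRef(row["uri"])
        g.add((ds, RDF.type, DCAT.Dataset))
        g.add((ds, DCTERMS.title, Literal(row["title"])))
        g.add((ds, DCTERMS.description, Literal(row["description"])))
        g.add((ds, DCTERMS.license, URIRef(row["license"])))

        print(g.serialize(format="turtle"))  # uploading to an FDP would use its REST API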
  9. J Womens Health (Larchmt). 2025 Jun;34(6): 804-809
      Adolescents face increasing obstacles to abortion services and information. Limited research post-Dobbs suggests that adolescents turn to social media for information, but more research is needed to understand where adolescents expect to find reliable information about abortion and access to care. We conducted in-depth interviews with participants aged 16-19 years from Midwestern states (Illinois, Wisconsin, Indiana, Iowa, Missouri, Michigan, Ohio, and Minnesota) to explore how they engage with abortion information and which sources they trust and prefer. Interviews were conducted via Zoom between April and June 2023 and lasted from 30 to 45 minutes. Interviews were deidentified, transcribed, and coded in Dedoose to identify emergent themes. We interviewed 39 participants from 7 states. Social media emerged as the primary information source for participants, with specific mentions of platforms such as TikTok, Reddit, and YouTube. Many participants expressed skepticism about information on social media and often sought to verify it with other sources, including friends or family. If a friend were seeking an abortion, most participants emphasized helping them find safe and reliable information online or through trusted sources, like a local Planned Parenthood. In general, participants preferred engaging with reliable, easily accessible abortion information online and through social media as well as in school health classes. Adolescents in the Midwestern United States primarily encounter abortion information via social media but rely on more trustworthy sources (e.g., clinics, government websites) for practical information. Trusted organizations and providers supporting youth access to abortion should consider outreach to adolescents via social media, leveraging this trust to direct youth to other validated sources.
    Keywords:  abortion; adolescent; internet; social media
    DOI:  https://doi.org/10.1089/jwh.2024.0563
  10. Interv Pain Med. 2025 Jun;4(2): 100592
       Background: Artificial intelligence (AI) is becoming more integrated into healthcare, with large language models (LLMs) like ChatGPT being widely used by patients to answer medical questions. Given the increasing reliance on AI for health-related information, it's important to evaluate how well these models perform in addressing common patient concerns, especially in procedural medicine. To date, no studies have specifically examined AI's role in addressing patient questions related to epidural steroid injections (ESIs), making this an important area for investigation.
    Objective: This study examines ChatGPT's ability to answer patient questions about epidural steroid injections (ESIs), focusing on response accuracy, readability, and overall usefulness. Our aim was to evaluate and compare the content, accuracy, and user-friendliness of AI-generated information on common peri-procedural questions and complications associated with ESIs, thereby extending the application of AI as a triage tool into pain management and interventional spine procedures.
    Methods: We formulated and compiled 29 common patient questions about ESIs and tested ChatGPT's responses in both general and specific formats. Two interventional pain specialists reviewed the AI-generated answers, assessing them for accuracy, clarity, empathy, and directness using a Likert scale. Readability scores were calculated using Flesch-Kincaid Reading Level and Flesch Reading Ease scales. Statistical analyses were performed to compare general versus specific responses.
    Results: General queries led to longer, more detailed responses, but readability was similar between general and specific formats. Subjective analysis showed that general responses were rated higher for accuracy, clarity, and responsiveness. However, neither format demonstrated strong empathy, and some general queries resulted in off-topic responses, underscoring the importance of precise wording when interacting with AI.
    Conclusion: ChatGPT can provide clear and largely accurate answers to patient questions about ESIs, with general prompts often producing more complete responses. However, AI-generated content still has limitations, particularly in conveying empathy and avoiding tangential information. These findings highlight the need for thoughtful prompt design and further research into how AI can be integrated into clinical workflows while ensuring accuracy and patient safety.
    DOI:  https://doi.org/10.1016/j.inpm.2025.100592
  11. JMIR Perioper Med. 2025 Jun 12. 8 e70047
       Background: Large language models (LLMs) are revolutionizing natural language processing, increasingly applied in clinical settings to enhance preoperative patient education.
    Objective: This study aimed to evaluate the effectiveness and applicability of various LLMs in preoperative patient education by analyzing their responses to superior capsular reconstruction (SCR)-related inquiries.
    Methods: In total, 10 sports medicine clinical experts formulated 11 SCR issues and developed preoperative patient education strategies during a webinar, inputting 12 text commands into Claude-3-Opus (Anthropic), GPT-4-Turbo (OpenAI), and Gemini-1.5-Pro (Google DeepMind). A total of 3 experts assessed the language models' responses for correctness, completeness, logic, potential harm, and overall satisfaction, while the preoperative education documents were evaluated using the DISCERN questionnaire and the Patient Education Materials Assessment Tool and reviewed by 5 postoperative patients for readability and educational value; the readability of all responses was also analyzed using the cntext package and py-readability-metrics.
    Results: Between July 1 and August 17, 2024, sports medicine experts and patients evaluated 33 responses and 3 preoperative patient education documents generated by 3 language models regarding SCR surgery. For the 11 query responses, clinicians rated Gemini significantly higher than Claude in all categories (P<.05) and higher than GPT in completeness, risk avoidance, and overall rating (P<.05). For the 3 educational documents, Gemini's Patient Education Materials Assessment Tool score significantly exceeded Claude's (P=.03), and patients rated Gemini's materials superior in all aspects, with significant differences in educational quality versus Claude (P=.02) and overall satisfaction versus both Claude (P<.01) and GPT (P=.01). GPT had significantly higher readability than Claude on 3 R-based metrics (P<.01). Interrater agreement was high among clinicians and fair among patients.
    Conclusions: Claude-3-Opus, GPT-4-Turbo, and Gemini-1.5-Pro effectively generated readable presurgical education materials but lacked citations and failed to discuss alternative treatments or the risks of forgoing SCR surgery, highlighting the need for expert oversight when using these LLMs in patient education.
    Keywords:  informed consent process; large language models; massive rotator cuff tear; preoperative patient education; superior capsular reconstruction
    DOI:  https://doi.org/10.2196/70047
  12. ANZ J Surg. 2025 Jun 11.
        BACKGROUND: Artificial intelligence-based large language models (AI-based LLMs) have gained popularity over traditional search engines for obtaining medical information. However, the accuracy and reliability of these AI-generated medical insights remain a topic of debate. Recently, a new AI-based LLM, DeepSeek-V3, developed in East Asia, has been introduced. The aim of this study is to evaluate the appropriateness, accuracy, and readability of the responses provided by the ChatGPT-4o and DeepSeek-V3 AI-based LLMs to questions frequently asked by patients regarding laparoscopic cholecystectomy (LC), and the usability of these answers for patient education.
    METHODS: The 20 most frequently asked questions by patients regarding LC were presented to the DeepSeek-V3 and ChatGPT-4o chatbots. Before each question, the search history was deleted. The comprehensiveness of the responses was evaluated based on clinical experience by two board-certified general surgeons experienced in hepatobiliary surgery using a Likert scale. Paired sample t-test and Wilcoxon signed rank test were used. Inter-rater reliability was analyzed with Cohen's Kappa test.
    RESULTS: The DeepSeek-V3 chatbot provided statistically significantly more suitable responses compared to ChatGPT-4o (p = 0.033). On the Likert scale, DeepSeek-V3 received a 5-point rating for 19 out of 20 questions (95%), whereas ChatGPT-4o achieved a 5-point rating for only 13 questions (65%). Based on the evaluation conducted according to the reviewers' clinical experience, DeepSeek-V3 provided statistically significantly more appropriate responses (p = 0.008).
    CONCLUSION: Released in January 2025, DeepSeek-V3 provides more suitable responses to patient inquiries regarding LC compared to ChatGPT-4o.
    Keywords:  ChatGPT‐4o; DeepSeek‐V3; artificial intelligence; laparoscopic cholecystectomy; patient education
    DOI:  https://doi.org/10.1111/ans.70198
  13. Oncology. 2025 Jun 10. 1-15
       BACKGROUND/OBJECTIVES: This study aimed to evaluate AI-based chatbots (GPT, DeepSeek, Copilot, Gemini) in disseminating information on liver cancer, emphasizing content quality, adherence to established guidelines, and ease of comprehension.
    METHODS: Between January and February 2025, four chatbots were examined using publicly accessible free versions lacking independent reasoning capabilities. Three frequently searched Google Trends questions ("What is liver cancer awareness?", "What are the symptoms of liver cancer?", and "Is liver cancer treatable?") were posed. Their responses were assessed via the DISCERN instrument, Coleman-Liau Index, Patient Education Materials Assessment Tool for Print, and alignment with American Association for the Study of Liver Diseases, National Comprehensive Cancer Network, and European Society for Medical Oncology recommendations. Statistical analysis was performed using SPSS 22.
    RESULTS: All chatbots largely provided relevant and impartial information. GPT and DeepSeek scored lower on specifying information sources and update timelines, whereas Copilot omitted local therapies (e.g., Radiofrequency Ablation, Transarterial Chemoembolization, Transarterial Radioembolization), resulting in reduced scientific accuracy. Gemini and Copilot performed better in "Understandability," while GPT and DeepSeek excelled in "Actionability." Although GPT demonstrated consistency across multiple treatment options, it did not explicitly reference international guidelines. Study limitations included language constraints, variations in chatbot updates, and reliance on a single inquiry round.
    CONCLUSIONS: AI chatbots show potential as initial informational tools for liver cancer but cannot replace professional medical consultation. In complex diseases requiring multidisciplinary management, frequent guideline-based updates, expert validation, and diverse data sources are critical to enhancing clinical relevance and patient outcomes.
    Keywords:  Artificial Intelligence; Chatbots; Oncology; Clinical Decision Support; Liver Cancer
    DOI:  https://doi.org/10.1159/000546726
  14. J Craniofac Surg. 2025 Jun 09.
      Mandibular distraction osteogenesis (MDO) is a craniofacial procedure frequently performed in pediatric patients with micrognathia-related airway obstruction. Preoperative and postoperative counseling for families undergoing this procedure is essential, as it involves a multistage surgical course, device management, feeding changes, and airway considerations. This study evaluates the trustworthiness and readability of AI (artificial intelligence) chatbot responses to questions about operative care for MDO. The study was conducted using ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence. Twenty common preoperative and postoperative care questions relating to MDO were developed. The authors used a modified DISCERN tool to assess quality and the SMOG (Simple Measure of Gobbledygook) test to evaluate response readability. Data underwent statistical analysis using descriptive statistics, 1-way ANOVA, and Tukey HSD. Modified DISCERN analysis revealed that clear aims and relevancy scored the highest (mean=4.92, SD=0.31; mean=4.64, SD=0.62). Additional sources provided and citation of sources had the lowest means (mean=2.19, SD=1.52; mean=2.93, SD=1.96). Microsoft Copilot scored the highest in overall quality (mean=38.10 versus ChatGPT=29.90, P<0.001). Open Evidence scored lowest in shared decision-making (mean=1.80, SD=1.10). Effect sizes were large for source-related variables, with eta-squared values >0.75. Significant differences in readability across all AI models were found (mean=17.31, SD=3.59, P<0.001), indicating the average response was at a graduate school reading level. Open Evidence (mean=22.24) produced higher SMOG reading scores than ChatGPT (mean=15.89), Google Gemini (mean=15.66), and Microsoft Copilot (mean=15.44) (P<0.001). These findings highlight the need to review the reliability of AI chatbots in preoperative and postoperative counseling for MDO.
    Keywords:  Artificial intelligence; chatbots; mandibular distraction osteogenesis; patient counseling; readability assessment
    DOI:  https://doi.org/10.1097/SCS.0000000000011543
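      The SMOG readability test used in item 14 follows a simple published formula; this minimal Python sketch implements it with a rough polysyllable heuristic rather than the study's exact tooling:

        import math
        import re

        def smog_grade(text):
            # SMOG formula: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291.
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            # Polysyllables counted with a rough vowel-group heuristic.
            poly = sum(1 for w in words if len(re.findall(r"[aeiouy]+", w.lower())) >= 3)
            return 1.0430 * math.sqrt(poly * (30 / sentences)) + 3.1291

        print(round(smog_grade("Turn the distractor twice daily. Contact the surgical team "
                               "immediately if the incision becomes erythematous."), 1))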
  15. Psychiatr Q. 2025 Jun 12.
      The current study aimed to evaluate the quality, usefulness, and reliability of three large language models (LLMs), namely ChatGPT-4, DeepSeek, and Gemini, in answering general questions about specific learning disorders (SLDs), specifically dyslexia and dyscalculia. For each learning disorder subtype, 15 questions were developed through expert review of social media, forums, and professional input. Responses from the LLMs were evaluated using the Global Quality Scale (GQS) and a seven-point Likert scale to assess usefulness and reliability. Statistical analyses were conducted to compare model performance, including descriptive statistics and one-way ANOVA. Results revealed no statistically significant differences in quality or usefulness across models for both disorders. However, ChatGPT-4 demonstrated superior reliability for dyscalculia (p < 0.05), outperforming Gemini and DeepSeek. For dyslexia, DeepSeek achieved 100% maximum reliability scores, while GPT-4 and Gemini scored 60%. All models provided high-quality responses, with mean GQS scores ranging from 4.20 to 4.60 for dyslexia and 3.93 to 4.53 for dyscalculia, although variability existed in their practical utility. While LLMs show promise in delivering dyslexia- and dyscalculia-related information, GPT-4's reliability for dyscalculia highlights its potential as a supplementary educational tool. Further validation by professionals remains critical.
    Keywords:  Artificial intelligence; Dyslexia; Learning disabilities; Psychiatric diagnosis
    DOI:  https://doi.org/10.1007/s11126-025-10170-6
  16. Int Ophthalmol. 2025 Jun 12. 45(1): 244
       PURPOSE: Artificial Intelligence (AI) is rapidly advancing and profoundly influencing healthcare, offering the potential to revolutionize access to medical information. As medical misinformation proliferates and online searches for health-related advice increase, there is an escalating need for dependable patient information. This study evaluates the effectiveness of an AI chatbot in delivering information for blepharoplasty candidates.
    MATERIALS AND METHODS: Numerous frequently asked questions on blepharoplasty sourced from the ASPS website were posed to ChatGPT. The responses were then rigorously cross-referenced with relevant scholarly literature and meticulously reviewed by the research team to determine their accuracy. Additionally, the questions were classified into three categories, and the responses were evaluated using Flesch-Kincaid readability metrics, along with ANOVA and trend analysis tests.
    RESULTS: Despite minor variations, ChatGPT's responses to blepharoplasty FAQs largely aligned with current literature. Overall readability analysis showed a Flesch Reading Ease score of 31.48, indicating high school complexity with a Flesch-Kincaid Grade Level at 10.92, and 20.97% use of passive voice. The ANOVA results showed that there were no significant differences in readability between the categories (p-values: Flesch Reading Ease = 0.816, Flesch-Kincaid = 0.616, Passive Sentences = 0.115). Trend analysis also showed that the level of response complexity stayed the same across the questions.
    CONCLUSION: ChatGPT is an evolving tool that holds potential for patients in accessing and comprehending medical information. While it can offer accurate insights, it's important to recognize that its answers might not consistently reach complete accuracy or be suitable for patients with varying educational levels.
    Keywords:  ASPS; Artificial intelligence; Blepharoplasty; Database; Knowledge base; Medical information systems
    DOI:  https://doi.org/10.1007/s10792-025-03611-5
  17. J Orthop Surg (Hong Kong). 2025 May-Aug;33(2): 10225536251350411
      Introduction: Unicondylar knee arthroplasty (UKA) is a minimally invasive surgical technique that replaces a specific compartment of the knee joint. Patients increasingly rely on digital tools such as Google and ChatGPT for healthcare information. This study aims to compare the accuracy, reliability, and applicability of the information provided by these two platforms regarding unicondylar knee arthroplasty.
    Materials and Methods: This study was conducted using a descriptive and comparative content analysis approach. Twelve frequently asked questions regarding unicondylar knee arthroplasty were identified through Google's "People Also Ask" section and then directed to ChatGPT-4. The responses were compared based on scientific accuracy, level of detail, source reliability, applicability, and consistency. Readability analysis was conducted using DISCERN, FKGL, SMOG, and FRES scores.
    Results: A total of 83.3% of ChatGPT's responses were found to be consistent with academic sources, whereas this rate was 58.3% for Google. ChatGPT's answers averaged 142.8 words, compared to Google's 85.6-word average. Regarding source reliability, 66.7% of ChatGPT's responses were based on academic guidelines, whereas Google's percentage was 41.7%. The DISCERN score for ChatGPT was 64.4, whereas Google's was 48.7. Google had a higher FRES score.
    Conclusion: ChatGPT provides more scientifically accurate information than Google, while Google offers simpler and more comprehensible content. However, the academic language used by ChatGPT may be challenging for some patient groups, whereas Google's superficial information is a significant limitation. In the future, the development of artificial intelligence-based medical information tools could be beneficial in improving patient safety and the quality of information dissemination.
    Keywords:  artificial intelligence; chatgpt; google; unicondylar knee arthroplasty
    DOI:  https://doi.org/10.1177/10225536251350411
  18. Healthcare (Basel). 2025 May 27. pii: 1271. [Epub ahead of print]13(11):
        BACKGROUND/OBJECTIVES: Large language models facilitate instantaneous access to health information. However, they do not all provide the same level of accuracy or detail. In pediatric orthopedics, where parents have urgent concerns regarding knee deformities (bowlegs and knock knees), the accuracy and dependability of these chatbots can affect parents' decisions to seek treatment. The goal of this study was to analyze how AI chatbots addressed parental concerns regarding pediatric knee deformities.
    METHODS: A set of twenty standardized questions, consisting of ten questions each on bowlegs and knock knees, was designed through literature review, analysis of parental discussion forums, and expert consultations. Each of the three chatbots (ChatGPT, Gemini, and Copilot) was asked the same set of questions. Five pediatric orthopedic surgeons were then asked to rate each response for accuracy, clarity, and comprehensiveness, along with the degree of misleading information provided, on a scale of 1-5. The reliability among raters was calculated using intraclass correlation coefficients (ICCs), while differences among the chatbots were assessed using a Kruskal-Wallis test with post hoc pairwise comparisons.
    RESULTS: All three chatbots displayed a moderate-to-good score for inter-rater reliability. ChatGPT and Gemini's scores were higher for accuracy and comprehensiveness than Copilot's (p < 0.05). However, no notable differences were found in clarity or in the likelihood of giving incorrect answers. Overall, more detailed and precise responses were given by ChatGPT and Gemini, while, with regard to clarity, Copilot performed comparably but was less thorough.
    CONCLUSIONS: There were notable discrepancies in performance across the AI chatbots in providing pediatric orthopedic information, which demonstrates indications of evolving potential. In comparison to Copilot, ChatGPT and Gemini were relatively more accurate and comprehensive. These results highlight the persistent requirement for real-time supervision and stringent validation when employing chatbots in the context of pediatric healthcare.
    Keywords:  AI chatbots; health information accuracy; knee deformities; parental concerns
    DOI:  https://doi.org/10.3390/healthcare13111271
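      Item 18 compares chatbot ratings with a Kruskal-Wallis test and post hoc pairwise comparisons; a minimal Python sketch with invented ratings (and a Mann-Whitney U post hoc, which the abstract does not specify) would look like this:

        from scipy import stats

        # Hypothetical 1-5 accuracy ratings pooled across questions for each chatbot.
        chatgpt = [5, 4, 5, 4, 5, 4, 5, 5, 4, 4]
        gemini  = [4, 5, 4, 4, 5, 5, 4, 4, 5, 4]
        copilot = [3, 3, 4, 3, 4, 3, 3, 4, 3, 3]

        h, p = stats.kruskal(chatgpt, gemini, copilot)
        print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

        # One possible post hoc pairwise comparison (Mann-Whitney U, with a
        # Bonferroni-adjusted alpha of 0.05 / 3 for three pairs).
        u, p_pair = stats.mannwhitneyu(chatgpt, copilot, alternative="two-sided")
        print(f"ChatGPT vs Copilot: U = {u:.1f}, p = {p_pair:.4f}")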
  19. Healthcare (Basel). 2025 Jun 05. pii: 1344. [Epub ahead of print]13(11):
      Background and Objectives: Artificial intelligence (AI) chatbots are increasingly employed for the dissemination of health information; however, apprehensions regarding their accuracy and reliability remain. The intricacy of sarcoidosis may lead to misinformation and omissions that affect patient comprehension. This study assessed the usability of AI-generated information on sarcoidosis by evaluating the quality, reliability, readability, understandability, and actionability of chatbot responses to patient-centered queries.
    Methods: This cross-sectional evaluation included 11 AI chatbots comprising both general-purpose and retrieval-augmented tools. Four sarcoidosis-related queries derived from Google Trends were submitted to each chatbot under standardized conditions. Responses were independently evaluated by four blinded pulmonology experts using DISCERN, the Patient Education Materials Assessment Tool-Printable (PEMAT-P), and Flesch-Kincaid readability metrics. A Web Resource Rating (WRR) score was also calculated. Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).
    Results: Retrieval-augmented models such as ChatGPT-4o Deep Research, Perplexity Research, and Grok3 Deep Search outperformed general-purpose chatbots across the DISCERN, PEMAT-P, and WRR metrics. However, these high-performing models also produced text at significantly higher reading levels (Flesch-Kincaid Grade Level > 16), reducing accessibility. Actionability scores were consistently lower than understandability scores across all models. The ICCs exceeded 0.80 for all evaluation domains, indicating excellent inter-rater reliability.
    Conclusions: Although some AI chatbots can generate accurate and well-structured responses to sarcoidosis-related questions, their limited readability and low actionability present barriers for effective patient education. Optimization strategies, such as prompt refinement, health literacy adaptation, and domain-specific model development, are required to improve the utility of AI chatbots in complex disease communication.
    Keywords:  AI chatbots; health information quality; patient education; readability; sarcoidosis
    DOI:  https://doi.org/10.3390/healthcare13111344
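      The intraclass correlation coefficients reported in item 19 can be computed from long-format ratings, for example with the pingouin package; the data below are invented and the column choices are illustrative:

        import pandas as pd
        import pingouin as pg

        # Hypothetical long-format data: four raters scoring the same six chatbot responses.
        df = pd.DataFrame({
            "response": list(range(1, 7)) * 4,
            "rater":    ["R1"] * 6 + ["R2"] * 6 + ["R3"] * 6 + ["R4"] * 6,
            "score":    [4, 3, 5, 2, 4, 3,
                         4, 3, 5, 3, 4, 3,
                         5, 3, 4, 2, 4, 2,
                         4, 4, 5, 2, 5, 3],
        })

        # Returns the standard ICC variants (ICC1, ICC2, ICC3, and average-measure forms).
        icc = pg.intraclass_corr(data=df, targets="response", raters="rater", ratings="score")
        print(icc[["Type", "ICC", "CI95%"]])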
  20. Knee. 2025 Jun 05. pii: S0968-0160(25)00124-3. [Epub ahead of print]56 249-257
       PURPOSE: To examine ChatGPT's effectiveness in responding to common patient questions related to meniscus surgery, including procedures such as meniscus repair and meniscectomy.
    METHODS: We identified 20 frequently asked questions (FAQs) about meniscus surgery from major orthopedic institutions recommended by ChatGPT, which were then refined by two authors into 10 questions commonly encountered in the outpatient setting. These questions were posed to ChatGPT. Answers were evaluated using a scoring system to assess accuracy and clarity and were rated as "excellent answer requires no clarification," "satisfactory requires minimal clarification," "satisfactory requires moderate clarification," or "unsatisfactory requires substantial clarification."
    RESULTS: Four responses were excellent, requiring no clarification; four were satisfactory, requiring minimal clarification; two were satisfactory, requiring moderate clarification; and none were unsatisfactory.
    CONCLUSION: As hypothesized, ChatGPT provides satisfactory and reliable information for frequently asked questions about meniscus surgery.
    Keywords:  Arthroscopy; Artificial intelligence; ChatGPT; Meniscus
    DOI:  https://doi.org/10.1016/j.knee.2025.05.018
  21. NPJ Digit Med. 2025 Jun 09. 8(1): 343
      Large language models (LLMs) are used to seek health information. Guidelines for evidence-based health communication require the presentation of the best available evidence to support informed decision-making. We investigate the prompt-dependent guideline compliance of LLMs and evaluate a minimal behavioural intervention for boosting laypeople's prompting. Study 1 systematically varied prompt informedness, topic, and LLMs to evaluate compliance. Study 2 randomized 300 participants to three LLMs under standard or boosted prompting conditions. Blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed to meet evidence-based health communication standards. The quality of responses was found to be contingent upon prompt informedness. Study 2 revealed that laypeople's prompts frequently elicited poor-quality responses. The simple boost improved response quality, though it remained below required standards. These findings underscore the inadequacy of LLMs as a standalone health communication tool. Integrating LLMs with evidence-based frameworks, enhancing their reasoning and interfaces, and teaching prompting skills are essential. Study Registration: German Clinical Trials Register (DRKS) (Reg. No.: DRKS00035228, registered on 15 October 2024).
    DOI:  https://doi.org/10.1038/s41746-025-01752-6
  22. Dent Traumatol. 2025 Jun 09.
        AIM: The aim of this study was to evaluate the performance of the ChatGPT-4o and Gemini Advanced artificial intelligence-based chatbots (AI-based chatbots) in providing emergency intervention recommendations for dental trauma based on intraoral photographs of patients diagnosed with traumatic dental injuries, and to assess their compatibility with the emergency intervention recommendations in the ToothSOS application.
    MATERIAL AND METHODS: In this study, 80 intraoral photographs obtained from patients presenting with dental trauma were uploaded to two different AI-based chatbots (ChatGPT-4o and Gemini Advanced) and the responses generated by these systems were evaluated by four paediatric dentists. The evaluators scored the responses with a Modified Global Quality Score (GQS), referring to the English instructions of the ToothSOS application. In order to analyse the reliability of the responses, a total of three evaluation sessions were conducted 1 week apart.
    RESULTS: ChatGPT-4o performed better when all injury types were considered together (p = 0.012). It was found that ChatGPT-4o performed much better in complicated crown fracture cases (p = 0.004) and that the Gemini Advanced chatbot performed much better in critical dental injuries such as avulsion (p < 0.001).
    CONCLUSIONS: AI-based chatbots can be a helpful tool in the assessment of dental trauma. However, further development and expert validation are needed to improve their accuracy and consistency, especially in complex cases. Incorporating the International Association of Dental Traumatology (IADT) guidelines into the databases of these systems could improve the reliability of their recommendations. In addition, given the widespread use of AI-based chatbots in many fields, particularly health, they could contribute to public health by supporting access to accurate information.
    Keywords:  ChatGPT; Gemini; artificial intelligence; dental trauma; dentistry
    DOI:  https://doi.org/10.1111/edt.13078
  23. JMIR Infodemiology. 2025 Jun 11. 5 e66416
       Background: There is breast cancer-related medical information on social media, but there is no established method for objectively evaluating the quality of this information. Principles for Health-Related Information on Social Media (PRHISM) is a newly developed tool for objectively assessing the quality of health-related information on social media; however, there have been no reports evaluating its reliability and validity.
    Objective: The purpose of this study was to statistically examine the reliability and validity of PRHISM using videos about breast cancer treatment on YouTube (Google).
    Methods: In total, 60 YouTube videos were selected on January 5, 2024, with the Japanese words for "breast cancer," "treatment," and "chemotherapy," and assessed by 6 Japanese physicians with expertise in breast cancer. These evaluators independently evaluated the videos using PRHISM and an established tool for assessing the quality of health-related information, DISCERN, as well as through subjective assessments. We calculated interrater and intrarater agreement among evaluators with CIs, measuring agreement using weighted Cohen kappa.
    Results: The interrater agreement for PRHISM overall quality was κ=0.52 (90% CI 0.49-0.55), indicating that the expected level of agreement, statistically defined by the lower limit of the 90% CI exceeding 0.53, was not achieved. However, PRHISM demonstrated higher agreement compared with DISCERN overall quality, which had a κ=0.45 (90% CI 0.41-0.48). In terms of validity, the intrarater agreement between PRHISM and subjective assessments by breast experts was κ=0.37 (95% CI 0.14-0.60), while DISCERN showed an agreement of κ=0.27 (95% CI 0.07-0.48), indicating fair agreement and no significant difference in validity.
    Conclusions: PRHISM has demonstrated sufficient reliability and validity for evaluating the quality of health-related information on YouTube, making it a promising new metric. To further enhance objectivity, it is necessary to explore the use of artificial intelligence and other approaches.
    Keywords:  Japan; PRHISM; Principles for Health-Related Information on Social Media; YouTube; breast cancer treatment; cancer treatment; information quality; instrument validation study; medical information; online health information; reliability; social media; validity; videos
    DOI:  https://doi.org/10.2196/66416
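      The weighted Cohen kappa used in item 23 can be reproduced on toy data; quadratic weighting is assumed here because the abstract does not state the weighting scheme:

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical 1-5 overall-quality ratings of the same videos by two raters.
        rater_a = [5, 3, 4, 2, 5, 1, 3, 4, 2, 4]
        rater_b = [4, 3, 5, 2, 4, 2, 3, 4, 1, 4]

        # Quadratic weights penalise large disagreements more than near-misses.
        print(round(cohen_kappa_score(rater_a, rater_b, weights="quadratic"), 2))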
  24. Health Sci Rep. 2025 Jun;8(6): e70880
       Background and Aims: Nowadays, the social media video-sharing website YouTube is globally accessible and used for sharing news and information. It also serves as a tool for migraine sufferers seeking guidance about Daith piercing (DP) as a potential migraine treatment; however, shared and disseminated video content is rarely regulated and does not follow evidence-based medicine. This study aims to investigate the content, quality, and reliability of YouTube videos on DP for the treatment of migraine.
    Methods: YouTube videos were systematically searched from the video portal inception until 17th January 2024. "Daith piercing" AND "migraine" were the applied search terms. The primary outcome of interest was assessing the Global Quality Scale (GQS) and DISCERN to evaluate each video blog's quality, flow, and reliability. Secondary outcomes included the relapse time of migraine after DP, and further outcomes related to DP.
    Results: In the final analysis, 246 videos were included (N = 69 categorized as Personal Experience; N = 176 as Others, defined as videos from bloggers, piercers, or other persons; and N = 1 as Healthcare Professionals). The GQS rating in the category Personal Experience revealed that the quality of 50.7% of videos was very poor, 29.0% poor, 11.6% moderate, and 8.7% good. In the category Others, GQS rating showed that the quality of 60.8% of videos was very poor, 25.6% poor, 11.9% moderate, and 1.7% good. The one video in the category Healthcare Professionals was rated "poor quality". Ratings applying the DISCERN tool were comparable. Overall, 111 (45.1%) videos recommended and 14 (5.7%) discouraged DP for migraine relief.
    Conclusion: Based on the GQS and DISCERN scores, the information, usefulness, and accuracy of most YouTube content on DP for migraine treatment are generally of poor quality and reliability. The lack of high-quality and reliable videos might expose users to potentially misleading information and dissemination of unproven medical interventions.
    Keywords:  YouTube; acupuncture; complementary and alternative medicine; headache disorders; migraine piercing; pain
    DOI:  https://doi.org/10.1002/hsr2.70880
  25. J Am Acad Orthop Surg Glob Res Rev. 2025 Jun 01. 9(6):
        INTRODUCTION: With the rise of social media as a source for health information, there is concern about the spread of unregulated, potentially misleading content. This study aimed to evaluate the quality of knee osteoarthritis (OA) treatment information on TikTok, YouTube, and Instagram, platforms where patients often seek medical advice.
    METHODS: TikTok videos, Instagram posts, and YouTube videos focusing on knee OA treatment and meeting specific engagement thresholds were identified using a standardized search. Six reviewers, including orthopaedic faculty and residents, assessed the content's accuracy and reliability using a 10-question Social Media Outreach Content Assessment & Review Tool (SOCART), adapted from the DISCERN instrument. Data were analyzed using analysis of variance, linear regression, and mixed methods.
    RESULTS: The study reviewed 130 social media posts (YouTube: 30, TikTok: 50, Instagram: 50). YouTube had the highest median number of followers/subscribers, whereas TikTok had the most likes/day and comments/day. Most TikTok (66.7%) and Instagram (92.0%) content creators were from private practices, whereas YouTube creators were mainly affiliated with academic institutions (40.0%). YouTube scored the highest in SOCART assessments (32.86 ± 0.89/50), markedly outperforming Instagram (21.30 ± 0.69/50) and TikTok (20.34 ± 0.87/50; P < 0.001). Content from academic institutions scored higher than that from nonacademic sources (28.04 ± 1.05 vs. 21.77 ± 0.859, P = 0.014).
    CONCLUSION: YouTube's high ratings in all SOCART instrument categories suggest that it presents higher-quality information about knee OA treatments relative to Instagram and TikTok. However, YouTube content was still found to be inaccurate and unreliable, making it unsatisfactory for dissemination of important health information. In addition, despite having the lowest SOCART scores, TikTok received the most engagement. This study highlights two important findings: social media presents a risk for patient misinformation when seeking medical advice, and it creates opportunities for physicians to connect with patients using platforms with higher user engagement. Physicians and medical societies can use this information during educational content creation to inform platform choice and dissemination strategies.
    DOI:  https://doi.org/e24.00335
  26. JMIR Hum Factors. 2025 Jun 13. 12 e60628
       BACKGROUND: Anxiety and depression symptoms have been rising among college students, with many increasingly meeting the criteria for 1 or more mental health problems. Due to a rise in internet access and lockdown restrictions associated with the COVID-19 pandemic, online mediums, such as teletherapy, repositories for mental health information, discussion forums, self-help programs, and online screening tools, have become more popular and used by college students to support their mental health. However, there is limited information about individual-level factors that lead college students to use these online tools to support their mental health.
    OBJECTIVE: This mixed methods study aimed to examine the associations between demographics, symptom severity, mental health literacy, stigma, attitudes, and self-efficacy and the use of online tools to seek psychological information and services among racially and ethnically diverse college students. This study also aimed to qualitatively characterize types of online tools used, reasons for using tools or lack thereof, and perceived helpfulness of tools.
    METHODS: Undergraduate students (N=123) completed validated measures and provided open-ended descriptions of the types of online tools they used to seek psychological information and services and their reasons for using those tools. Logistic regression analyses were used to test associations of online tool use to seek mental health information and hypothesized predictors. Descriptive statistics were conducted to examine online tool types, reasons for using online tools, and helpfulness explanations.
    RESULTS: In total, 49.6% (61/123) of the participants used online tools (eg, search engines) to seek mental health information, while 30.1% (37/123) used online tools (eg, medical websites) to seek mental health services. Mental health literacy (P=.002; odds ratio 1.14, 95% CI 1.05-1.24) was associated with greater use of online tools to seek mental health information. None of the hypothesized variables predicted online tool use to seek mental health services. In total, 82% (50/61) of participants who sought information found online tools somewhat helpful, while 49% (18/37) of participants who sought services found online tools very helpful. Of the students who did not use online tools to seek information, 19% (12/62) reported it was because they did not know which online tools to use and 31% (19/62) stated they would be encouraged to use online tools if it was recommended by professionals, therapists, family, or friends. Of the students who did not use online tools to seek services, 33% (28/86) reported it was because they did not think mental health help was necessary.
    CONCLUSIONS: These findings highlight the use of online tools to provide mental health information and connect to professional services, suggesting that online tools are widely used to access mental health support.
    Keywords:  attitude; college students; diverse; experience; help-seeking behavior; literacy; mental health; online information; online tool; perspective; qualitative; self-efficacy; survey
    DOI:  https://doi.org/10.2196/60628
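      The logistic regression and odds ratios reported in item 26 follow the standard pattern sketched below with invented data and illustrative variable names:

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        # Hypothetical data: whether a student used online tools (1/0) and a
        # mental health literacy score (both columns are illustrative).
        df = pd.DataFrame({
            "used_online_tools": [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1],
            "mh_literacy":       [30, 25, 22, 34, 18, 27, 29, 20, 31, 24, 26, 33],
        })

        X = sm.add_constant(df[["mh_literacy"]])
        fit = sm.Logit(df["used_online_tools"], X).fit(disp=0)

        print(np.exp(fit.params))      # odds ratios
        print(np.exp(fit.conf_int()))  # 95% confidence intervals on the OR scale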