bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-01-18
twenty-two papers selected by
Thomas Krichel, Open Library Society



  1. Med Ref Serv Q. 2026 Jan 14. 1-12
      Examining how medical students strategically select and use library resources could enhance library services for this group. A narrative case study focused on a medical student who served as a medical library assistant for three years. Analyses were guided by specific research questions. The discussion addresses how effectively library resources meet medical students' information needs, potential areas for enhancement, and directions for future research. Libraries are essential for addressing the information needs of medical students, and this case study demonstrates ways to enhance library services in response to the evolving educational needs of future physicians.
    Keywords:  Academic medical librarians; healthcare students; information management; knowledge management; library case study; library collections; library resources; medical libraries; medical library assistant; medical students
    DOI:  https://doi.org/10.1080/02763869.2025.2599823
  2. Health Info Libr J. 2026 Jan 14.
      This editorial traces the history of the Health Information and Libraries Journal from 1984 to 2025. Since its first issue, the journal has published over 1400 manuscripts, from reviews and original articles to editorials, brief communications, regular features, and obituaries of key members of the health library sector with links to the journal. The contributions of its four Editors-in-Chief are celebrated: Shane Godbolt (1984-1994), Judy Palmer (1999-2002), Graham Walton (2003-2008), and Maria J. Grant (2009-2025).
    Keywords:  librarians, clinical; librarians, embedded; librarians, health science; librarians, international; librarians, medical
    DOI:  https://doi.org/10.1111/hir.70009
  3. PLoS One. 2026;21(1): e0341307
      With the rapid development of digitalization, university libraries face great pressure to adapt their data management, and traditional approaches have limitations in meeting personalized information needs. This study constructs a knowledge graph-based intelligent data management and information innovation service model for university library systems, which adopts a hierarchical design philosophy encompassing five core layers: data source layer, data processing layer, knowledge construction layer, service application layer, and user interaction layer. By integrating multi-source heterogeneous data resources and establishing a unified knowledge representation framework, the model facilitates the semantic organization and automatic management of library information. The model employs dynamic fusion methods combining large language models and graph embedding to address heterogeneous data integration challenges, while leveraging the semantic association capabilities of the knowledge graph to provide precise personalized information recommendation services. A systematic six-month evaluation shows that the user experience score reached 4.40, a 45.2% improvement from the baseline of 3.03; search-result accuracy increased by 41.1%; service quality and learning effectiveness improved by 32.5% and 41.7%, respectively; and all 16 technical indicators met or exceeded the set standards. The study offers a practical solution that helps university libraries address the challenges of big data and supports their transformation toward intelligent services.
    DOI:  https://doi.org/10.1371/journal.pone.0341307
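    A minimal Python sketch of the kind of knowledge-graph layer described in the entry above: library records stored as subject-relation-object triples and ranked for a user by embedding similarity. The entity names, the random 32-dimensional embeddings, and the cosine scoring are illustrative assumptions, not the paper's actual model, which fuses large language models with trained graph embeddings.

      import numpy as np

      rng = np.random.default_rng(0)

      # Knowledge-construction layer: (subject, relation, object) triples linking
      # users, topics, and catalogue items (hypothetical examples).
      triples = [
          ("user:alice", "interested_in", "topic:machine_learning"),
          ("book:B102", "covers", "topic:machine_learning"),
          ("book:B205", "covers", "topic:library_science"),
      ]

      # Toy embedding table; in the paper these vectors would come from a graph
      # embedding model dynamically fused with LLM-derived text features.
      entities = {e for s, _, o in triples for e in (s, o)}
      emb = {e: rng.normal(size=32) for e in entities}

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      def recommend(user, k=2):
          # Service-application layer: rank catalogue items by similarity to the user node.
          items = [e for e in entities if e.startswith("book:")]
          return sorted(items, key=lambda i: cosine(emb[user], emb[i]), reverse=True)[:k]

      print(recommend("user:alice"))
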
  4. Acad Med. 2025 Dec 03. pii: wvaf008. [Epub ahead of print]
       PROBLEM: Medical education scholars struggle to join ongoing conversations in their field due to the lack of a dedicated medical education corpus. Without such a corpus, scholars must search too widely across thousands of irrelevant journals or too narrowly by relying on PubMed's Medical Subject Headings (MeSH). In tests conducted for this study, MeSH missed 34% of medical education articles.
    APPROACH: From January to December 2024, the authors developed the Medical Education Corpus (MEC), the first dedicated collection of medical education articles, through a 3-step process. First, using the core-periphery model, they created the Medical Education Journals (MEJ), a collection of 2 groups of journals based on participation and influence in medical education discourse: the MEJ-Core (formerly the MEJ-24, 24 journals) and the MEJ-Adjacent (127 journals). Second, they developed and evaluated a machine learning model, the MEC Classifier, trained on 4,032 manually labeled articles to identify medical education content. Third, they applied the MEC Classifier to extract medical education articles from the MEJ-Core and MEJ-Adjacent journals.
    OUTCOMES: As of December 2024, the MEC contained 119,137 medical education articles from the MEJ-Core (54,927 articles) and MEJ-Adjacent journals (64,210 articles). In an evaluation using 1,358 test articles, the MEC Classifier demonstrated significantly improved sensitivity compared with MeSH (90% vs 66%, P = .001), while maintaining a similar positive predictive value (82% vs 81%).
    NEXT STEPS: The MEC provides a focused corpus that enables medical education scholars to more easily join conversations in the field. Scholars can rely on the MEC when reviewing literature to frame their work, and the MEC also creates opportunities for field-wide analyses and meta-research. The core methodology also underlies the MedEdMentor Paper Database (mededmentor.org), a separately maintained online tool that complements the versioned MEC snapshot with a web-based search interface.
    TEASER TEXT: Medical education scholars often struggle to effectively "join the conversation" because relevant literature is buried within biomedical databases like PubMed or general academic search engines like Google Scholar. This article introduces the Medical Education Corpus (MEC), a dedicated collection of 119,137 medical education articles curated using a specialized machine-learning classifier. In head-to-head testing, the MEC significantly outperformed PubMed's MeSH terms, capturing 90% of medical education articles compared with MeSH's 66%. By assembling these articles into a single, focused dataset, the MEC allows scholars to more easily find the literature they need to frame their work. The core methodology also underlies MedEdMentor, a separately maintained online tool that makes these optimized searches accessible to the wider medical education community.
    Keywords:  bibliometrics; information retrieval; machine learning; medical education; scholarship
    DOI:  https://doi.org/10.1093/acamed/wvaf008
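    A short sketch of how the sensitivity and positive predictive value reported in the entry above are computed from a labelled test set. The confusion-matrix counts below are invented to roughly reproduce the published percentages; only the formulas themselves are standard.

      def sensitivity(tp, fn):
          # Share of true medical education articles that the method retrieves.
          return tp / (tp + fn)

      def ppv(tp, fp):
          # Share of retrieved articles that truly are medical education.
          return tp / (tp + fp)

      # Hypothetical counts on a labelled test set.
      methods = {
          "MEC Classifier": {"tp": 450, "fn": 50, "fp": 99},   # ~90% sensitivity, ~82% PPV
          "MeSH":           {"tp": 330, "fn": 170, "fp": 77},  # ~66% sensitivity, ~81% PPV
      }

      for name, c in methods.items():
          print(f"{name}: sensitivity = {sensitivity(c['tp'], c['fn']):.0%}, "
                f"PPV = {ppv(c['tp'], c['fp']):.0%}")
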
  5. Account Res. 2026 Jan 14. 2614062
       PURPOSE/SIGNIFICANCE: This study investigates the awareness, perceptions, and responses of library and information science (LIS) researchers toward retracted papers, aiming to inform the improvement of research integrity governance.
    METHOD/PROCESS: A questionnaire survey of 280 LIS researchers examined their sources of retraction information, understanding of causes, perceived consequences, and attitudes toward evaluation. The influence of academic background, publication volume, and discipline was also explored.
    RESULT/CONCLUSION: Findings indicate generally low retraction awareness and a primary reliance on informal channels. Critically, the analysis reveals several nuanced patterns: (1) Significant disciplinary differences exist in perceiving retraction causes; (2) Opinions are sharply divided on including retraction records in research evaluation, reflecting concerns about uniform responsibility attribution; (3) A considerable proportion of researchers mistakenly view retraction's impact as reversible. These attitudes are strongly associated with educational background and publication experience. In response, this paper proposes five key recommendations: establishing authoritative retraction platforms, improving journal retraction mechanisms, differentiating retraction types in evaluation, strengthening integrity education, and building a coordinated governance framework. These measures contribute to fostering a more transparent, fair, and sustainable scholarly correction ecosystem.
    Keywords:  Retraction; evaluation; institutional governance; research integrity; scientific research
    DOI:  https://doi.org/10.1080/08989621.2026.2614062
  6. Orthop J Sports Med. 2026 Jan;14(1): 23259671251402988
       Background: Artificial intelligence (AI) chatbots are increasingly used for medical information provision. However, systematic evaluations of their accuracy and reliability in orthopaedic surgery, particularly in total knee replacement (TKR), remain limited.
    Purpose: To systematically compare and evaluate performances of various AI chatbots, focusing on their ability to provide accurate and reliable information related to TKR.
    Study Design: Cohort study; Level of evidence, 2.
    Methods: A total of 43 clinically relevant TKR-related frequently asked questions (FAQs) were selected based on Google search trends and expert consultation. Questions were categorized into 6 key domains: (1) general/procedure-related information, (2) indications and outcomes, (3) risks and complications, (4) pain and postoperative recovery, (5) specific activities after surgery, and (6) alternatives and variations. Each question was submitted to 5 different chatbot models (GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5) for response generation. Two independent orthopaedic surgeons assessed the chatbots' responses for both accuracy and relevance using a 5-point Likert scale. Responses were anonymized, blinding evaluators to the chatbot identities to prevent bias. Accuracy differences among the chatbot models were analyzed by analysis of variance, and relevance was compared using the Kruskal-Wallis test.
    Results: GPT-3.5 (4.8 ± 0.5), GPT-4 (4.9 ± 0.4), GPT-4 Omni (4.9 ± 0.3), and Gemini 1.5 (4.8 ± 0.4) demonstrated high accuracy, whereas Gemini Advanced scored significantly lower (4.1 ± 1.4) (P < .001). However, general/procedure-related information, risks and complications, pain and recovery, and postoperative activities showed no significant differences among chatbots. Gemini Advanced underperformed in indications and outcomes (P = .04) and alternatives and variations (P = .002). Regarding relevance, all chatbots except Gemini Advanced (36/43; 83.7%) achieved a 100% relevance rate (P < .001).
    Conclusion: This study demonstrates that GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 can provide highly accurate and relevant responses to TKR-related queries, while Gemini Advanced underperforms.
    Keywords:  ChatGPT; Gemini; artificial intelligence; chatbot; total knee replacement
    DOI:  https://doi.org/10.1177/23259671251402988
  7. Cureus. 2025 Dec;17(12): e99286
       INTRODUCTION: Accurate and up-to-date educational resources are crucial for medical professionals to deliver effective patient care, particularly in conditions like pediatric asthma, which has a high disease burden in children. Timely interventions are essential to manage this condition appropriately and to ensure better outcomes. With the rapid advancement of artificial intelligence (AI) in healthcare, AI tools like Google Gemini are being explored as quick and accessible alternatives for generating medical content.
    METHODS: A cross-sectional observational study was conducted to focus on four core topics related to the management of pediatric asthma. Prompts for each of the core topics were entered in Google Gemini and UpToDate to generate responses. The WebFx Readability Tool was used to assess readability using metrics such as Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), SMOG Index, word count, sentence count, words per sentence, difficult word count, and percentage of difficult words. The collected data were analyzed using the Mann-Whitney U test, and a p-value of < 0.05 was considered statistically significant.
    RESULTS: When comparing the readability characteristics of UpToDate and Google Gemini, statistically significant differences were found, indicating that Google Gemini is more accessible for individuals with lower literacy skills. UpToDate received higher scores on the Simple Measure of Gobbledygook (SMOG) index across all four core topics, indicating that it is harder for the general population to understand. Google Gemini had a higher percentage of difficult words across all four topics.
    CONCLUSION: Google Gemini was found to use more complex vocabulary while still maintaining overall accessibility, making it appropriate for patients with lower literacy levels. Although certain readability parameters demonstrated Google Gemini to be a more reader-friendly tool for assessing and understanding medical content, the high percentage of difficult words may make it more challenging for younger individuals and lower socio-economic populations to access.
    Keywords:  artificial intelligence; asthma; clinical decision support; educational content; google gemini; medical education; uptodate
    DOI:  https://doi.org/10.7759/cureus.99286
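    A rough Python sketch of two of the readability formulas named in the entry above, Flesch Reading Ease and Flesch-Kincaid Grade Level. The syllable counter is a crude vowel-group heuristic; dedicated tools such as the WebFx checker use more careful rules, so scores will differ slightly.

      import re

      def count_syllables(word):
          # Crude heuristic: count contiguous vowel groups, at least one per word.
          return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

      def readability(text):
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          words = re.findall(r"[A-Za-z']+", text)
          syllables = sum(count_syllables(w) for w in words)
          w, s = len(words), sentences
          fre = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w)
          fkgl = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59
          return {"words": w, "sentences": s, "FRE": round(fre, 1), "FKGL": round(fkgl, 1)}

      print(readability("Asthma is a long-term condition. Inhalers help control the symptoms."))
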
  8. Ann Card Anaesth. 2026 Jan 01. 29(1): 81-88
       INTRODUCTION: Patient education significantly improves outcomes, especially in high-risk procedures. However, traditional educational resources often fail to address patient literacy and emotional needs adequately. Large language models like ChatGPT (OpenAI) and Gemini (Google) offer promising alternatives, potentially enhancing both accessibility and comprehensibility of procedural information. This study evaluates and compares the effectiveness of ChatGPT and Gemini in generating accurate, readable, and clinically relevant patient education materials (PEMs) for pulmonary artery catheter insertion.
    METHODOLOGY: A comparative, single-blinded study was conducted using structured validation methods, with a common prompt given to both generative artificial intelligence (AI) chatbots. AI-generated PEMs were assessed by board-certified anesthesiologists and intensivists. Face validity was determined using a 5-point Likert scale evaluating appropriateness, clarity, relevance, and trustworthiness. Content validity was measured by calculating the content validity index. Accuracy and completeness were evaluated by a separate expert panel using a 10-point Likert scale. Readability and sentiment analysis were assessed via automated online tools.
    RESULTS: Both chatbots achieved robust face and content validity (S-CVI = 0.91). ChatGPT scored significantly higher on accuracy [9.00 vs. 8.00; P = 0.021] and perceived trustworthiness, while Gemini outperformed in readability (Flesch Reading Ease score: 65 vs. 54; Flesch-Kincaid Grade Level: 7.58 vs. 8.64) and clarity. Both outputs maintained a neutral emotional tone.
    CONCLUSION: AI chatbots show promise as innovative tools for patient education. By leveraging the strengths of both AI-driven technologies and human expertise, healthcare providers can enhance patient education and empower individuals to make informed decisions about their health and medical care involving complex clinical procedures.
    Keywords:  Face validity; generative artificial intelligence; patient education; pulmonary arteries; readability; sentiment analysis
    DOI:  https://doi.org/10.4103/aca.aca_145_25
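    A minimal sketch of the content validity index calculation behind an S-CVI figure like the 0.91 reported in the entry above. The expert ratings are hypothetical; each row is one item of the education material, each column one expert, on the usual 1-4 relevance scale.

      # Hypothetical ratings: rows = items, columns = experts (1-4 relevance scale).
      ratings = [
          [4, 4, 3, 4, 4],
          [3, 4, 4, 4, 3],
          [4, 3, 4, 2, 4],
      ]

      def i_cvi(item_ratings):
          # Item-level CVI: proportion of experts rating the item 3 or 4.
          return sum(r >= 3 for r in item_ratings) / len(item_ratings)

      item_cvis = [i_cvi(row) for row in ratings]
      s_cvi_ave = sum(item_cvis) / len(item_cvis)  # S-CVI/Ave: mean of the item CVIs

      print([round(v, 2) for v in item_cvis], round(s_cvi_ave, 2))
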
  9. BMC Anesthesiol. 2026 Jan 12.
       STUDY OBJECTIVE: Large language models (LLMs) are used in all areas of life and have become one of the information sources for those seeking healthcare. Although ChatGPT is the best known, Claude, CoPilot, and GEMINI are among the other widely used LLMs. Some of these models have been studied in terms of their response quality metrics to frequently asked questions (FAQs) about broad content areas like anesthesia and to specific FAQs related to obstetric analgesia. However, no studies have yet been conducted on questions related to nerve blocks. In this study, we evaluated the quality of the answers given by the four LLMs to frequently asked questions related to 'nerve block'.
    DESIGN: Prospective Delphi study followed by a survey.
    INTERVENTION: Ten FAQs were identified and presented to four LLMs. A Delphi study was conducted to develop an assessment tool. A survey study was then conducted using the developed tool, in which the evaluators, selected through a thorough process, evaluated the LLM responses.
    MEASUREMENTS: The quality of LLM responses was assessed by raters using the ARQuAT (Assessing Response Quality in AI Texts) tool, determined through Delphi rounds. Evaluation criteria included content criteria such as accuracy, comprehensiveness, security, timeliness, and relevance, as well as communication criteria such as understandability, empathy, ethical considerations, readability, and neutrality.
    MAIN RESULTS: ChatGPT and Claude demonstrated superior performance in ARQuAT-Overall scores compared to GEMINI and CoPilot (p < 0.001). ChatGPT and Claude achieved satisfaction rates above 80% in both content and communication quality metrics, significantly outperforming GEMINI (p < 0.001 for both comparisons), while CoPilot showed intermediate performance.
    CONCLUSION: Responses to FAQs related to nerve blocks were well and acceptably addressed by ChatGPT, Claude, and, to a lesser extent, CoPilot. GEMINI performed poorly compared to the others, exhibiting subpar performance on several questions, particularly in terms of safety and relevance.
    DOI:  https://doi.org/10.1186/s12871-025-03596-9
  10. Cureus. 2025 Dec;17(12): e98901
      Introduction: Stroke is a major cause of global morbidity and mortality. Readability of educational material is critical for rapid clinical decision-making among healthcare professionals. UpToDate (UpToDate, Inc., Waltham, MA) is a widely used, peer-reviewed point-of-care clinical resource, while ChatGPT (OpenAI, San Francisco, CA) is an emerging AI-based educational support tool. However, a formal comparison of their linguistic accessibility has not been performed.
    Objective: To compare the readability and linguistic complexity of educational material on stroke generated by ChatGPT (GPT-4o) versus content retrieved from UpToDate, using validated readability metrics.
    Design, setting, and participants: This cross-sectional study was conducted between May 27 and June 4, 2025. ChatGPT (GPT-4o, accessed May 27, 2025) was prompted to generate educational content on stroke. A corresponding section from UpToDate (accessed May 27, 2025) was extracted. Only prose content was analyzed. Readability parameters assessed included total word count, sentence count, word/sentence ratio (average words per sentence), Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) Index, difficult word count, and difficult word percentage. Data were analyzed using IBM SPSS v25 (IBM Corp., Armonk, NY) and R v4.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The Mann-Whitney U test was used. P < 0.05 was considered statistically significant.
    Results: UpToDate content was substantially longer (median = 2772 vs. 304 words; p = 0.008) and used more sentences (median = 134 vs. 23; p = 0.032) and difficult words (median = 857 vs. 88; p = 0.008) compared to ChatGPT. The word/sentence ratio (average words per sentence) was also higher (21.7 vs. 13.2; p = 0.008). However, no statistically significant differences were observed for FRE (p = 1.000), FKGL (p = 0.222), SMOG Index (p = 0.151), or difficult word percentage (p = 0.690).
    Conclusions: ChatGPT produces shorter and more concise educational content on stroke while maintaining comparable readability to UpToDate. The lower linguistic density may enhance rapid orientation for trainees; however, the reduced depth indicates ChatGPT should supplement, not replace, established peer-reviewed resources. Future research should explore multiple medical topics, additional AI models, and assess the clinical applicability and accuracy of AI-generated content.
    Keywords:  artificial intelligence; chatgpt; medical education; readability score; stroke; uptodate
    DOI:  https://doi.org/10.7759/cureus.98901
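    A small sketch of the comparison design described in the entry above: per-section values from the two sources compared with a Mann-Whitney U test. The word counts are invented for illustration; only the choice of test matches the abstract.

      from scipy.stats import mannwhitneyu

      # Hypothetical per-section word counts for the two sources.
      uptodate_words = [2650, 2772, 2910, 3015, 2480]
      chatgpt_words = [298, 304, 355, 270, 312]

      stat, p = mannwhitneyu(uptodate_words, chatgpt_words, alternative="two-sided")
      print(f"U = {stat:.1f}, p = {p:.3f}")
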
  11. Digit Health. 2026 Jan-Dec;12: 20552076251412700
       Background: Artificial intelligence (AI) chatbots are increasingly used for health information dissemination. However, their effectiveness depends on the clarity, reliability, and quality of the content they deliver. This cross-sectional study aimed to evaluate the readability and reliability of kyphosis-related information provided by six major AI chatbots: ChatGPT, Gemini, Copilot, Perplexity, DeepSeek, and Grok.
    Methods: We selected the top 10 kyphosis-related questions from Google's "People also ask" section and submitted them to each chatbot. Readability was assessed using FKGL, FKRS, GFOG, SMOG, CL, ARI, and LW indices. Quality and reliability were evaluated using the DISCERN tool, JAMA benchmark, Global Quality Score (GQS), Ensuring Quality Information for Patients (EQIP), and a kyphosis-specific content score (KSC). Statistical analyses were performed using the Kruskal-Wallis and Mann-Whitney U tests.
    Results: No statistically significant difference was found among chatbots in FKGL, FKRS, SMOG, ARI, or GFOG scores. However, Perplexity had significantly higher DISCERN and EQIP scores, indicating superior content quality. All chatbots presented content at a readability level higher than the AMA-recommended sixth-grade level. While AI models provided more comprehensive and up-to-date information than traditional web sources, their outputs remained challenging for the average patient to comprehend.
    Conclusions: AI chatbots offer promising tools for disseminating health information about kyphosis but require significant improvements in readability. Expert-reviewed and patient-centered refinements are necessary to ensure accessibility and safety in digital health communication.
    Keywords:  Kyphosis; artificial intelligence; information; patient education; readability
    DOI:  https://doi.org/10.1177/20552076251412700
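    A brief sketch of the Kruskal-Wallis comparison used in the entry above, applied to hypothetical DISCERN totals for three of the six chatbots (all values invented).

      from scipy.stats import kruskal

      # Hypothetical total DISCERN scores per question for three chatbots.
      discern_totals = {
          "ChatGPT": [48, 52, 47, 50, 49],
          "Perplexity": [58, 61, 57, 60, 59],
          "Copilot": [46, 49, 45, 50, 47],
      }

      h, p = kruskal(*discern_totals.values())
      print(f"H = {h:.2f}, p = {p:.4f}")
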
  12. J Sex Med. 2026 Jan 07;23(2). pii: qdaf399. [Epub ahead of print]
      
    Keywords:  digital health; health literacy; online resources; patient education; readability assessment
    DOI:  https://doi.org/10.1093/jsxmed/qdaf399
  13. Ann R Coll Surg Engl. 2026 Jan 12.
       INTRODUCTION: Most patients with Crohn's disease (CD) have at least one bowel resection during their lifetime. Patients considering surgery will probably look for information online, as is common practice among patients with chronic illnesses. The aim of this systematic review is to assess the quality and readability of web-based patient information on bowel resection for CD.
    METHODS: Google was searched using predefined search terms, developed with input from patient experts. For each term, results from the first two pages were screened for eligibility. Patient-focused websites on bowel resection for CD were included. The quality of the information was assessed using the DISCERN tool, and the readability with the Flesch-Kincaid ease of readability (FK) score. The accessibility adjustments of websites were also assessed.
    RESULTS: Of the 118 sources identified, 91 were excluded and 27 sources were analysed. One-third (n = 10) did not discuss the different types of resections. Ileocolic resection (the most commonly performed resection) was described in eight sources. Discussion of management post-resection (n = 6) and of lifestyle changes (n = 11) was sparse. There were some instances of factually incorrect information. The mean DISCERN score was 3.1 ± 0.80 (range 1-5), indicating moderate quality information. The mean FK score was 51.9 ± 8.70 (corresponding to patients requiring A levels or equivalent to fully understand the text).
    CONCLUSIONS: The study findings highlighted the limitations of the current online patient information surrounding bowel resection in CD. The involvement of patients, working alongside professional bodies and clinicians, in the development of health-related websites is recommended.
    Keywords:  Bowel resection; Crohn’s disease; Patient empowerment; Shared decision making
    DOI:  https://doi.org/10.1308/rcsann.2025.0108
  14. Br Dent J. 2026 Jan 16.
      Objectives: To explore linguistic characteristics of patient education materials (PEM) in paediatric dentistry.
    Methods: A convenience sample of 52 PEM articles (2013-2023) was obtained from four sources: plain language summaries of Cochrane systematic reviews (n = 25), Journal of the American Dental Association patient pamphlets (n = 15), online patient health information from the Canadian Dental Association (n = 7), and MedlinePlus (n = 5) websites. Two investigators manually evaluated articles using the Patient Education Materials Assessment Tool (PEMAT) for printed materials. Additional computerised analyses included five Linguistic Inquiry and Word Count (LIWC) measurements and two readability measurements (Flesch Reading Ease and Simple Measure of Gobbledygook). Descriptive and comparative statistics were undertaken.
    Results: PEM articles from all four sources scored above the recommended 70% threshold for both PEMAT composite measurements (understandability/actionability), but minimal use (21%) of visual aids was identified. Mean values of LIWC summary measures (analytical thinking = 84, authenticity = 46, clout = 53, emotional tone = 29, big words = 26) indicated scope for linguistic improvement of PEM articles. Readability analyses indicated PEM articles were generally easy to read (≤ Grade 6 level) except for Cochrane articles (Grade 9 level).
    Conclusions: The PEM articles evaluated in the present study were sub-optimal, thereby reducing parental ability to make well-informed decisions for their children.
    DOI:  https://doi.org/10.1038/s41415-025-9248-4
  15. Laryngoscope Investig Otolaryngol. 2026 Feb;11(1): e70335
       Objective: Online searches for medical educational material have continued to increase since the Inspire implant system for treatment of obstructive sleep apnea received FDA approval, and over 50,000 patients had undergone device placement as of 2023. Several organizations, including the American Medical Association (AMA) and the National Institutes of Health (NIH), recommend that patient-facing materials be written at or below a sixth-grade reading level. A review was conducted to evaluate the readability, quality, and transparency of online educational resources concerning the Inspire implant.
    Methods: Fifty websites were identified through a popular search engine and categorized as Academic Medical Center Websites (AMC), Health Journalism/Media Websites (HJ/M), Other Websites (OW), or Private Practice Websites (PP). Readability was assessed using nine formulas from ReadabilityFormulas.com. Three independent reviewers scored each site using the DISCERN Instrument and HONcode criteria, with average scores calculated for analysis.
    Results: By every measure except the Linsear Write Readability Formula, all sites exceeded a sixth-grade reading level. No statistically significant differences were found between any groups across readability assessments. HJ/M websites had significantly higher overall HONcode scores than AMC and PP websites and a significantly higher DISCERN score than AMC websites.
    Conclusion: All surveyed websites provided information on the Inspire Hypoglossal Nerve Implant at similar reading levels, although these reading levels were well above the sixth-grade target recommended by the AMA and NIH. As Americans increasingly turn to the internet as a source of medical information, healthcare providers must ensure that patient-facing materials are accessible, comprehensible, and transparent.
    Keywords:  DISCERN; HONcode; PS/QI; inspire; readability; websites
    DOI:  https://doi.org/10.1002/lio2.70335
  16. Int J Med Inform. 2026 Jan 10;209: 106246. pii: S1386-5056(25)00463-0. [Epub ahead of print]
       OBJECTIVE: Large language models (LLMs) are increasingly applied to patient education, yet their performance in languages that are relatively underrepresented in medical-domain corpora and LLM training datasets remains underexplored. Psoriasis and psoriatic arthritis (PsA) are chronic, immune-mediated diseases requiring lifelong patient engagement, making them suitable conditions for evaluating the clarity, reliability, and inclusivity of AI-generated educational content. The objective was to assess the comprehensibility, scientific reliability, and patient-centered communication of Turkish patient education materials for psoriasis vulgaris and PsA generated by seven state-of-the-art LLMs.
    METHODS: A cross-sectional analysis compared outputs from ChatGPT-4o, Gemini 2.0 Flash, Claude 3.7 Sonnet, Grok 3, Qwen 2.5, DeepSeek R1, and Mistral Large 2. Brochures were produced using standardized zero-shot prompts and evaluated via the Ateşman readability index and the DISCERN instrument. Overall differences in DISCERN scores across the seven models were assessed using a Friedman test, followed by Bonferroni-adjusted Wilcoxon signed-rank post-hoc analyses.
    RESULTS: Readability scores ranged from 61.6 to 80.2 (mean = 71.3 ± 6.9), with ChatGPT-4o and Qwen 2.5 generating the most accessible texts. DISCERN reliability scores ranged from 38.5 to 60.5, with Claude 3.7 Sonnet and Gemini 2.0 Flash showing the highest accuracy. Models prioritizing factual precision produced denser language, while conversational models favored fluency but sacrificed depth. Notable variation was observed, with only Claude 3.7 Sonnet and Gemini 2.0 Flash consistently reflecting patient-centered perspectives.
    CONCLUSION: LLMs showed observable differences in balancing clarity and reliability when generating health education leaflets in Turkish. Most outputs appeared to lack explicit psychosocial framing and emphasis on shared decision-making, which may suggest the need for more culturally adaptive training, clinician oversight, and locally grounded validation frameworks to support safe and inclusive AI-based patient education.
    Keywords:  Arthritis, Psoriatic / education; Artificial Intelligence; Health Literacy; Natural Language Processing; Psoriasis / education
    DOI:  https://doi.org/10.1016/j.ijmedinf.2025.106246
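    A compact sketch of the statistical workflow described in the entry above: a Friedman omnibus test on paired DISCERN scores followed by Bonferroni-adjusted Wilcoxon signed-rank post-hoc comparisons. The scores are invented, and only three of the seven models are shown to keep the example short.

      from itertools import combinations
      from scipy.stats import friedmanchisquare, wilcoxon

      # Hypothetical paired DISCERN item scores per model (same items in each list).
      scores = {
          "ChatGPT-4o": [3, 4, 4, 3, 5, 4, 3, 4],
          "Claude 3.7 Sonnet": [4, 5, 4, 4, 5, 5, 4, 4],
          "Gemini 2.0 Flash": [4, 4, 5, 4, 4, 5, 4, 5],
      }

      stat, p = friedmanchisquare(*scores.values())
      print(f"Friedman: chi2 = {stat:.2f}, p = {p:.3f}")

      pairs = list(combinations(scores, 2))
      alpha = 0.05 / len(pairs)  # Bonferroni adjustment
      for a, b in pairs:
          _, p_pair = wilcoxon(scores[a], scores[b])
          flag = "significant" if p_pair < alpha else "not significant"
          print(f"{a} vs {b}: p = {p_pair:.3f} ({flag} at alpha = {alpha:.3f})")
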
  17. JAMA Netw Open. 2026 Jan 02. 9(1): e2552106
       Importance: The unexplored quality of evidence supporting online video claims by medical professionals creates a credibility-evidence gap that threatens the principles of evidence-based medicine.
    Objective: To systematically evaluate the evidence hierarchy supporting medical claims in health care professional-created online videos using a novel evidence classification framework.
    Design, Setting, and Participants: In this quality improvement study using a cross-sectional analysis, YouTube was searched using cancer- and diabetes-related terms. A total of 309 videos met the inclusion criteria. The video search, data extraction, and archiving were conducted between June 20 and 21, 2025, to create a static dataset. Videos were assessed using the newly developed Evidence-GRADE (E-GRADE [Grading of Recommendations Assessment, Development and Evaluation]) framework, categorizing evidence into 4 levels: grade A (high certainty from systematic reviews and/or guidelines), grade B (moderate certainty from randomized clinical trials, cohort studies, and high-quality observational studies with clear citations), grade C (low certainty from limited observational studies, physiological mechanisms, or case series without critical appraisal), and grade D (very low or no certainty from anecdotal evidence).
    Exposure: Videos that had a minimum of 10 000 views, were created by health care professionals, had a minimum duration of 1 minute, and contained specific health claims.
    Main Outcomes and Measures: Primary outcomes included the distribution of evidence grades (A-D) supporting medical claims. Secondary outcomes included correlations between evidence quality and engagement metrics (views and likes) and traditional quality scores (DISCERN, JAMA benchmark criteria, and Global Quality Scale).
    Results: Among the 309 videos included, which had a median of 164 454 (IQR, 58 909-477 075) views, most medical claims (193 [62.5%]) were supported by very low or no evidence (grade D), while only 61 claims (19.7%) were supported by high-quality evidence (grade A). Moderate (grade B) and low (grade C) evidence levels were found in 45 (14.6%) and 10 (3.2%) videos, respectively. Grade D videos were associated with a statistically significant 34.6% higher view count than grade A videos (incidence rate ratio, 1.35; 95% CI, 1.00-1.81; P = .047). Traditional quality tools showed only weak correlations (range of coefficients, 0.11-0.23) with evidence levels, thus failing to detect important qualitative differences.
    Conclusions and Relevance: In this quality improvement study, a substantial credibility-evidence gap was found in physician-generated video-sharing content, where medical authorities often legitimized claims lacking robust empirical support. These findings emphasize the need for evidence-based content guidelines and enhanced science communication training for health care professionals to maintain scientific integrity in digital health information.
    DOI:  https://doi.org/10.1001/jamanetworkopen.2025.52106
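    A sketch of how an incidence rate ratio for views, like the one reported in the entry above, can be estimated with a count-data regression. The data are simulated and the single binary predictor is a simplification; the study's actual model and covariates may differ. Exponentiating the fitted coefficient gives the IRR.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n = 200
      grade_d = rng.integers(0, 2, size=n)  # 1 = grade D video, 0 = grade A video
      views = rng.poisson(lam=np.where(grade_d == 1, 135_000, 100_000))

      X = sm.add_constant(grade_d.astype(float))
      fit = sm.GLM(views, X, family=sm.families.Poisson()).fit()

      irr = np.exp(fit.params[1])  # incidence rate ratio for grade D vs grade A
      print(f"IRR (grade D vs A) = {irr:.2f}")
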
  18. Sci Rep. 2026 Jan 12.
      Sleep Apnea Hypopnea Syndrome (SAHS) is a prevalent sleep disorder associated with substantial health risks, highlighting the need for improved public awareness. This cross-sectional analysis systematically evaluated the quality of SAHS-related videos on YouTube, Bilibili, and TikTok. Of 903 videos initially identified, 227 met the inclusion criteria for analysis. Cross-platform comparisons revealed that long-form platforms hosted higher-quality content, whereas short-form platforms generated greater engagement despite lower informational integrity. This study reveals a structural disconnect between informational quality and audience engagement, consistent with theories of algorithmic filtering. While professional identity remains a reliable predictor of quality, user engagement is largely driven by peripheral cues rather than medical accuracy. This study further contributes to the theoretical understanding of online health communication by situating platform-specific patterns within broader frameworks of algorithmic curation, heuristic processing, and trust formation. By integrating these theoretical perspectives with empirical quality assessments, the study offers a conceptually grounded explanation for why medically accurate content often remains less visible within algorithmic media environments. These findings underscore the need for platform-specific interventions that integrate credibility signals into recommendation algorithms to mitigate the spread of low-quality health information.
    Keywords:  Information quality; Online video; Public health; Sleep apnea hypopnea syndrome; Social media
    DOI:  https://doi.org/10.1038/s41598-025-34182-1
  19. Clin Imaging. 2026 Jan 07;131: 110715. pii: S0899-7071(26)00007-0. [Epub ahead of print]
       OBJECTIVE: To evaluate the quality and reliability of breast cancer screening information on TikTok using the DISCERN tool, and to compare scores across content creators, including physicians, non-physicians, and private clinics.
    METHODS: A search for the hashtag #BreastCancerScreening on TikTok was conducted in March 2025. Of the 983 videos retrieved, 75 met inclusion criteria after applying filters for language, relevance, and engagement. Each video was evaluated independently by two reviewers using the DISCERN questionnaire. Videos were categorized by content creator type, gender, physician specialty, and video format. Statistical analysis included Kruskal-Wallis tests and weighted Cohen's kappa for inter-rater reliability.
    RESULTS: Among 75 analyzed videos, 41% were created by physicians, 31% by non-physicians, and 28% by private clinics. Physician videos received the highest mean DISCERN score (3.12), followed by private clinics (3.07), and non-physicians (2.29). Videos focusing on breast cancer imaging scored highest (3.14), while those based on personal experiences scored lowest (2.35). Kruskal-Wallis testing revealed significant differences in DISCERN scores across creator types (p < 0.001). Post-hoc analysis showed that physician and private clinic videos scored significantly higher than non-physician videos. Inter-rater reliability was moderate for physicians, fair for non-physicians, and very good for private clinics.
    CONCLUSION: Breast cancer screening information on TikTok varies in quality. Content created by physicians and private clinics is more reliable and comprehensive. Because DISCERN evaluates quality rather than scientific accuracy, these findings reflect how clearly information is communicated rather than its medical correctness. Improving the clarity and reliability of social media health content could enhance public understanding and encourage informed screening behaviors.
    Keywords:  Breast cancer screening; DISCERN tool; Health information quality; Patient education; Social media; TikTok
    DOI:  https://doi.org/10.1016/j.clinimag.2026.110715
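    A short sketch of the weighted Cohen's kappa used in the entry above for inter-rater reliability on DISCERN scores. The two raters' scores are invented, and quadratic weighting is assumed; the study may have used a different weighting scheme.

      from sklearn.metrics import cohen_kappa_score

      # Hypothetical overall DISCERN scores (1-5) from two independent reviewers.
      rater_1 = [3, 4, 2, 5, 3, 4, 2, 3, 4, 5]
      rater_2 = [3, 4, 3, 5, 2, 4, 2, 3, 5, 4]

      kappa = cohen_kappa_score(rater_1, rater_2, weights="quadratic")
      print(f"Weighted kappa = {kappa:.2f}")
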
  20. Breast J. 2026;2026: 8821629
       Background and Aims: Sex-/gender-specific health information for men with breast cancer is lacking. Health information supports patients in shared decision-making. When developing evidence-based health information, it is important to identify the patients' information needs and preferences with regard to age, sex or gender, and other diversity aspects, including how the content is provided for the target group. However, studies show that sex/gender differences have rarely been considered. Our study investigates the information needs and preferences of cisgender men with breast cancer.
    Methods: A content-structuring, qualitative content analysis of forum posts was performed. Internet forums and posts were selected according to the following criteria: relevance of the topic, English or German language, and public availability without registration. A qualitative content analysis according to Kuckartz was conducted. The selected posts were coded using MAXQDA.
    Results: A total of 1025 posts from three Internet forums were screened, and 96 posts were included for analysis-most of them from a German Internet forum. We identified seven main categories and 26 subcategories. Information needs and preferences are represented by the following main categories: "Epidemiology and general questions about the disease," "Diagnostics," "Therapy," "Physician specialist services," "Rehabilitation and lifestyle adaption," and "Mental health." Additionally, the "Preference for and access to current information" plays a role for the patients.
    Conclusions: Our study provides new insights into the information needs and preferences of men with breast cancer, mainly from German-speaking countries. Providing accurate and reliable health information that meets patients' needs and preferences is an ethical duty of healthcare systems. Such patient-centered and inclusive health care will empower patients to make informed decisions.
    Keywords:  breast cancer; health information; information needs; male breast cancer; patient information
    DOI:  https://doi.org/10.1155/tbj/8821629