bims-librar Biomed News
on Biomedical librarianship
Issue of 2026-03-08
fifty-one papers selected by
Thomas Krichel, Open Library Society



  1. J Med Libr Assoc. 2026 Jan 01. 114(1): 31-37
       Background: Numerous studies have emphasized the crucial role of library resources in improving educational outcomes. However, there is a significant gap in research on how vocational medical students, a key group in the healthcare workforce, utilize these resources. This gap in the research highlights the need to further investigate the unique challenges and factors influencing library resource utilization in vocational medical students.
    Case Presentation: One hundred and seventeen vocational medical students from a medical vocational college were assessed to determine what influenced their library resource usage. An online survey was conducted to collect data on usage patterns, satisfaction with library resources, and satisfaction with self-reported retrieval abilities. The sample included 48 males and 69 females, with an average age of 19.1±0.7 years. Of the participants, 38.5% (45 students) reported effective library resource utilization. Lasso regression and logistic regression analyses identified two key predictors: satisfaction with the library's space capacity (OR 4.26, 95% CI 1.438-12.622) and satisfaction with resource retrieval ability (OR 7.362, 95% CI 1.311-41.341). ROC analysis revealed high predictive value, with an area under the curve (AUC) of 0.866 (95% CI 0.796-0.936). (A minimal code sketch of this modeling workflow appears after this entry.)
    Conclusions: This study identified satisfaction with the library's space capacity and satisfaction with resource retrieval ability as key factors influencing library resource utilization by vocational medical students. To enhance library resource utilization, targeted strategies such as strengthening library infrastructure and improving students' information literacy should be considered.
    Keywords:  Library resource utilization; bootstrap; influencing factors; regression analysis; retrieval ability; space capacity; vocational medical students
    DOI:  https://doi.org/10.5195/jmla.2026.2125
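
    A minimal Python sketch of the modeling workflow in entry 1 (L1-penalized, lasso-style logistic regression followed by ROC analysis). The synthetic data, predictor count, and solver are illustrative assumptions, not details taken from the study:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        # Hypothetical stand-in for the survey data: 117 students, 5 candidate predictors.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(117, 5))
        y = (X[:, 0] + X[:, 1] + rng.normal(size=117) > 0).astype(int)  # 1 = effective use

        # An L1 penalty performs lasso-style predictor selection inside the logistic model.
        model = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
        print("coefficients:", model.coef_)  # zeroed entries are dropped predictors

        # Area under the ROC curve (the study reports AUC = 0.866 for its model).
        print("AUC:", roc_auc_score(y, model.predict_proba(X)[:, 1]))
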
  2. J Med Libr Assoc. 2026 Jan 01. 114(1): 46-52
     Background: Knowledge syntheses require complex searches of the literature, but many suffer from poor-quality, irreproducible search methods. Academic libraries support researchers conducting knowledge syntheses in many ways, including providing training such as workshops. However, for training to be successful, effective teaching theories and methods, such as andragogy and instructional design, need to be used. These can help to develop learning strategies and experiences based on the needs of the learners.
    Case Presentation: At Federation University Australia Library, in response to increasing requests for support from researchers conducting knowledge syntheses, a series of workshops on systematic searching was developed using adult learning methods. We aimed to deliver quality, engaging learning experiences to researchers, and using instructional design was likely to help us meet this goal. Learning outcomes were identified, followed by developing active, collaborative learning strategies and activities. After implementation, the workshops were evaluated informally, resulting in planned changes and improvements to future offerings.
    Conclusions: Using andragogy and instructional design was a successful method of developing the workshops as it provided a structure to follow, and centered researcher needs. While positive feedback was received from workshop participants, there is a need to formally evaluate the learning outcomes to determine if the workshops resulted in improvements in systematic searching practices. The approach to developing the workshops can be adapted by other libraries delivering similar training on systematic searching. It is our aim that by promoting the use of effective teaching methods, the quality of search methods in knowledge syntheses will improve.
    Keywords:  Systematic review; academic library; andragogy; instructional design; knowledge synthesis; searching
    DOI:  https://doi.org/10.5195/jmla.2026.2185
  3. J Med Libr Assoc. 2026 Jan 01. 114(1): 1-10
      The medical or health sciences library professional vocabulary uses many words that start with an I. On the eve of the 60th anniversary of the Janet Doe Lectureship, this lecture highlights and summarizes the 15 lectures (27%) that have included an I in their titles. The most frequent I word was information; this word appeared in four lectures. Only one lecture used more than one I word in the title. A new I word incorporated in this lecture but not its title is Intelligence, Artificial. Italics were used to emphasize I words within the lecture or titles of published works.
    Keywords:  Janet Doe Lectures; health sciences libraries; information; professional vocabulary; research
    DOI:  https://doi.org/10.5195/jmla.2026.2431
  4. J Med Libr Assoc. 2026 Jan 01. 114(1): 11-20
      In 2019 the Medical Library Association (MLA) transitioned to a community structure composed of caucuses. Four years after the transition, the 2023-2024 MLA Rising Stars cohort was asked to investigate how the caucuses were currently functioning and any challenges to their sustainability. This Special Paper will describe the study conducted by the Rising Stars cohort, and its research findings. Preliminary recommendations include greater standardization of annual reporting, additional guidance and discussion forums for caucus leadership, and an increase in events focused on professional development, networking, and information sharing such as those held during Experience MLA.
    Keywords:  Community Engagement; Health Science Librarians; Medical Library Association; Organizational Commitment; Professional organizations; library association management; organizational change
    DOI:  https://doi.org/10.5195/jmla.2026.2183
  5. J Med Libr Assoc. 2026 Jan 01. 114(1): 21-30
       Objectives: This study examines the experiences of librarians who support physician assistant/associate (PA) programs, describing the unique challenges of these programs and outlining strategies that librarians adopt to engage these programs.
    Method: This mixed-methods study includes two phases: (1) a quantitative survey developed and distributed to library personnel in institutions with established or developing PA programs in the US and Canada, and (2) semi-structured interviews with fifteen selected survey respondents, focusing on their experiences and perceptions related to PA education support. The qualitative data were analyzed using thematic analysis.
    Results: Seventy-five survey responses were collected. Key findings from the survey include: most respondents were from universities with health sciences programs, with nursing and physical therapy being the most common additional programs. Most library-led instruction occurred during the didactic phase and focused on search skills and evidence-based practice. PubMed and UpToDate were the most library-promoted resources. Two thematic elements discovered through the semi-structured interviews were "relationship building as paramount" and "impact of the learning curve on librarian workload."
    Conclusion: Librarians who support PA educational programs face challenges related to relationship building, financial resources, workload, and steep learning curves. The findings underscore the need for targeted professional development programs to equip librarians with the necessary knowledge and skills.
    Keywords:  Physician assistant (associate) education; health sciences librarianship; librarian workload; library instruction; resource management
    DOI:  https://doi.org/10.5195/jmla.2026.2211
  6. IEEE J Biomed Health Inform. 2026 Mar 03. PP
      Academic literature retrieval is constrained by the paradox of "information overload" versus "evidence scarcity", a tension that deepens when researchers iteratively refine their queries in multi-turn conversational settings. To address this challenge, we propose Conversational Literature Personalized Re-ranking (CLPR), a personalized framework that unifies dense semantic retrieval with personalized user profiling. CLPR first performs a broad high-recall retrieval to collect candidate documents, then compresses conversational history into a concise textual profile that encodes sequential continuity, immediate focus, and long-term research background via a large language model. The generated profile serves as a pseudo-query for a neural cross-encoder to produce the final ranking. Cross-domain testing on the public LitSearch (computer science) benchmark confirms its robust generalization, yielding an NDCG@10 of 0.4793. On MedCorpus, a new multi-turn biomedical conversational retrieval benchmark constructed for this study, CLPR attains state-of-the-art performance with P@1 = 0.9497 and NDCG@10 = 0.9271, surpassing the strongest baseline by substantial margins. Ablation shows long-term background cues contribute most, and maintaining a short, up-to-date profile across turns outperforms a static one. CLPR therefore delivers accurate, personalized literature retrieval and can accelerate evidence synthesis across scientific domains. (A sketch of the retrieve-then-rerank step appears after this entry.)
    DOI:  https://doi.org/10.1109/JBHI.2026.3669741
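
    A minimal sketch of the retrieve-then-rerank step described in entry 6, with an LLM-distilled profile serving as the pseudo-query for a cross-encoder. The checkpoint name, profile text, and candidate documents are illustrative assumptions; the paper's own models and prompts are not given in the abstract:

        from sentence_transformers import CrossEncoder

        # Hypothetical profile distilled from the conversation history
        # (sequential continuity + immediate focus + long-term background).
        profile = ("Biomedical informatics researcher; current turn asks about "
                   "personalized re-ranking for conversational literature search.")

        # Candidates returned by a broad, high-recall first-stage retriever.
        candidates = [
            "Dense retrieval for biomedical literature: a survey.",
            "Personalized re-ranking with conversational user profiles.",
            "Deep learning for protein structure prediction.",
        ]

        # The profile acts as a pseudo-query; the cross-encoder scores each pair.
        ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example checkpoint
        scores = ce.predict([(profile, doc) for doc in candidates])
        for score, doc in sorted(zip(scores, candidates), key=lambda t: t[0], reverse=True):
            print(f"{score:.3f}  {doc}")
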
  7. J Med Libr Assoc. 2026 Jan 01. 114(1): 53-59
       Background: Non-healthcare undergraduate students frequently seek drug-related information online, often relying on unverified sources such as Google or YouTube. Early exposure to professional drug information databases may promote evidence-based information-seeking habits.
    Case Presentation: A one-hour training session on using Lexicomp, a professional drug information database, was conducted for 55 non-healthcare students and 58 pharmacy students at a women's university in South Korea. The session included live demonstrations and guided search tasks. Participants completed pre- and post-training surveys assessing their information-seeking behaviors, perceptions of source reliability, and intention to use Lexicomp. Students also ranked drug information types they typically searched for and anticipated using Lexicomp to find. Only 1.8% of non-healthcare students had prior knowledge of Lexicomp, compared to 100% of pharmacy students. After the training, 100% of non-healthcare students rated Lexicomp as more reliable than their usual sources, and over 90% expressed willingness to use it in the future. A marked shift in information-seeking priorities was observed, with greater emphasis on clinically relevant topics such as adverse effects and contraindications. Students reported increased confidence and found the platform easier to use than expected.
    Conclusion: A brief educational intervention was effective in improving drug information literacy among non-healthcare students. Early training in professional resources may foster long-term adoption of evidence-based practices in personal health information use.
    Keywords:  Drug information database; Evidence based practice; Health professionals; health literacy; non-healthcare students
    DOI:  https://doi.org/10.5195/jmla.2026.2165
  8. Bioinform Adv. 2026;6(1): vbag058
       Motivation: The exponential growth of open-access scientific literature presents researchers with unprecedented opportunities but also poses a significant challenge: how to efficiently identify and prioritize relevant publications in a transparent and customizable manner. Existing search engines index large volumes of biomedical literature but rarely provide user-defined ranking options, reproducibility, or integration of domain-specific criteria. This gap is particularly limiting for specialized fields, where nuanced keyword combinations, literature recency, and contextual interpretation are critical.
    Results: We present HERMES, an open-source literature mining tool for targeted retrieval and ranking of full-text open-access publications from PubMed Central (PMC). HERMES employs a composite scoring algorithm that integrates keyword frequency, citation counts, and publication age to prioritize publications. It further supports summarization, biomedical entity recognition, and PDF report generation. An intuitive graphical user interface (GUI) allows researchers without programming expertise to perform complex literature mining tasks, while multithreaded processing ensures efficiency for large-scale queries. HERMES provides a reproducible and adaptable framework for literature discovery, empowering researchers to rapidly identify relevant literature and promoting transparency and community-driven extension. (A toy version of the composite score appears after this entry.)
    Availability and implementation: HERMES (version 1.2) is implemented in Python (3.11). The source code is freely available on GitHub at https://github.com/julien-charest/hermes and is distributed under the GPL-3 license.
    DOI:  https://doi.org/10.1093/bioadv/vbag058
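
    Entry 8 names the three signals in HERMES's composite score but not the formula. The toy function below shows one plausible combination; the weights, log damping, and recency decay are assumptions for illustration only:

        import math

        def composite_score(keyword_freq, citations, age_years,
                            w_kw=0.5, w_cit=0.3, w_rec=0.2):
            """Toy combination of the three signals named in the abstract."""
            recency = 1.0 / (1.0 + age_years)        # newer papers score higher
            return (w_kw * keyword_freq
                    + w_cit * math.log1p(citations)  # damp very high citation counts
                    + w_rec * recency)

        # Example: strong keyword match, 40 citations, published two years ago.
        print(round(composite_score(keyword_freq=3.0, citations=40, age_years=2.0), 3))
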
  9. J Med Libr Assoc. 2026 Jan 01. 114(1): 60-66
       Background: Health sciences librarians frequently engage in discussions about the appropriate assignment of evidence synthesis reviews (ES) for graduate students as course, thesis, or capstone projects. Such reviews are often assigned to build the research skills needed in a clinical environment. In the assignment of these reviews, it has become apparent that health sciences faculty are often not familiar with required standardized methodologies. Incorrect methodologies can contribute to research waste and produce evidence that cannot be applied for its intended purpose.
    Case Presentation: Health sciences librarians at an R1 institution ventured to address the ES review knowledge gap through a continuing education webinar for health sciences faculty and graduate students. The webinar provided guidance on systematic review (SR) methodology, optional alternative research assignments, and discussions encouraging the use of these assignments. The alternative assignments were developed based on those presented by Lipke & Price (2025), each with specific learning objectives and grading rubrics. Pre- and post-webinar surveys were conducted to gauge any changes in participants' knowledge, skills, or abilities related to the presented information.
    Conclusions: Study participants included six faculty and a graduate student. Survey results showed that participants had an improved understanding of, and placed increased importance on, ES method guidelines, with an equal understanding of the need for alternative assignments. The authors of this study will further evaluate the impact of this webinar and assess its effectiveness in changing health sciences research assignments.
    Keywords:  Cognitive load theory; Evidence Synthesis; Graduate assignments; Health Sciences; Research Instruction; Systematic Review
    DOI:  https://doi.org/10.5195/jmla.2026.2056
  10. J Med Libr Assoc. 2026 Jan 01. 114(1): 68-74
       Background: Use of evidence-based medicine (EBM) can improve patient outcomes, but translating classroom learning of EBM to clinical practice is challenging. Training students to utilize and apply principles of EBM is critical but data and methods for evaluating students' EBM skills are lacking.
    Case Presentation: The Hackensack Meridian School of Medicine has early curricular introduction of information mastery techniques to combat these challenges. Students create research presentations related to the weekly problem-based learning (PBL) case to practice applying EBM skills. Medical librarians developed and utilized an assessment tool to evaluate students' weekly presentations. Librarian staff reviewed 595 presentations during the first year of the pre-clerkship curriculum using five criteria: (1) appropriate scope of presentation, (2) correct categorization of the question based on the finding information framework, (3) appropriate resource used, (4) search strategy, and (5) bibliographic citations according to American Medical Association (AMA) guidelines.
    Conclusions: Based on evaluations using these criteria, the majority of students routinely and reliably applied EBM skills in their case-based presentations. Further studies will need to examine the continued development of these skills throughout other phases of training.
    Keywords:  Evidence-based medicine; assessment; health systems science; problem-based learning
    DOI:  https://doi.org/10.5195/jmla.2026.2203
  11. J Med Libr Assoc. 2026 Jan 01. 114(1): 79-82
      bims: Biomed News. February 5, 2017-Present. https://biomed.news/. Created by Thomas Krichel and directed by Gavin P. McStay. Free. Accessible via any web browser.
    DOI:  https://doi.org/10.5195/jmla.2026.2288
  12. J Med Libr Assoc. 2026 Jan 01. 114(1): 86-87
      OpenEvidence. AI-based medical information platform. Released 2023. OpenEvidence Inc., Cambridge, Massachusetts. https://www.openevidence.com/. Founder & CEO: Dr. Daniel Nadler. Free for healthcare professionals; registration is required to use OpenEvidence.
    DOI:  https://doi.org/10.5195/jmla.2026.2247
  13. BMC Med Res Methodol. 2026 Mar 04.
       BACKGROUND: Integrating artificial intelligence (AI) into literature searching has the potential to enhance research synthesis by improving the identification of conceptually rich or otherwise difficult-to-locate evidence. Theoretical or conceptual literature reviews, including realist reviews, often involve resource-intensive searches because they aim to trace nuanced ideas, mechanisms, or conceptual relationships across multiple sources. This case study illustrates the use of AI-powered tools to support and streamline such literature searching, using a realist review as an example.
    METHODS: We applied AI tools (Scite and Undermind) in the context of a realist review to facilitate the identification of relevant studies. Seed papers and key informant papers guided the search, and a novel classification system (grandparent, parent, and child papers) was used to systematically organise studies for developing and refining theoretical constructs. Transparent screening procedures and decision-making frameworks were employed to ensure methodological rigour and reproducibility.
    RESULTS: The integration of AI tools supported the retrieval of conceptually relevant literature and helped manage complex datasets. The classification system enabled structured organisation of studies, supporting iterative testing and refinement of theoretical constructs. The workflow demonstrated flexibility and adaptability, suggesting potential applicability beyond realist review.
    CONCLUSIONS: Our findings suggest that AI-powered tools can support literature searching, particularly in identifying conceptually relevant studies. However, these tools do not replace the critical interpretive work required by researchers. Human judgement remains essential to assess relevance, evaluate nuanced concepts, and make informed decisions throughout the search process, with AI serving as a valuable adjunct rather than a substitute.
    Keywords:  Artificial intelligence; Evidence appraisal; Literature screening; Literature searches; Realist reviews; Review methodology
    DOI:  https://doi.org/10.1186/s12874-026-02814-3
  14. Sci Rep. 2026 Mar 05.
      Conversational AI, such as ChatGPT, is increasingly used for information seeking. However, little is known about how ordinary users actually prompt and how ChatGPT adapts its responses in real-world conversational information seeking (CIS). In this study, a nationally representative sample of 937 U.S. adults engaged in multi-turn CIS with ChatGPT on both controversial and non-controversial topics across science, health, and policy contexts. We analyzed both users' prompting strategies and the communication styles of ChatGPT's responses. The findings revealed behavioral signals of a digital divide: only 19.1% of users employed prompting strategies, and these users were disproportionately more educated and Democrat-leaning. Further, ChatGPT demonstrated contextual adaptation: responses to controversial topics contained more cognitive complexity and more external references than responses to non-controversial topics. Notably, cognitively complex responses were perceived as less favorable but produced more positive issue-relevant attitudes. This study highlights disparities in user prompting behaviors and shows how user prompts and AI responses together shape information-seeking with conversational AI.
    Keywords:  Conversational AI; Digital Divide; Information-seeking; Prompting Strategies
    DOI:  https://doi.org/10.1038/s41598-026-42465-4
  15. Zhonghua Kou Qiang Yi Xue Za Zhi. 2026 Mar 06. 61(3): 332-338
     Objective: To investigate the current application status and potential of artificial intelligence (AI) large language models (LLMs) in oral mucosal disease health consultation.
    Methods: A questionnaire survey was conducted to investigate the use of AI for oral mucosal disease-related consultations among patients attending the Department of Oral Medicine, West China Hospital of Stomatology, Sichuan University in November 2025, and to compare the factors influencing AI usage behavior and satisfaction. Nine standardized clinical questions concerning the etiology, symptoms, treatment, care, and prognosis of oral leukoplakia (OLK) were input into major LLM platforms. The responses were quantitatively scored by ten oral medicine specialists for accuracy, clarity, relevance, completeness, and practicality using the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. Concurrently, the readability of the responses was assessed using the Alpha Readability Chinese (ARC) tool.
    Results: A total of 200 patients with oral mucosal diseases were included. Only 37.5% (75/200) had ever used AI for related consultations. AI usage rate was significantly correlated with younger age and higher education level (P<0.001). Merely 40.0% (30/75) of users were relatively satisfied with current AI consultations, and only 21.3% (16/75) would adopt AI's treatment or care suggestions. However, 96.0% (72/75) expressed positive willingness to continue using AI for future consultations. Based on the QAMAI total scores for the nine typical OLK-related clinical questions, DeepSeek (25.4 points) and Tencent Hunyuan (25.3 points) performed best, rated as "very good quality", while the other models were rated "good quality". All models scored relatively low on the "sources and references" dimension. ARC readability analysis indicated that ByteDance Doubao had the best readability (weighted total score 0.511), while DeepSeek and Tencent Hunyuan had relatively poor readability (0.358 and 0.369, respectively).
    Conclusions: This study indicates that while current usage rates and satisfaction with AI consultation among patients with oral mucosal diseases need improvement, future willingness to use it is strong. The systematic evaluation of six mainstream Chinese LLMs reveals significant disparities in their professional information quality and text readability for OLK consultation, alongside a prevalent lack of reliable evidence-based support. This underscores that enhancing the comprehensive quality of AI-generated responses is crucial for realizing their clinical application value.
    DOI:  https://doi.org/10.3760/cma.j.cn112144-20251201-00482
  16. J Med Internet Res. 2026 Mar 06. 28 e85516
       BACKGROUND: Surveys show that many people are willing to use generative artificial intelligence (AI) for health questions. Prior research has largely focused on chatbot accuracy, with some studies finding that both physicians and consumers overwhelmingly prefer chatbot-generated text over physician responses.
    OBJECTIVE: This study aimed to characterize and compare the emotional content of responses from physicians and 2 AI chatbots (OpenAI's ChatGPT and Google's Gemini) and to assess differences in reading level and use of medical disclaimers.
    METHODS: A public, patient-deidentified telehealth website was used to compile 100 physician-answered questions. The same questions were posed to both chatbots between May 18 and 19, 2025. Two coders classified the emotional content of each sentence using a predefined codebook and reviewed for agreement. Emotions were ranked as primary, secondary, and tertiary by the proportion of sentences classified as each emotion per response. Multinomial logistic regression compared emotional rankings using physician responses as the reference. Word count, Flesch Reading Ease, and Flesch-Kincaid Grade Level were analyzed via ANOVA with the Tukey honestly significant difference test. Disclaimer use was compared between chatbots using a χ2 test.
    RESULTS: Primary emotions were overwhelmingly neutral, except for one response from each chatbot in which anger was primary. For secondary emotions, the odds ratio of hope was 80.28% (95% CI 37.71%-93.76%) lower for ChatGPT, while the odds ratio of fear was 3.29 (95% CI 1.44-7.49) times higher for Gemini. For tertiary emotions, the odds ratio of compassion was 1.94 (95% CI 1.06-3.54) times higher, and the odds ratio of having no tertiary emotion was 84.33% (95% CI 64.72%-93.04%) lower for Gemini. Gemini responses averaged 889.1 (SD 305.7) words, ChatGPT 476.5 (SD 109.5), and physicians 193.5 (SD 113.6). Gemini had the lowest average Flesch Reading Ease score at 39.9 (SD 8.8), followed by ChatGPT at 45.8 (SD 12.8), while physicians had the highest at 51.9 (SD 13.6). Gemini had the highest average Flesch-Kincaid Grade Level at 11.3 (SD 1.5), followed by ChatGPT at 9.9 (SD 1.9), and physicians at 9.2 (SD 2.4). Gemini was significantly more likely to include a disclaimer than ChatGPT (χ²₁=49.2; P<.001). (The Flesch formulas used here are sketched after this entry.)
    CONCLUSIONS: Chatbot responses were significantly (P<.001) longer and more difficult to read than physician responses and were more likely to contain a wider range of emotions. Qualitatively, chatbot responses were more varied in their presentation as well as in the breadth of the emotions themselves. The findings of this study could be used to inform more emotionally connected physician responses to patient message queries.
    Keywords:  AI; ChatGPT; artificial intelligence; chatbot; emotional; health; physicians; responses
    DOI:  https://doi.org/10.2196/85516
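
    The two readability measures used in entry 16 (and in several later entries) are simple closed-form formulas over word, sentence, and syllable counts. A self-contained sketch with made-up counts:

        def flesch_scores(words, sentences, syllables):
            asl = words / sentences   # average sentence length
            asw = syllables / words   # average syllables per word
            reading_ease = 206.835 - 1.015 * asl - 84.6 * asw
            grade_level = 0.39 * asl + 11.8 * asw - 15.59
            return reading_ease, grade_level

        # Hypothetical counts for one chatbot response.
        fre, fkgl = flesch_scores(words=500, sentences=25, syllables=850)
        print(f"Flesch Reading Ease: {fre:.1f}, Flesch-Kincaid Grade Level: {fkgl:.1f}")
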
  17. BMC Oral Health. 2026 Mar 04.
      
    Keywords:  Accuracy; Anesthesia; Artificial intelligence; Chatbots; Large language models; Readability; Sedation
    DOI:  https://doi.org/10.1186/s12903-026-08026-x
  18. Front Cell Infect Microbiol. 2026;16: 1773593
       Background: The gut-liver axis integrates intestinal barrier function, microbial ecology, metabolism, immune regulation, and hepatic feedback, yet remains causally non-closed and strongly context dependent. As large language models (LLMs) increasingly mediate biomedical explanation, their ability to preserve evidentiary structure within such epistemically open frameworks requires systematic evaluation.
    Methods: We conducted a cross-platform, mixed-methods infodemiology analysis of five widely accessible LLMs. Twenty clinically grounded questions spanning five hierarchical domains from basic mechanisms to intervention and evaluation generated 100 single-turn responses. Linguistic accessibility was assessed using seven established readability indices, while epistemic integrity was evaluated using the Journal of the American Medical Association Benchmark Criteria, Global Quality Score, and a modified DISCERN framework.
    Results: Linguistic complexity increased as prompts progressed toward intervention and evaluation, without corresponding gains in transparency, reliability, or educational quality. Informational integrity clustered primarily by platform rather than domain. Readability indices showed strong internal concordance, whereas integrity metrics aligned only moderately and correlated weakly with readability. Item-level analysis revealed consistently high narrative clarity but systematic under-signaling of source attribution and uncertainty, resulting in over-coherent explanations that compressed conditional associations into mechanism-like claims.
    Conclusions: LLM explanations of the gut-liver axis are susceptible to epistemic compression driven by narrative fluency rather than factual error. Readability does not reliably indicate epistemic robustness in decision-adjacent contexts. These findings support shifting evaluation and governance from platform comparison toward concept-conditioned requirement engineering that enforces provenance, calibrated uncertainty, and explicit separation of correlation, mechanism, and actionability as generative outputs approach clinical relevance.
    Keywords:  epistemic compression; gut–liver axis; host–microbe interaction; informational reliability; intestinal microbiome; large language models
    DOI:  https://doi.org/10.3389/fcimb.2026.1773593
  19. Thyroid. 2026 Feb;36(2): 162-168
       BACKGROUND: Artificial intelligence (AI) chatbots are increasingly being used by patients to obtain medical information. Comparison between platforms with specialty-specific physician assessment remains limited. This study compares the quality, factual accuracy, readability, and consistency of responses generated by four publicly available AI chatbots when answering patient-centered questions about thyroid radiofrequency ablation (RFA).
    METHODS: We conducted a cross-sectional analysis of chatbot-generated responses using 20 standardized clinical questions about thyroid RFA. Responses from ChatGPT-4, Gemini, Copilot, and Perplexity were evaluated by six blinded physician reviewers experienced in thyroid RFA using 5-point Likert scales for global quality and factual accuracy. Higher Likert scale scores indicated better performance. Readability and response length were analyzed with established metrics. Statistical significance was defined as p < 0.05.
    RESULTS: Gemini achieved the highest mean scores for global quality (4.08 ± 0.87) and accuracy (3.76 ± 1.05), with significantly better performance than ChatGPT and Copilot (p < 0.005). ChatGPT responses were significantly longer and more readable. Score variability across questions was lowest for Gemini. Copilot and Perplexity ranked lowest across most domains. Question-level analysis identified specific prompts that best discriminated between platforms.
    CONCLUSIONS: AI chatbot performance varied across platforms for thyroid RFA queries. Chatbots were generally reliable for straightforward factual information but were less dependable for judgment or context-dependent assessments. These AI tools should supplement, not replace, clinician-vetted patient education and institutional materials.
    Keywords:  artificial intelligence; chatbot; large language model; patient education; radiofrequency ablation; thyroid nodule
    DOI:  https://doi.org/10.1177/10507256251414974
  20. J Craniofac Surg. 2026 Mar-Apr 01;37(3-4): 797-802
     INTRODUCTION: Craniofacial injuries from racquet sports in the United States remain high, despite guidelines for protective equipment. While national injury data describe injury patterns in squash, badminton, and tennis, they fail to provide actionable, age- and sport-specific management, treatment, or preventive strategies.
    METHODS: Using the National Electronic Injury Surveillance System (NEISS) database, this study evaluated ChatGPT-4o's ability to risk-stratify patients and provide preventive strategies based on demographic characteristics, injury mechanisms, and patient needs. Standardized clinical vignettes reflected craniofacial injuries requiring stratification and counseling. AI-generated responses were scored using the validated DISCERN criteria by 2 board-certified plastic surgeons. Readability was evaluated with the Flesch-Kincaid grade level, and specificity was rated on a 5-point Likert scale by 2 independent medical student reviewers.
    RESULTS: DISCERN score was 32.5/75, with a mean reliability score of 2.9/5 and a treatment quality score of 1.4/5. Readability averaged an 11th-grade level across sports. Specificity ratings indicated moderately high specificity (3.9-4/5).
    DISCUSSION/CONCLUSION: While ChatGPT-4o can provide accessible, structured information, its performance in this study demonstrated moderate reliability, low treatment guidance quality, a reading level above AMA recommendations, and moderately high specificity. These findings underscore the need for cautious integration of AI tools in patient education and clinical decision-making. As LLMs evolve, there is potential for risk stratification and injury prevention tools to improve. Careful development and validation will be integral to ensure safe and effective clinical use, as well as HIPAA compliance, lack of bias, and accurate information.
    Keywords:  ChatGPT; craniofacial injuries; large language models; patient education; racquet sports
    DOI:  https://doi.org/10.1097/SCS.0000000000012095
  21. J Craniofac Surg. 2026 Mar-Apr 01;37(3-4): 812-817
     INTRODUCTION: Large language models (LLMs) like ChatGPT have the potential to improve patient education. Their role in pediatric plastic surgery counseling remains underexplored. This study evaluated ChatGPT-4o's responses to common parent questions across 4 pediatric craniofacial procedures using 5 metrics: DISCERN, specificity, Flesch-Kincaid Grade Level (FKGL), emotion scoring, and the Patient Education Materials Assessment Tool (PEMAT).
    METHODS: Twelve standardized vignettes were developed for cleft lip and palate, craniosynostosis, facial trauma from a dog bite, and otoplasty. Each case featured prompts on surgical risks, recovery, and procedure-specific concerns. All were submitted on the same day using the same ChatGPT-4o profile. DISCERN scores were rated by 2 board-certified plastic surgeons. Specificity and emotion were rated on a 5-point Likert scale by 2 medical students. Readability was calculated with FKGL. PEMAT was used to assess understandability and actionability.
    RESULTS: Mean DISCERN score was 43.7/75 (reliability 23.8/40, treatment quality 20.3/35). Mean specificity ranged from 1.7 (craniosynostosis) to 3.0 (otoplasty and dog bite). Average FKGL was 9.5 (10th-grade level). Mean emotion score was 3.1. PEMAT scores averaged 62% for understandability and 27% for actionability. Facial trauma demonstrated the highest scores in both domains.
    CONCLUSIONS: ChatGPT-4o produced organized, accessible responses, but underperformed in reliability, quality, specificity, and actionability. The reading level exceeded the recommended patient education standard of sixth to eighth grade. Emotional tone was moderate but not consistently tailored to sensitive pediatric contexts. These findings suggest ChatGPT is insufficient for unsupervised use. With refinement, LLMs may support, but not replace, physician-led counseling in pediatric craniofacial surgery.
    Keywords:  Artificial intelligence; cleft-lip and palate; health communication; large language models; pediatric craniofacial surgery
    DOI:  https://doi.org/10.1097/SCS.0000000000012116
  22. JMIR Med Inform. 2026 Feb 27. 14 e78838
       Background: Scars and keloids impose significant physical and psychological burdens on patients, often leading to functional limitations, cosmetic concerns, and mental health issues such as anxiety or depression. Patients increasingly turn to online platforms for information; however, existing web-based resources on scars and keloids are frequently unreliable, fragmented, or difficult to understand. Large language models such as GPT-4 show promise for delivering medical information, but their accuracy, readability, and potential to generate hallucinated content require validation for patient education applications.
    Objective: This study aimed to systematically evaluate GPT-4's performance in providing patient education on scars and keloids, focusing on its accuracy, reliability, readability, and reference quality.
    Methods: This study involved collecting 354 questions from Reddit communities (r/Keloids, r/SCAR, and r/PlasticSurgery), covering topics including treatment options, pre- and postoperative care, and psychological impacts. Each question was input into GPT-4 in independent sessions to mimic real-world patient interactions. Responses were evaluated using multiple tools: the Patient Education Materials Assessment Tool-Artificial Intelligence for understandability and actionability, DISCERN-AI for treatment information quality, the Global Quality Scale for overall information quality, and standard readability metrics (Flesch Reading Ease score and Gunning Fog Index). Three plastic surgeons used the Natural Language Assessment Tool for Artificial Intelligence to rate the accuracy, safety, and clinical appropriateness, while the Reference Evaluation for Artificial Intelligence tool validated references for reference hallucination, relevance, and source quality. We conducted the same analysis to assess the quality of GPT-4-generated content in response to questions from 3 medical websites.
    Results: GPT-4 demonstrated high accuracy and reliability. The Patient Education Materials Assessment Tool-Artificial Intelligence showed 75.5% understandability, DISCERN-AI rated responses as "good" (26.3/35), and the Global Quality Scale score was 4.28 of 5. Surgeons' evaluations averaged 3.94 to 4.43 out of 5 across dimensions (accuracy 3.9, SD 0.7; safety 4.3, SD 0.8; clinical appropriateness 4.4, SD 0.5; actionability 4.1, SD 0.8; and effectiveness 4.1, SD 0.8). Readability analyses indicated moderate complexity (Flesch Reading Ease Score: 50.13; Gunning Fog Index: 12.68), corresponding to a 12th-grade reading level. Reference Evaluation for Artificial Intelligence identified 11.8% (383/3250) hallucinated references, while 88.2% (2867/3250) of references were real, with 95.1% (2724/2867) from authoritative sources (eg, government guidelines and the literature). The overall results about questions from medical websites were consistent with the answers to Reddit questions.
    Conclusions: GPT-4 has serious potential as a patient education tool for scars and keloids, offering reliable and accurate information. However, improvements in readability (to align with sixth to eighth grade standards) and reduction of reference hallucinations are essential to enhance accessibility and trustworthiness. Future large language model optimizations should prioritize simplifying medical language and strengthening reference validation mechanisms to maximize clinical utility.
    Keywords:  GPT-4; generative AI; generative artificial intelligence; keloid; large language model; patient education; readability; scar
    DOI:  https://doi.org/10.2196/78838
  23. Reprod Biomed Online. 2025 Aug 14. pii: S1472-6483(25)00428-6. [Epub ahead of print]52(4): 105221
       RESEARCH QUESTION: Is the quality, relevance and empathy of the answers provided by large language models (LLMs) in response to the most frequently asked patient questions in reproductive medicine comparable to those provided by human specialists?
    DESIGN: This monocentric, double-blind, prospective study involved two clinicians and two embryologists who answered 13 frequently asked questions in their respective field. The same questions were asked to a free online LLM, with the same constraint of text length as practitioners. All answers were blindly evaluated by four assessors (two gynaecologists and two embryologists depending on the topic) for quality and accuracy. A psychologist also evaluated empathy.
    RESULTS: The mean number of words per answer was significantly higher (P < 0.001) for LLM than for humans. The average quality of answers was not statistically different between LLM and professionals. No answer provided by LLM was evaluated as completely aberrant, and only a minority contained false or inappropriate information or was scored as being very poor by assessors. Answers provided by embryologists, but not clinicians, ranked significantly higher (P = 0.02) than LLM. The psychologist chose LLM answers as most empathetic, clear, or both, in 14 out of 26 questions.
    CONCLUSIONS: LLMs could be used as an educational tool within assisted reproductive technology centres to answer frequently asked patient questions. Although the potential applications of LLMs' capabilities in answering medical questions are numerous, this should be carefully evaluated and regulated to prevent the dissemination of inaccurate information to patients.
    Keywords:  Artificial intelligence; Frequently asked questions; Infertility patients; Large language models
    DOI:  https://doi.org/10.1016/j.rbmo.2025.105221
  24. JMIR Cancer. 2026 Feb 27. 12 e72839
       Background: Artificial intelligence (AI) is increasingly used to generate medical content, yet its performance in delivering clinically relevant and reliable information remains underexplored, especially in complex areas such as breast cancer.
    Objective: This study aimed to compare ChatGPT-4.0 and DeepSeek-V3 in generating breast cancer information, focusing on readability, content quality, and citation reliability.
    Methods: On the basis of publicly available patient education materials, 10 frequently asked questions were selected. Each model generated 60 responses. Three expert reviewers rated each response using a 7-point Likert scale across 5 dimensions (ie, accuracy, completeness, clarity, depth and insight, and alignment with expert answers). Readability was assessed using Flesch-Kincaid Grade Level scores. Information reliability was evaluated through interrater agreement metrics, including Cohen κ and Fleiss κ. Paired t tests were used for statistical comparisons.
    Results: AI models produced significantly more readable content than expert references (mean Flesch-Kincaid Grade Level difference -2.60; P<.001). ChatGPT-4.0 responses were more stylistically consistent with a median Flesch-Kincaid Grade Level score of 10.66 (IQR 0.98), whereas DeepSeek-V3 showed greater variability with a median Flesch-Kincaid Grade Level score of 10.17 (IQR 1.41). For content quality, DeepSeek-V3 achieved a higher mean score than ChatGPT-4.0 (6.22 [SD 0.43] vs 6.01 [SD 0.49]). In the multiresponse analysis, DeepSeek-V3 demonstrated a statistically significant advantage in accuracy (P=.041), while differences across other criteria were not statistically significant (P>.05). Human raters showed almost perfect agreement when judging source reliability (Fleiss κ=0.842 for ChatGPT's citations and 0.935 for DeepSeek's citations). Agreement between each model's citation reliability scores and the expert majority was substantial for ChatGPT (Cohen κ=0.665) and higher for DeepSeek (Cohen κ=0.782). (A minimal Cohen's κ computation appears after this entry.)
    Conclusions: Both models generated readable and clinically relevant content with comparable overall performance. ChatGPT provided more consistent readability, while DeepSeek offered more diverse references with stronger alignment to expert ratings. Continued evaluation and quality assurance are essential for the responsible clinical use of AI-generated content.
    Keywords:  AI; ChatGPT; DeepSeek; LLM; artificial intelligence; breast cancer; large language models
    DOI:  https://doi.org/10.2196/72839
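
    Cohen's κ, used in entry 24 for rater agreement, compares observed agreement with the agreement expected by chance. A self-contained sketch with hypothetical ratings:

        from collections import Counter

        def cohens_kappa(rater_a, rater_b):
            """kappa = (p_observed - p_chance) / (1 - p_chance)."""
            n = len(rater_a)
            p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
            ca, cb = Counter(rater_a), Counter(rater_b)
            p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
            return (p_o - p_e) / (1 - p_e)

        # Hypothetical reliability labels for ten citations from two raters.
        a = ["real", "real", "fake", "real", "real", "fake", "real", "real", "real", "fake"]
        b = ["real", "real", "fake", "real", "fake", "fake", "real", "real", "real", "real"]
        print(round(cohens_kappa(a, b), 3))  # ~0.47 for these made-up labels
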
  25. Front Public Health. 2026;14: 1776697
     Background: Large language models (LLMs) have seen extensive application in health information consultation, enabling interactive responses to complex queries; however, their reliability and readability warrant further investigation. This study aims to assess the reliability and readability of cross-disciplinary responses generated by artificial intelligence platforms regarding thunderstorm asthma, including ChatGPT-4, DeepSeek-V3.2, Perplexity Pro, and Microsoft Copilot.
    Methods: This study uses Google Trends to identify and filter topic-specific information on thunderstorm asthma. It analyses cross-disciplinary responses generated by ChatGPT-4, DeepSeek-V3.2, Perplexity Pro, and Microsoft Copilot in response to conversational inputs. The 29 selected responses exhibit varying levels of meteorological forecasting accuracy concerning thunderstorms, as well as prevalent themes related to asthma symptomatology and therapeutic interventions. The study employed reliability assessment tools, including the DISCERN instrument, the Ensuring Quality Information for Patients Scale (EQIP), the JAMA benchmarks, and the Global Quality Scoring (GQS), in conjunction with six authoritative readability metrics, namely the Automated Readability Index (ARI), Coleman-Liau Grade Level (CL), Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease Score (FRES), Gunning Fog Index (GFI), and SMOG, to enable a comprehensive evaluation.
    Results: Research findings indicate statistically significant differences in the reliability of various artificial intelligence programmes when responding to complex interdisciplinary information queries. Microsoft Copilot demonstrates superior performance in terms of information reliability and structural quality, consistently achieving higher scores than ChatGPT-4 and Perplexity Pro, thereby providing more dependable information. However, all programme-generated informational responses were excessively complex for the general public, failing to meet sixth-grade reading comprehension standards, as the majority of outputs were written at a secondary education level or higher.
    Conclusion: This study reveals that while LLMs demonstrate some reliability in handling complex health consultations, none meet the recommended readability benchmark of a sixth-grade reading level. Future efforts should focus on improving the reliability and readability of LLM-generated health information to enhance comprehension amongst broader audiences.
    Keywords:  information response; large language model; readability; reliability; thunderstorm asthma
    DOI:  https://doi.org/10.3389/fpubh.2026.1776697
  26. PLoS One. 2026;21(3): e0327148
    MAPPinfo project group
       BACKGROUND AND AIM: Health literacy refers to the ability to use relevant information to make informed choices. However, the quality of the available information influences how well individuals can make those choices. Evidence-based recommendations for the development and design of health information have recently been published. In this study, we aimed to map the quality of Norwegian web-based health information across selected public health domains.
    METHODS: Using a multiple-cross-sectional design, we assessed information in 16 health domains relevant to infants, children, and youth. Convenience samples were drawn using structured Google searches. Three independent raters conducted the quality appraisal by applying the 19 criteria of the Mapping the quality of health information checklist. Inter-rater reliability was calculated using T-coefficients. Information quality was statistically described. To explain variance in quality, mean quality scores were compared across three independent variables: the type of the health problem, target group, and provider class.
    RESULTS: Across the surveys, 1,948 health information materials from 64 subdomains were assessed. Inter-rater reliability was excellent (mean T = .89/.90). On average, the materials complied with 22% (range: 0-73%, standard deviation = .09) of the current minimal standard. Differences between types of problems or target groups were marginal. No differences were found between information provided by health authorities, health services, or commercial entities.
    CONCLUSION: Norwegian web-based health information is not of sufficient quality to facilitate informed health choices made by citizens. These findings apply across a wide range of public health domains relating to infants, children, and youth. In the absence of appropriate health information of acceptable quality, estimates of the public's level of health literacy may need reconsideration. Further research is needed to appraise the quality of information in other health domains and countries.
    DOI:  https://doi.org/10.1371/journal.pone.0327148
  27. Health Care Sci. 2026 Feb;5(1): 19-28
     Background: To assess the effectiveness of ChatGPT and Bard in the initial identification of articles for Otolaryngology-Head and Neck Surgery systematic literature reviews.
    Methods: Three PRISMA-based systematic reviews (Jabbour et al. 2017, Wong et al. 2018, and Wu et al. 2021) were replicated using ChatGPTv3.5 and Bard. Outputs (author, title, publication year, and journal) were compared to the original references and cross-referenced with medical databases for authenticity and recall.
    Results: Several themes emerged when comparing Bard and ChatGPT across the three reviews. Bard generated more outputs and had greater recall in Wong et al.'s review, with a broader date range in Jabbour et al.'s review. In Wu et al.'s review, ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.
    Conclusion: Large language models (LLMs) failed to fully replicate peer-reviewed methodologies, producing outputs with inaccuracies but identifying relevant, especially recent, articles missed by the references. While human-led PRISMA-based reviews remain the gold standard, refining LLMs for literature reviews shows potential.
    Keywords:  Bard; ChatGPT; artificial intelligence; large language models; systematic review
    DOI:  https://doi.org/10.1002/hcs2.70048
  28. PEC Innov. 2026 Jun;8 100461
       Objective: This study evaluated the validity and reliability of large language model (LLM) responses on dietary supplements (DS), a domain marked by scientific controversy and misinformation. The goal was to support informed consumer decisions and guide improvements in LLM performance.
    Methods: We collected responses from GPT-4 and GPT-4o on the effects of 30 DS on six diseases. Two medical professionals categorized each response as "Effective," "Uncertain," or "Not Effective." They also created a guideline to assess evidence-based effectiveness and compared it with LLM-generated responses to determine accuracy. Additionally, we conducted qualitative content analysis to identify response patterns and misleading content.
    Results: GPT-4 and GPT-4o affirmed DS effectiveness in only 10% of cases, with 40% rated as "Uncertain" and 50% as "Not Effective." Accuracy was about 57%, considerably lower than that observed in nutrition-related studies (57% in DS vs. ~80% in structured nutrition tasks). Content analysis showed templated responses, frequent ambiguity, and occasional inclusion of irrelevant or incorrect information.
    Conclusion: Our findings suggest that ChatGPT's responses on dietary supplements are generally cautious but often ambiguous, with a moderate risk of misinformation. As generative AI becomes a common source for health advice, these limitations could mislead users. Enhancing LLMs' evidence-based accuracy and ensuring consistent professional guidance are essential.
    Innovation: This is the first study to assess the validity and reliability of LLM-generated responses on dietary supplements using both quantitative and qualitative methods. We also developed a novel evidence-based framework to evaluate supplement effectiveness, providing a new tool for future research and supporting safer AI-assisted health communication.
    Keywords:  ChatGPT; Dietary supplements; Health information reliability; Large language models; Misinformation
    DOI:  https://doi.org/10.1016/j.pecinn.2026.100461
  29. J Immigr Minor Health. 2026 Mar 03.
      
    Keywords:  Evaluation; Health websites; Immigrants and ethnic minorities; Multilingual accessibility; Online health information dissemination
    DOI:  https://doi.org/10.1007/s10903-026-01867-2
  30. Semin Radiat Oncol. 2026 Feb 27. pii: S1053-4296(26)00008-1. [Epub ahead of print]37 151006
      This review presents a summary of radiation therapy patient education materials. The purpose is to evaluate the quality, readability, and relevance of print-format pamphlets used to support patients undergoing radiation therapy; one large, urban academic cancer center is discussed as a reference for key materials. A literature review identified patient information needs, an environmental scan compared the cancer center's pamphlet collection with those of leading cancer centers, and assessments used the Patient Education Materials Assessment Tool (PEMAT) and the Flesch-Kincaid Grade Level formula. The review revealed gaps in technical content, inconsistent formatting, and readability levels that exceed recommended standards. The review highlights opportunities to improve clarity, consistency, and accessibility by incorporating plain language, standardizing content, and integrating multimedia resources. These findings offer practical guidance for enhancing patient education and health literacy in oncology care and will directly inform ongoing quality improvement efforts, including a broader review of all patient education materials.
    DOI:  https://doi.org/10.1016/j.semradonc.2026.151006
  31. Sci Rep. 2026 Mar 06.
      Despite the exponential increase in the availability of online health information, its quality remains questionable, presenting a significant challenge to address. This study addresses this issue by using artificial intelligence techniques, such as deep learning, to evaluate the quality of health information and to mimic human-level evaluation capabilities. The key methodologies used in the study included an enhanced version of Arabic BERT for medical data, feature extraction techniques incorporating Principal Component Analysis (PCA) and Independent Component Analysis (ICA), and modified loss functions using information entropy to improve the model's certainty and calibration during document classification. The results of the study were encouraging: the proposed PCA-based model achieved higher accuracy than the competing models and reached 94.7% on the dataset used, comparable to reported human-level performance. Finally, these findings may contribute to improving the reliability of online health information in Arabic contexts and provide a foundation for future efforts aimed at supporting healthcare decision-making. The methodologies and results presented here offer policymakers and researchers valuable tools to assess and ensure the trustworthiness of online health information. (One common form of entropy-based loss modification is sketched after this entry.)
    Keywords:  Arab countries; Arabic online health information; Calibrating AI systems; Deep learning; Online health information; Quality assessment; Trustworthiness
    DOI:  https://doi.org/10.1038/s41598-026-43158-8
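
    Entry 31 says the loss was modified with information entropy to improve certainty and calibration, but the exact formulation is not given in the abstract. One common variant is the confidence penalty (cross-entropy minus a scaled entropy bonus), sketched here in PyTorch purely as an illustration of the general idea, not the paper's actual loss:

        import torch
        import torch.nn.functional as F

        def entropy_penalized_loss(logits, targets, beta=0.1):
            """Cross-entropy minus beta * mean prediction entropy: discourages
            overconfident (low-entropy) outputs, which can improve calibration."""
            ce = F.cross_entropy(logits, targets)
            probs = F.softmax(logits, dim=-1)
            entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
            return ce - beta * entropy

        # Hypothetical batch: 4 documents, 3 quality classes.
        logits = torch.randn(4, 3)
        targets = torch.tensor([0, 2, 1, 0])
        print(entropy_penalized_loss(logits, targets))
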
  32. Cureus. 2026 Jan;18(1): e102398
       BACKGROUND: Diet and nutritional therapy are treatment options for children with inflammatory bowel disease (IBD). Parents of children with medical conditions often turn to the internet for medical guidance. However, the quality and readability of internet dietary information for pediatric IBD are currently unknown. The objective of this study was to evaluate the quality and readability of websites about diet for pediatric IBD.
    METHODS: Top internet websites for the searches "IBD, diet, children," "Crohn's disease, diet, children," and "ulcerative colitis, diet, children" were rated using the DISCERN instrument, a validated tool for rating consumer health information, on a scale of 1-5 to assess reliability, information quality, and overall quality (5 = highest reliability or quality). The Flesch-Kincaid grade level (FKGL) was used to determine website readability.
    RESULTS: The mean reliability scores were 3.1 for searches on "IBD, diet, children," 3.0 for "Crohn's disease, diet, children," and 3.1 for "ulcerative colitis, diet, children." The corresponding mean information quality scores were 2.5, 2.5, and 2.4, and the mean overall quality scores were 2.8, 2.7, and 2.9. The mean reading grade levels required to understand the content per FKGL were 11.9, 10.8, and 11.1. Across the websites, 23 highly variable dietary recommendations were made.
    CONCLUSIONS: For internet search results about diet and pediatric IBD, mean scores indicated moderate website reliability but poor information quality and overall quality. Websites were written at approximately an 11th-grade reading level, above the recommended standard for patient education. Dietary recommendations were numerous and inconsistent.
    Keywords:  crohn's disease; diet; internet; nutrition; pediatric inflammatory bowel disease; readability; ulcerative colitis
    DOI:  https://doi.org/10.7759/cureus.102398
  33. Cad Saude Publica. 2026;pii: S0102-311X2026000105021. [Epub ahead of print]42: e00102225
      Wide access to online information favors the search for health content. The aims of the present study were to identify the profile of internet use and the degree of difficulty in searching for health information among adults, describing the main contents sought and means used, and to investigate associations with sociodemographic/medical characteristics and degree of health literacy. A household survey was conducted in five municipalities in the South and Central-West regions of Brazil. The questionnaire included demographic and socioeconomic data, internet use, and degree of health literacy. Descriptive analyses and Poisson regression were performed to estimate prevalence ratios. A total of 1,181 individuals were included. The sample was predominantly composed of women, individuals aged 18 to 39 years, those with eight or more years of schooling, high economic class, White race, good self-rated health, and problematic health literacy. A total of 92.3% had access to the internet, and 77.1% of these individuals used the internet to search for health information. The most searched topics were symptoms (89.1%) and medications (84.5%). The search tools most used were Google (94.6%) and YouTube (41.7%). Most participants reported ease in using correct words (68.6%) and finding information (70.2%), but difficulty in assessing reliability (44.8%) and applying information to health-related decisions (25.9%). In the adjusted analysis, higher education, younger age, and higher levels of health literacy were associated with searching for health information online. The use of the internet was widely reported, despite difficulties in assessing reliability and applying the information. The findings underscore the need for accessible online health content of adequate quality.
    DOI:  https://doi.org/10.1590/0102-311XPT102225
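      Prevalence ratios such as those estimated above are commonly obtained from Poisson regression with robust standard errors when the outcome is binary. The sketch below uses statsmodels on simulated data; the variable names and values are illustrative assumptions, not the survey's dataset.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        df = pd.DataFrame({
            "sought_online": rng.integers(0, 2, 500),    # 1 = searched for health info
            "schooling_8plus": rng.integers(0, 2, 500),  # 1 = eight or more years
            "age_under_40": rng.integers(0, 2, 500),
        })
        fit = smf.glm("sought_online ~ schooling_8plus + age_under_40", data=df,
                      family=sm.families.Poisson()).fit(cov_type="HC0")
        print(np.exp(fit.params))      # exponentiated coefficients = prevalence ratios
        print(np.exp(fit.conf_int()))  # 95% CIs on the ratio scale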
  34. Sci Rep. 2026 Feb 28.
      Correctable refractive errors are a major, preventable cause of visual impairment. Refractive surgery is widely promoted online, yet how Turkish-language YouTube videos frame benefits, risks, recovery, and long-term outcomes, and how this framing differs between patient and physician narrators, remains underexplored. We aimed to qualitatively compare patient- and physician-generated Turkish-language YouTube videos on refractive surgery and to describe audience engagement. We conducted a reflexive thematic analysis (Braun & Clarke) of 64 publicly available videos (29 patient, 35 physician) meeting predefined criteria (Turkish; primarily refractive surgery; ≥1 min; ≥1,000 views; ≥240p). Searches were performed on 15 July 2025 using predefined search strings. Videos were transcribed verbatim and inductively coded in NVivo by two researchers. Between-group differences in engagement metrics were assessed with Mann-Whitney U tests, with a one-video-per-channel sensitivity analysis to address potential clustering. Patient narratives foregrounded lived experience (decision-making, perioperative discomfort, postoperative visual fluctuations, and symptoms such as dry eye and glare/halos) and often raised concerns about commercialization. Physician narratives emphasized candidacy assessment, procedure selection, recovery timelines, and risk mitigation. In the one-video-per-channel sensitivity analysis (patient n = 27; physician n = 23), patient videos received more likes (median 300 [IQR 1,639] vs. 59 [269], p = 0.009) and showed a higher like-to-view ratio (0.013 [0.01] vs. 0.006 [0.01], p < 0.001), whereas view counts were not significantly different (24,000 [98,900] vs. 18,000 [48,500], p = 0.224). Turkish-language YouTube narratives share experiential touchpoints but diverge systematically in how risks, commercialization, and expectations are framed by patients versus physicians. Findings support the need for balanced, accurate, and discoverable patient-facing materials tailored to platform dynamics.
    Keywords:  LASIK; Patient narratives; Physician narratives; Qualitative research; Refractive surgery; SMILE; YouTube
    DOI:  https://doi.org/10.1038/s41598-026-41997-z
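      The engagement comparison and its one-video-per-channel sensitivity analysis can be outlined as below. This is a sketch with toy data; the column names and the rule of keeping each channel's most-liked video are assumptions for illustration.

        import pandas as pd
        from scipy.stats import mannwhitneyu

        videos = pd.DataFrame({
            "channel": ["a", "a", "b", "c", "d", "d", "e"],
            "group":   ["patient", "patient", "patient", "physician",
                        "physician", "physician", "patient"],
            "likes":   [300, 1200, 45, 59, 80, 20, 500],
        })

        def compare_likes(df):
            patient = df.loc[df.group == "patient", "likes"]
            physician = df.loc[df.group == "physician", "likes"]
            return mannwhitneyu(patient, physician, alternative="two-sided")

        print(compare_likes(videos))  # full sample
        # Sensitivity analysis: one video per channel to limit clustering.
        one_per_channel = (videos.sort_values("likes", ascending=False)
                                 .drop_duplicates("channel"))
        print(compare_likes(one_per_channel))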
  35. Cureus. 2026 Jan;18(1): e102610
      Background Intrahepatic cholestasis of pregnancy (ICP) is a high-risk liver disorder complicating pregnancy. Short-video platforms are a major health information source, yet the quality of ICP-related content is unknown. This study aimed to explore the quality, reliability, and dissemination of ICP-related information online. Methods A cross-sectional study was conducted on December 11, 2025. The top 100 videos for the Chinese ICP term from Kwai, RedNote, and TikTok were screened. Basic video characteristics and engagement metrics were extracted. Quality was assessed using the Global Quality Scale (GQS), the modified DISCERN (mDISCERN) instrument, the Journal of the American Medical Association (JAMA) benchmark criteria, and a Content Completeness Score (CCS). Non-parametric data were summarized using medians and interquartile ranges (IQRs). Group comparisons were conducted with the Kruskal-Wallis test, and correlations were assessed via Spearman's correlation analysis. Statistical significance was set at p < 0.05, with analyses performed using IBM SPSS 30.0 (IBM Corp., Armonk, NY) and supplementary online tools. Results A total of 174 videos were included for systematic analysis. Video quality was moderate (GQS median 3.00) and varied across the three platforms. Content completeness was suboptimal (CCS median 5.00). Videos from healthcare professionals scored higher. User engagement metrics (likes, comments) were significantly higher on TikTok (ByteDance Ltd., Beijing, China) but showed negligible or weak correlations with quality scores across all platforms. Conclusion The quality of ICP information on short-video platforms is inconsistent and often incomplete, despite high social engagement. Professional sources are more reliable, but significant informational gaps persist. This highlights a public health need for improved platform governance, professional content creation, and enhanced digital health literacy for pregnant women.
    Keywords:  cross-sectional; global quality score (gqs); intrahepatic cholestasis of pregnancy (icp); social-media; tiktok
    DOI:  https://doi.org/10.7759/cureus.102610
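      The non-parametric workflow above can be reproduced in outline with SciPy: a Kruskal-Wallis test comparing GQS across the three platforms and a Spearman correlation between engagement and quality. The scores below are fabricated placeholders, not study data.

        from scipy.stats import kruskal, spearmanr

        gqs_kwai    = [3, 2, 4, 3, 3]
        gqs_rednote = [2, 3, 3, 2, 4]
        gqs_tiktok  = [4, 3, 5, 3, 4]
        print(kruskal(gqs_kwai, gqs_rednote, gqs_tiktok))  # H statistic, p-value

        likes = [120, 4500, 300, 88, 12000]
        gqs   = [3, 2, 4, 3, 3]
        rho, p = spearmanr(likes, gqs)  # a small |rho| would mirror the weak
        print(rho, p)                   # engagement-quality correlation reported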
  36. Digit Health. 2026 Jan-Dec;12: 20552076261430066
       Background: Short-form video platforms have become primary channels for the public to access health information; therefore, their influence in disseminating knowledge about psychosomatic disorders has garnered increasing attention. We aimed to systematically evaluate the quality, presentation formats and emotional narrative characteristics of short videos concerning cardiovascular disease co-occurring with anxiety and depression on the TikTok and Bilibili platforms.
    Methods: A cross-sectional content analysis approach was employed. Popular Chinese-language short videos relevant to the research theme were selected from both platforms. Content quality was assessed using the Global Quality Score, modified DISCERN, and Journal of the American Medical Association frameworks, and the presentation formats and emotional narrative characteristics of the videos were also analysed.
    Results: Although certain videos conveyed basic medical knowledge, overall quality proved inconsistent. Videos from professional sources scored significantly higher than those from non-professional accounts, with Bilibili content generally demonstrating greater depth and scientific rigour than TikTok content. However, in user engagement metrics, non-professional content outperformed professionally produced material. Most videos lacked a thorough discussion of multifactorial disease causes and individual variation, with some exhibiting excessive simplification.
    Conclusion: Short-form video platforms hold potential for enhancing health awareness; however, significant tension exists between user preferences and scientific rigour. Multi-stakeholder collaboration and technological support are necessary to improve platform content quality and scientific accuracy.
    Keywords:  Cardiovascular disease; anxiety; content quality; depression; short-form videos
    DOI:  https://doi.org/10.1177/20552076261430066
  37. Sci Rep. 2026 Mar 04.
      Amid the rising prevalence of hyperlipidemia in China, the public's growing reliance on unregulated short-video platforms poses a significant risk of misinformation. This study systematically evaluated the quality of 233 hyperlipidemia-related videos across TikTok, Bilibili, and RedNote, platforms selected for their distinct content formats, audience profiles, and information environments. Video quality was assessed using the Global Quality Score (GQS), reliability with modified DISCERN and JAMA benchmarks, and educational value with the Patient Education Materials Assessment Tool (PEMAT); content completeness was also scored. Overall video quality was moderate, with significant disparities observed across platforms and creator types. Bilibili videos offered greater comprehensiveness (higher GQS and content completeness), while TikTok and RedNote content was more understandable. Science communicators produced the highest overall quality videos, whereas physicians excelled in reliability. Critically, user engagement metrics like 'likes' showed virtually no correlation with video quality (|ρ|≤ 0.09). In contrast, content completeness was a strong independent predictor of higher quality (OR 1.59), underscoring its importance over popularity. In conclusion, the quality of hyperlipidemia videos on Chinese platforms is inconsistent, and popular videos are not necessarily reliable. This highlights a critical need for platforms to shift from popularity-based algorithms to ones that prioritize creator expertise and content completeness, and for healthcare professionals and science communicators to collaborate on producing scientifically accurate yet accessible content.
    Keywords:  Digital health; Hyperlipidemia; Information quality; Short-video platforms; Social media
    DOI:  https://doi.org/10.1038/s41598-026-42412-3
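      The reported association between completeness and quality (OR 1.59 per point) corresponds to a logistic regression with a binarized quality outcome. The sketch below simulates such data to show how the odds ratio is read off; the quality cut-off, coefficient, and variable names are assumptions.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        df = pd.DataFrame({"completeness": rng.integers(0, 11, 233)})
        # Simulate "high quality" with a true log-odds slope of log(1.59) ~ 0.46.
        logit_p = -2.0 + 0.46 * df["completeness"]
        df["high_quality"] = (rng.random(233) < 1 / (1 + np.exp(-logit_p))).astype(int)

        fit = smf.logit("high_quality ~ completeness", data=df).fit(disp=False)
        print(np.exp(fit.params["completeness"]))  # odds ratio per completeness point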
  38. Digit Health. 2026 Jan-Dec;12: 20552076261430074
       Background: Hodgkin lymphoma (HL) is a malignant tumor of the lymphatic system. With the rapid expansion of short video platforms, the public increasingly relies on them for medical information, yet the scientific rigor and reliability of such content remain inconsistent. This study aimed to systematically evaluate the content and quality of HL-related videos on TikTok and Bilibili.
    Methods: In August 2025, videos related to "Hodgkin lymphoma" were searched on TikTok and Bilibili. After applying predefined inclusion and exclusion criteria, video sources, engagement metrics, and content features were collected. Video quality and reliability were assessed using three validated tools: the Journal of the American Medical Association (JAMA) benchmarks, the Global Quality Score (GQS), and the modified DISCERN (mDISCERN) scale. Differences by uploader type, platform, and audience engagement were also analyzed.
    Results: A total of 225 videos were included (155 from TikTok and 70 from Bilibili). TikTok videos were shorter but had significantly higher engagement (P < .05). Most TikTok videos were uploaded by professional doctors, and TikTok had a higher median GQS score (3.0) than Bilibili (2.0). Both platforms had a median mDISCERN score of 3.0. Videos uploaded by professionals and institutions scored higher on GQS, mDISCERN, and JAMA compared with those from non-professional users. Bilibili had a larger share of videos from individual users, which were lower in quality and consistency. Across platforms, epidemiology and prevention were rarely covered, and overall content was fragmented. Spearman correlation analysis revealed strong associations among engagement metrics but no significant relationship with quality scores, suggesting that popularity is driven more by presentation and dissemination than by scientific quality.
    Conclusion: TikTok and Bilibili differ significantly in the sources and quality of HL-related videos. Although high-quality short videos can improve public health literacy, their popularity depends more on style and reach than on scientific quality. Platforms should enhance professional certification and content review to promote the standardized dissemination of evidence-based medical knowledge.
    Keywords:  Hodgkin lymphoma; short videos; GQS; DISCERN
    DOI:  https://doi.org/10.1177/20552076261430074
  39. Clin Psychol Eur. 2026 Feb;8(1): e17279
     Background: The increasing popularity of mental health information on social media platforms such as TikTok is raising concerns regarding misinformation. Previous research has been limited to single disorders and English-language videos. Our objective was to investigate the quality of mental health information on German-language TikTok for a broader spectrum of disorders.
    Method: Thirty German-language TikTok videos for each of the six most viewed hashtags on mental disorders (attention-deficit/hyperactivity disorder (ADHD), depression, autism, anxiety disorder, narcissism, and post-traumatic stress disorder (PTSD)) were classified by authorship and rated as "correct", "overgeneralized", "incorrect", or "subjective experience". The modified DISCERN (mDISCERN) and the Global Quality Scale (GQS) were used to rate the reliability and quality of information for patients.
    Results: The 177 videos finally included in this study gathered a total of 94,348,220 views; 19.2% (n = 34) of the videos were rated as correct, 33.3% (n = 59) as incorrect, 18.1% (n = 32) as overgeneralized, and 29.4% (n = 52) as subjective experience. Chi-square tests and Kruskal-Wallis tests showed significant relationships between either authorship or diagnosis and the quality and reliability of information. Videos on PTSD and videos by expert authors showed the best overall results, and videos on narcissism and videos by laypeople the worst.
    Conclusion: With around half of the analyzed videos supplying incorrect or overgeneralized information, the quality of German-language TikTok mental health content is insufficient. Differences in content quality appear to be influenced by topic and authorship. Healthcare institutions and clinicians should be aware of this, educate patients accordingly, and could improve the quality of information by participating in online discourses.
    Keywords:  ADHD; ASD; PTSD; TikTok; anxiety; depression; misinformation; narcissism
    DOI:  https://doi.org/10.32872/cpe.17279
  40. Digit Health. 2026 Jan-Dec;12: 20552076261428037
       Background: Short videos have emerged as a significant medium for disseminating health information. However, misleading content can lead to poor health decisions, undermining national efforts to enhance health knowledge and public health literacy.
    Objective: This study aims to systematically evaluate the quality and reliability of health-related videos on Chinese short-video platforms and to offer insights for the regulation of digital health on a global scale.
    Methods: A comprehensive search was conducted across the China Biology Medicine Database, PubMed, Wanfang, China National Knowledge Infrastructure, and VIP databases for articles published between January 2021 and December 2024. Twenty-five articles meeting the inclusion criteria were included and analyzed to evaluate the quality and credibility of health-related short videos on Chinese platforms, as well as the correlation between the type of creator and video quality.
    Results: Health-related videos created by healthcare professionals or institutions demonstrated higher reliability and accuracy. DISCERN was the most commonly used tool for evaluating these videos. Overall, video quality was generally substandard, with the prevalence of inaccurate information ranging from 10.1% to 100% across various health topics.
    Conclusions: This review identified substantial deficiencies in the accuracy of health information disseminated through Chinese short-video platforms. The presence of low-quality content has a negative impact on public health decision-making. These findings align with evaluations of health-related videos on international platforms, such as YouTube and Instagram. It is therefore imperative to adopt comprehensive strategies, including content moderation, creator verification, and responsible algorithm management, to improve video quality and ensure the public's access to reliable digital health information.
    Keywords:  Chinese social media; health information; reliability evaluation; short videos
    DOI:  https://doi.org/10.1177/20552076261428037
  41. Front Public Health. 2026 ;14 1668189
       Background: With the proliferation of short video platforms such as BiliBili and TikTok, public reliance on these platforms for medical information has increased substantially. However, the absence of standardized content regulation raises serious concerns about misinformation, oversimplification, and variable quality in health communication. Radiotherapy, a cornerstone of cancer treatment alongside surgery and chemotherapy, is particularly vulnerable to such information quality issues due to its technical complexity and limited public understanding. This necessitates systematic evaluation of the scientific accuracy and reliability of radiotherapy content on these platforms.
    Methods: In this cross-sectional study, the top 100 Chinese-language videos related to radiotherapy were collected from each of BiliBili and TikTok (total n = 200). Video quality and reliability were assessed via the Global Quality Score (GQS) and modified DISCERN tools. Nonparametric tests and Spearman correlation analyses were applied. Two independent radiotherapy specialists evaluated the content, with a third resolving discrepancies.
    Results: Overall video quality and reliability were moderate (median GQS = 3; DISCERN = 3). BiliBili demonstrated higher DISCERN scores (p < 0.05), reflecting superior reliability, whereas TikTok had marginally higher GQS scores. The BiliBili videos were significantly longer (median: 1,391.5 s vs. 98 s) and featured more systematic content, whereas the TikTok videos showed greater engagement (e.g., likes, shares, collects, comments). A positive correlation between video duration and DISCERN score was observed for BiliBili (R = 0.47, p < 0.0001), and TikTok showed a similar trend for GQS (R = 0.49, p < 0.0001). No significant associations were found between interaction metrics and quality scores.
    Conclusion: This study evaluated 200 radiotherapy videos on BiliBili and TikTok. BiliBili showed higher reliability (DISCERN), whereas TikTok excelled in terms of user engagement. Recommendations include optimizing scientific communication, platform quality-based algorithms prioritizing authoritative content, and enhancing public media literacy. The findings can guide improvements in digital medical education.
    Keywords:  content reliability; health communication; information quality; radiotherapy; short video platforms
    DOI:  https://doi.org/10.3389/fpubh.2026.1668189
  42. JMIR Form Res. 2026 Mar 02. 10 e71584
     BACKGROUND: The internet is increasingly used to find health information, which often contains misinformation. Instagram is a likely source of online health information for many adults, given that it has more than 2 billion users worldwide. To date, no studies have documented the characteristics of hepatitis B virus (HBV) claims on Instagram, or the accuracy of, engagement with, and profitability of that information.
    OBJECTIVE: We aimed to document the characteristics, accuracy, engagement, and profitability of HBV misinformation on Instagram.
    METHODS: In this cross-sectional formative study, 2 research members searched for publicly available Instagram posts using the terms "hepatitis b" and "hep b" and manually extracted data from the most popular posts and user profiles for each term from December 2021 to January 2022 at varying times of the day and days of the week. We applied an existing and validated health misinformation codebook, adapted for this topic, to 103 posts for 58 variables, including post characteristics, types of HBV claims (eg, treatment, prevention, and cure), accuracy of information (misinformation vs accurate, coded by hepatology clinicians), engagement (number of likes), and profitability (yes or no). We calculated descriptive statistics and applied chi-square, Fisher exact, and z tests to compare posts with certain characteristics, claims, and engagement by accuracy and profitability in Stata (version 18.0) with significance set at an α of .05.
    RESULTS: Of the full sample, most posts had accurate (79/103, 76.7%) versus inaccurate (24/103, 23.3%) information about HBV. Among posts with claims about HBV treatment (18/103, 17.5%), there were more posts with misinformation than with accurate information (55.6% vs 44.4%; χ²₁=12.7; P<.001). Similarly, there were higher proportions of posts with misinformation compared to posts with accurate information about cures (n=12, 75% vs 25%; Fisher P<.001), natural remedies (n=13, 92.3% vs 7.7%; Fisher P<.001), symptoms (n=15, 60% vs 40%; χ²₁=13.2; P<.001), and censorship conspiracies (n=9, 66.7% vs 33.3%; Fisher P=.005) related to HBV. Compared to posts with accurate information, posts with misinformation had more likes on average (mean 1459.2, SD 1458.8-1459.6 vs mean 941.8, SD 941.6-942.0; z=-517.4; P<.001). Significantly more posts with misinformation were for profit (39.5% vs 13.8%; χ²₁=8.8; P=.003) than accurate posts.
    CONCLUSIONS: HBV misinformation had more engagement than accurate information on Instagram and was more likely to be for-profit than accurate information. HBV misinformation may spread more easily than accurate information, meaning people searching for HBV on Instagram may encounter false, profit-driven claims that could affect health behaviors. Our focus on visual social media misinformation is innovative, as is our use of Instagram, an understudied platform. More research is needed to estimate the prevalence of HBV misinformation and its influence on health beliefs, behaviors, and outcomes. Improving media literacy may help reduce the influence of HBV misinformation online.
    Keywords:  Instagram; hepatitis B; misinformation; social media; vaccine
    DOI:  https://doi.org/10.2196/71584
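      The proportion comparisons above map onto standard contingency-table tests. The sketch below reconstructs approximate counts from the abstract's percentages (they may not match the paper's tables exactly): a chi-square test without continuity correction reproduces χ²₁ ≈ 12.7 for treatment claims, and a Fisher exact test suits the sparse natural-remedies category.

        from scipy.stats import chi2_contingency, fisher_exact

        # Rows: treatment-claim posts, other posts; columns: misinformation, accurate.
        # 18 treatment-claim posts split 10 vs 8; the remaining 85 split 14 vs 71.
        table_treatment = [[10, 8], [14, 71]]
        chi2, p, dof, _ = chi2_contingency(table_treatment, correction=False)
        print(chi2, p)  # ~12.7, p < .001

        # Rows: natural-remedy posts (12 misinformation vs 1 accurate), other posts.
        table_remedies = [[12, 1], [12, 78]]
        print(fisher_exact(table_remedies))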
  43. Front Psychol. 2026 ;17 1722369
       Background: Adolescents are increasingly exposed to online health information promoting supplements as quick solutions for weight management and muscle development, raising public health concerns.
    Objective: This study investigated whether online media engagement (weight- and fitness-related information-seeking behaviors and the internalization of online appearance ideals) predicts adolescents' online purchasing of weight-loss and muscle-building products, with attention to differences between girls and boys.
    Methods: A cross-sectional online survey was conducted among 1,526 Czech adolescents (50% girls) aged 13-18 years (M = 15.4, SD = 1.7). Measures included self-reported online weight- and fitness-related information-seeking behaviors, internalization of online appearance ideals (thin-ideal among girls; muscular ideal among boys), and online purchasing of weight-loss and muscle-building products. Hierarchical regression analyses were conducted separately for girls and boys, controlling for sociodemographic and body image variables.
    Results: Overall, 4.1% (63/1,521) purchased weight-loss and 19.1% (303/1,524) purchased muscle-building products online. For both girls and boys, greater engagement in seeking weight- and fitness-related information was associated with purchasing both product types. Internalization of online appearance ideals was significantly associated with muscle-building but not weight-loss product purchases.
    Conclusion: Findings highlight a novel pathway through which online media may shape youth consumer choices, pointing to the need for media literacy and body image-informed strategies to promote safer decision-making.
    Keywords:  adolescents; appearance ideals; dietary supplements; digital marketing; online information seeking; online media
    DOI:  https://doi.org/10.3389/fpsyg.2026.1722369
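      The hierarchical regression described above enters control variables first and then adds the online-media block, attributing the change in R² to the added predictors. The sketch below is illustrative only; the variable names, the continuous (rather than binary) outcome, and the simulated data are assumptions.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(2)
        n = 760  # e.g., one sex-specific subsample
        df = pd.DataFrame({
            "age": rng.integers(13, 19, n),
            "body_image": rng.normal(0, 1, n),
            "info_seeking": rng.normal(0, 1, n),
            "ideal_internalization": rng.normal(0, 1, n),
        })
        df["purchasing"] = (0.3 * df["info_seeking"]
                            + 0.2 * df["ideal_internalization"]
                            + rng.normal(0, 1, n))

        block1 = smf.ols("purchasing ~ age + body_image", data=df).fit()
        block2 = smf.ols("purchasing ~ age + body_image + info_seeking"
                         " + ideal_internalization", data=df).fit()
        print(block2.rsquared - block1.rsquared)  # delta R^2 for the media block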
  44. Digit Health. 2025 Jan-Dec;11: 20552076251355195
       Background: Access to health information on the internet has increased significantly, influencing self-care decisions and the use of medications without a prescription.
    Objective: This study aimed to identify the factors associated with the use of online health information and self-medication in a Peruvian sample.
    Method: A cross-sectional study was conducted with 493 Peruvian adults selected through nonprobabilistic convenience sampling. An online questionnaire collected data on sociodemographic characteristics, subjective health status, use of online health information, internet competence, and self-medication. Analyses included correlations, Student's t-tests, one-way analysis of variance, and multiple linear regression. A p-value < 0.05 was considered statistically significant.
    Results: Among participants, 62.5% reported self-medication and 74.2% reported using the internet to search for health information. Use of online health information was significantly associated with self-medication. Predictors of self-medication included being a woman, living in the jungle region, rural residence, current illness, poor perceived health, and higher internet competence (F = 13.536, p < 0.001; R² = 0.189). Significant predictors were internet competence (β = 0.23, p < 0.001), female sex (β = 0.14, p = 0.002), and poor perceived health (β = 0.13, p = 0.003). In a separate model, internet use for health information was associated with younger age, living in the jungle region, and higher internet competence (F = 5.734, p < 0.001; adjusted R² = 0.071), with internet competence (β = 0.18, p < 0.001) and age (β = -0.15, p = 0.002) being the most relevant factors.
    Conclusion: Online health information use is associated with self-medication among Peruvian adults. Internet competence emerged as a key factor for both behaviors.
    Keywords:  Peru; Self-medication; attitudes; health information-seeking behavior; health knowledge; online health information; practice; self-care
    DOI:  https://doi.org/10.1177/20552076251355195
  45. Online J Public Health Inform. 2026 Feb 27. 18 e83642
     Background: Older adults often access traditional media, such as newspapers, magazines, television, and radio, for health information. However, older adults with frailty experience greater declines in physical function, mental health (including depressive symptoms), and social functioning than those without frailty, and their reduced interaction with others can limit their access to these sources of information.
    Objective: This study aimed to identify the health information sources that are less accessible to participants with frailty than to those without frailty.
    Methods: A cross-sectional web-based survey was conducted among independent Japanese adults aged ≥75 years. We assessed frailty using the Questionnaire of Medical Checkup for Old-Old, with a score of ≥4 indicating frailty. Participants were asked whether they had accessed any health information source in the past year, including medical institutions, family members, friends or acquaintances, neighbors, government agencies, long-term care or welfare services, television, radio, the internet, magazines, newspapers, or books. The primary explanatory variable was frailty status. Covariates included age, sex, income, education, living arrangements, and health literacy, measured using the eHealth Literacy Scale.
    Results: In total, 1032 participants (n=518, 50.2% male; median age: 77 y) were analyzed. Multivariable logistic regression analysis revealed that participants with frailty had significantly less access to the following sources of information compared to individuals without frailty: family (odds ratio [OR] 0.69, 95% CI 0.50-0.95), friends/acquaintances (OR 0.70, 95% CI 0.51-0.98), radio (OR 0.50, 95% CI 0.31-0.79), and newspapers (OR 0.66, 95% CI 0.50-0.88). Sex-based subgroup analyses revealed no significant interaction effects, indicating no heterogeneity in the findings.
    Conclusions: Older adults with frailty were less likely to obtain health information from interpersonal and traditional media sources than individuals without frailty. Health information providers need to devise strategies for delivering accurate information and improving usability to enable older adults with frailty to proactively access diverse health information.
    Keywords:  frailty; health information sources; health literacy; internet; web-based survey
    DOI:  https://doi.org/10.2196/83642
  46. Front Psychol. 2026 ;17 1750050
       Introduction: Intimate partner violence (IPV) affects around one in four women globally, posing substantial health risks. IPV survivors often consult online health communities for anonymous assistance rather than formal services. Though online health community members frequently share websites to answer questions, no studies have investigated the characteristics and relevance of websites shared in IPV online health communities. This study aims to identify the categories of websites shared in IPV online health communities and evaluate associations between post characteristics and website relevance to survivors' help-seeking needs.
    Methods: Data were extracted from posts and comments on r/domesticviolence, a subreddit (topic-specific community) dedicated to domestic violence support on Reddit (a social media platform), from November 2020 to November 2021. We included English-language posts seeking advice, written by adult women with IPV experiences, with at least one website shared in the comments. Website links were annotated by topic and categorized as relevant or irrelevant to the help requested by the original posters. Posts were annotated for characteristics including post length, mentions of "red flags" for lethality, and specific versus general help requests. Chi-square tests and t-tests were used to determine associations between post characteristics and website relevance.
    Results: A total of 170 website links were categorized into eight themes, with General IPV Resources and Support (32.4%) and Understanding IPV (28.2%) being the most common. Approximately 75.3% of the websites were relevant to the types of help sought by original posters. Post characteristics showed no significant association with the relevance of the websites.
    Conclusion: This study sheds light on the types of websites shared within IPV online health communities and informs IPV agencies and clinicians about the addressed and unaddressed needs of women IPV survivors seeking help online. These findings could help optimize the design of online health community platforms, including digital tools that automatically suggest relevant websites to IPV survivors.
    Keywords:  help-seeking behaviors; intimate partner violence; medical informatics; online health communities; violence against women; website-sharing
    DOI:  https://doi.org/10.3389/fpsyg.2026.1750050
  47. OTO Open. 2026 Jan-Mar;10(1): e70204
       Objective: Chatbots powered by large language models (LLMs) have recently emerged as prominent sources of information. However, their ability to propagate misinformation as well as information, particularly in specialized fields like audiology and otolaryngology, remains underexplored. This study aimed to evaluate the accuracy of 6 popular chatbots-ChatGPT, Gemini, Claude, DeepSeek, Grok, and Mistral-in response to questions framed around a range of unproven methods in audiological and otolaryngological care.
    Study Design: Cross-sectional study.
    Setting: A set of 50 questions was developed based on common conversations between patients and clinicians. We then posed these questions to the chatbots.
    Methods: We tested each chatbot 10 times to account for variable responses, producing a total of 3000 responses. The responses were compared with correct answers based on the general opinion of 11 professionals. The consistency of the responses was evaluated by Cohen's Kappa.
    Results: Chatbot responses to the majority of questions were deemed accurate. Grok consistently performed best, with its answers aligning perfectly with the opinions of the experts. DeepSeek exhibited the lowest accuracy, scoring 95.8%. Mistral exhibited the lowest consistency, with a Cohen's kappa of 0.96.
    Conclusions: Although all the chatbots generally avoided endorsing unproven methods, some responses deviated from the expert consensus and could therefore contribute to the spread of misinformation. The best performer among the group was Grok, which provided consistently accurate responses, showing its potential for clinical and educational use within audiology and otolaryngology.
    Keywords:  ChatGPT; Claude; DeepSeek; Gemini; Grok; audiology; misinformation; mistral; otolaryngology
    DOI:  https://doi.org/10.1002/oto2.70204
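      Consistency across repeated runs, as quantified above with Cohen's kappa, can be computed by treating two runs of the same 50 questions as paired raters. The sketch below uses scikit-learn with fabricated labels; a kappa near 1, like Mistral's 0.96, indicates highly repeatable answers.

        from sklearn.metrics import cohen_kappa_score

        # Two runs of the same 50 questions; the runs disagree on one item.
        run_1 = ["endorse"] * 10 + ["reject"] * 40
        run_2 = ["endorse"] * 9 + ["reject"] * 41
        print(cohen_kappa_score(run_1, run_2))  # ~0.94; near 1 = repeatable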