bims-librar Biomed News
on Biomedical librarianship
Issue of 2025–02–23
forty-five papers selected by
Thomas Krichel, Open Library Society



  1. J Med Libr Assoc. 2025 Jan 14. 113(1): 88-89
      Health sciences and hospital libraries often face challenges in planning and organizing events due to limited resources and staff. At Stanford School of Medicine's Lane Library, librarians turned to artificial intelligence (AI) tools to address this issue and successfully manage various events, from small workshops to larger, more complex conferences. This article presents a case study on how to effectively integrate generative AI tools into the event planning process, improving efficiency and freeing staff to focus on higher-level tasks.
    Keywords:  Artificial Intelligence; event planning; medical library
    DOI:  https://doi.org/10.5195/jmla.2025.2087
  2. MedEdPORTAL. 2025;21: 11496
       Introduction: Medical students may arrive at medical school with some research background but not necessarily evidence-based medicine (EBM) skills. First-year preclinical medical students require foundational skills for EBM (formulating background and foreground questions, navigating information sources, and conducting database searches) before critically appraising evidence and applying it to clinical scenarios.
    Methods: We developed a flipped classroom EBM workshop for preclinical students combining prework modules and a 60-minute in-person session. After completing the online modules on foundational EBM skills, students participated in an in-person activity based on patient cases. In small groups, students formulated background and foreground questions based on a case and looked for evidence in resources assigned to each group. Small groups reported back to the whole group how they searched for information for their patient cases. A total of 105 first-year medical students were required to complete this workshop after concluding their basic sciences courses.
    Results: Because current EBM assessment tools do not assess the early steps of EBM, we developed an assessment tool for foundational EBM skills. Before the modules, students completed a pretest on formulating questions and searching for information. After the workshop, students completed a posttest. Students showed improvement in differentiating background and foreground questions (p < .001), formulating answerable clinical questions (p < .001), and developing appropriate database searches (p < .001 and p = .002).
    Discussion: This flipped classroom approach to teaching foundational EBM skills may be adapted for different contexts, but educators should consider time limitations, group size, and tools for interactivity.
    Keywords:  Evidence-Based Medicine; Flipped Classroom; Information Sources; Program Evaluation
    DOI:  https://doi.org/10.15766/mep_2374-8265.11496
  3. J Med Libr Assoc. 2025 Jan 14. 113(1): 9-23
       Objective: A scoping review was undertaken to understand the extent of literature on librarian involvement in competency-based medical education (CBME).
    Methods: We followed Joanna Briggs Institute methodology and PRISMA-ScR reporting guidelines. A search of peer-reviewed literature was conducted on December 31, 2022, in Medline, Embase, ERIC, CINAHL Complete, SCOPUS, LISS, LLIS, and LISTA. Studies were included if they described librarian involvement in the planning, delivery, or assessment of CBME in an LCME-accredited medical school and were published in English. Outcomes included characteristics of the interventions (duration, librarian role, content covered) and of the outcomes and measures (level on Kirkpatrick Model of Training Evaluation, direction of findings, measure used).
    Results: Fifty studies were included of 11,051 screened: 46 empirical studies or program evaluations and four literature reviews. Studies were published in eight journals with two-thirds published after 2010. Duration of the intervention ranged from 30 minutes to a semester long. Librarians served as collaborators, leaders, curriculum designers, and evaluators. Studies primarily covered asking clinical questions and finding information and most often assessed reaction or learning outcomes.
    Conclusions: A solid base of literature on librarian involvement in CBME exists; however, few studies measure user behavior or use validated outcomes measures. When librarians are communicating their value to stakeholders, having evidence for the contributions of librarians is essential. Existing publications may not capture the extent of work done in this area. Additional research is needed to quantify the impact of librarian involvement in competency-based medical education.
    Keywords:  CBME; Competency-Based Education; EBM; Evidence-Based Medicine; Instruction; Problem-based learning; case-based learning; curriculum; education; entrustable professional activities; learning; librarians; libraries; lifelong learning; self-regulated learning; training; undergraduate medical education
    DOI:  https://doi.org/10.5195/jmla.2025.1965
  4. J Med Libr Assoc. 2025 Jan 14. 113(1): 92-93
      This project investigated the potential of generative AI models in aiding health sciences librarians with collection development. Researchers at Chapman University's Harry and Diane Rinker Health Science campus evaluated four generative AI models (ChatGPT 4.0, Google Gemini, Perplexity, and Microsoft Copilot) over six months starting in March 2024. Two prompts were used: one to generate recent eBook titles in specific health sciences fields and another to identify subject gaps in the existing collection. The first prompt revealed inconsistencies across models, with Copilot and Perplexity providing sources but also inaccuracies. The second prompt yielded more useful results, with all models offering helpful analysis and accurate Library of Congress call numbers. The findings suggest that Large Language Models (LLMs) are not yet reliable as primary tools for collection development due to inaccuracies and hallucinations. However, they can serve as supplementary tools for analyzing subject coverage and identifying gaps in health sciences collections.
    Keywords:  ChatGPT; Generative artificial intelligence; Google Gemini; Microsoft Copilot; Perplexity; collection assessment; collection development; health sciences libraries; large language models
    DOI:  https://doi.org/10.5195/jmla.2025.2079
  5. J Med Libr Assoc. 2025 Jan 14. 113(1): 31-38
       Objective: Sexual and gender minority (SGM) populations experience health disparities compared to heterosexual and cisgender populations. The development of accurate, comprehensive sexual orientation and gender identity (SOGI) measures is fundamental to quantify and address SGM disparities, which first requires identifying SOGI-related research. As part of a larger project reviewing and synthesizing how SOGI has been assessed within the health literature, we provide an example of the application of automated tools for systematic reviews to the area of SOGI measurement.
    Methods: In collaboration with research librarians, a three-phase approach was used to prioritize screening for a set of 11,441 SOGI measurement studies published since 2012. In Phase 1, search results were stratified into two groups (title with vs. without measurement-related terms); titles with measurement-related terms were manually screened. In Phase 2, supervised clustering using DoCTER software was used to sort the remaining studies based on relevance. In Phase 3, supervised machine learning using DoCTER was used to further identify which studies deemed low relevance in Phase 2 should be prioritized for manual screening.
    Results: 1,607 studies were identified in Phase 1. Across Phases 2 and 3, the research team excluded 5,056 of the remaining 9,834 studies using DoCTER. In manual review, the percentage of relevant studies in results screened manually was low, ranging from 0.1 to 7.8 percent.
    Conclusions: Automated tools used in collaboration with research librarians have the potential to save hundreds of hours of human labor in large-scale systematic reviews of SGM health research.
    Keywords:  Automation; Health; Methods; Sexual and Gender Minorities; Systematic Review
    DOI:  https://doi.org/10.5195/jmla.2025.1860
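As a rough illustration of the kind of supervised prioritization described above (not the authors' DoCTER workflow; the texts, labels, and classifier choice below are invented for the example), a minimal scikit-learn sketch might look like this:

```python
# Illustrative sketch: train a simple text classifier on already-screened
# records, then rank remaining records by predicted relevance so manual
# screening can be prioritized. Data here is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

screened_texts = [
    "validation of a two-step gender identity measure in a health survey",
    "outcomes of a cardiology drug trial in older adults",
]
screened_labels = [1, 0]  # 1 = relevant to SOGI measurement, 0 = not relevant
unscreened_texts = [
    "sexual orientation question wording in a national population survey",
]

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(screened_texts)
X_new = vectorizer.transform(unscreened_texts)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, screened_labels)

# Probability of relevance; higher-scoring records go to manual review first.
scores = clf.predict_proba(X_new)[:, 1]
for score, text in sorted(zip(scores, unscreened_texts), reverse=True):
    print(f"{score:.2f}  {text[:60]}")
```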
  6. J Med Libr Assoc. 2025 Jan 14. 113(1): 39-48
       Objective: To evaluate the appropriateness of indexing of algorithmically-indexed MEDLINE records.
    Methods: We assessed the conceptual appropriateness of Medical Subject Headings (MeSH) used to index a sample of MEDLINE records from February and March 2023. Indexing was performed by the Medical Text Indexer-Auto (MTIA) algorithm. The primary outcome measure is the number of records for which the MTIA algorithm assigned subject headings that represented the main concepts of the publication.
    Results: Fifty-three percent of screened records had indexing that represented the main concepts discussed in the article; 47% had inadequacies in the indexing which could impact their retrieval. Three main issues with algorithmically-indexed records were identified: 1) inappropriate MeSH assigned due to acronyms, evocative language, exclusions of populations, or related records; 2) concepts represented by more general MeSH when a more precise MeSH was available; and 3) a significant concept not represented in the indexing at all. We also noted records with inappropriate combinations of headings and subheadings, even when the headings and subheadings on their own were appropriate.
    Conclusions: The indexing performed by the February-March 2023 calibration of the MTIA algorithm, as well as older calibrations, frequently applied irrelevant or imprecise terms to publications while neglecting to apply relevant terms. As a consequence, relevant publications may be omitted from search results and irrelevant ones may be retrieved. Evaluations and revisions of indexing algorithms should strive to ensure that relevant, accurate and precise MeSH terms are applied to MEDLINE records.
    Keywords:  Abstracting and Indexing; Algorithms; Database Searches; Information Storage; MEDLINE; MeSH; Medical Subject Headings; PubMed; Retrieval; Search Strategies
    DOI:  https://doi.org/10.5195/jmla.2025.1936
  7. J Med Libr Assoc. 2025 Jan 14. 113(1): 49-57
     Objective: This study investigates the effectiveness of bibliographic databases to retrieve qualitative studies for use in systematic and rapid reviews in Health Technology Assessment (HTA) research. Qualitative research is becoming more prevalent in reviews and health technology assessment, but standardized search methodologies, particularly regarding database selection, are still in development.
    Methods: To determine how commonly used databases (MEDLINE, CINAHL, PsycINFO, Scopus, and Web of Science) perform, a comprehensive list of relevant journal titles was compiled using InCites Journal Citation Reports and validated by qualitative researchers at Canada's Drug Agency (formerly CADTH). This list was used to evaluate the qualitative holdings of each database, by calculating the percentage of total titles held in each database, as well as the number of unique titles per database.
    Results: While publications on qualitative search methodology generally recommend subject-specific health databases including MEDLINE, CINAHL, and PsycINFO, this study found that multidisciplinary citation indexes Scopus and Web of Science Core Collection not only had the highest percentages of total titles held, but also a higher number of unique titles.
    Conclusions: These indexes have potential utility in qualitative search strategies, if only for supplementing other database searches with unique records. This potential was investigated via tests on qualitative rapid review search strategies translated to Scopus to determine how the index may contribute relevant literature.
    Keywords:  Database selection; Evidence synthesis; Information retrieval; Qualitative research
    DOI:  https://doi.org/10.5195/jmla.2025.1591
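A minimal sketch of the holdings comparison described above, using made-up journal titles and database holdings rather than the study's validated list: for each database it computes the share of the journal list covered and the count of titles held by that database alone.

```python
# Hypothetical holdings comparison: percentage of a validated journal list
# held per database, plus unique titles held by no other database.
import pandas as pd

journal_list = ["Qual Health Res", "Int J Qual Methods", "Sociol Health Illn"]
holdings = {
    "MEDLINE": {"Qual Health Res", "Sociol Health Illn"},
    "Scopus": {"Qual Health Res", "Int J Qual Methods", "Sociol Health Illn"},
    "CINAHL": {"Qual Health Res"},
}

rows = []
for db, titles in holdings.items():
    held = [t for t in journal_list if t in titles]
    unique = [
        t for t in held
        if not any(t in other for name, other in holdings.items() if name != db)
    ]
    rows.append({
        "database": db,
        "pct_of_list": 100 * len(held) / len(journal_list),
        "unique_titles": len(unique),
    })

print(pd.DataFrame(rows))
```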
  8. J Med Libr Assoc. 2025 Jan 14. 113(1): 58-64
       Objective: Use of the search filter 'exp animals/not humans.sh' is a well-established method in evidence synthesis to exclude non-human studies. However, the shift to automated indexing of Medline records has raised concerns about the use of subject-heading-based search techniques. We sought to determine how often this string inappropriately excludes human studies among automated as compared to manually indexed records in Ovid Medline.
    Methods: We searched Ovid Medline for studies published in 2021 and 2022 using the Cochrane Highly Sensitive Search Strategy for randomized trials. We identified all results excluded by the non-human-studies filter. Records were divided into sets based on indexing method: automated, curated, or manual. Each set was screened to identify human studies.
    Results: Human studies were incorrectly excluded in all three conditions, but automated indexing inappropriately excluded human studies at nearly double the rate of manual indexing. Looking specifically at human clinical randomized controlled trials (RCTs), the rate of inappropriate exclusion among automatically indexed records was seven times that of manually indexed records.
    Conclusions: Given our findings, searchers are advised to carefully review the effect of the 'exp animals/not humans.sh' search filter on their search results, pending improvements to the automated indexing process.
    Keywords:  Abstract and Indexing; Automated Indexing; Evidence Synthesis; Medical Subject Headings (MeSH)
    DOI:  https://doi.org/10.5195/jmla.2025.1972
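To make the filter's logic concrete, here is a small, self-contained sketch (invented PMIDs and MeSH sets, and a hand-picked stand-in for the exploded Animals tree) of how 'exp animals/ not humans.sh' removes records that carry an animal heading but no Humans heading, which is exactly where a Humans term missing from automated indexing causes a human study to be lost:

```python
# Stand-in for the exploded Animals tree; a real implementation would use the
# full MeSH hierarchy.
ANIMAL_TERMS = {"Animals", "Mice", "Rats", "Dogs"}

def excluded_by_filter(mesh_headings: set[str]) -> bool:
    """Return True if the non-human-studies filter would remove this record."""
    return bool(mesh_headings & ANIMAL_TERMS) and "Humans" not in mesh_headings

records = [  # hypothetical records
    {"pmid": "111", "mesh": {"Humans", "Mice", "Randomized Controlled Trials as Topic"}},
    {"pmid": "222", "mesh": {"Mice", "Disease Models, Animal"}},          # animal-only: excluded
    {"pmid": "333", "mesh": {"Randomized Controlled Trials as Topic"}},   # no Humans heading indexed
]

kept = [r for r in records if not excluded_by_filter(r["mesh"])]
print([r["pmid"] for r in kept])  # record 222 is filtered out; 333 survives only
                                  # because it carries no animal term either
```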
  9. Healthc Inform Res. 2025 Jan;31(1): 48-56
       OBJECTIVES: The objective of this study was to develop the weightage identified network of keywords (WINK) technique for selecting and utilizing keywords to perform systematic reviews more efficiently. This technique aims to improve the thoroughness and precision of evidence synthesis by employing a more rigorous approach to keyword selection.
    METHODS: The WINK methodology involves generating network visualization charts to analyze the interconnections among keywords within a specific domain. This process integrates both computational analysis and subject expert insights to enhance the accuracy and relevance of the findings. In the example considered, networking strength was examined for two contexts: environmental pollutants and endocrine function (Q1) and systemic health and oral health-related terms (Q2); keywords with limited networking strength were excluded. Utilizing the Medical Subject Headings (MeSH) terms identified from the WINK technique, a search string was built and compared to an initial search with fewer keywords.
    RESULTS: The application of the WINK technique in building the search string yielded 69.81% and 26.23% more articles for Q1 and Q2, respectively, compared to conventional approaches. This significant increase demonstrates the technique's effectiveness in identifying relevant studies and ensuring comprehensive evidence synthesis.
    CONCLUSIONS: By prioritizing keywords with higher weightage and utilizing network visualization charts, the WINK technique ensures comprehensive evidence synthesis and enhances accuracy in systematic reviews. Its effectiveness in identifying relevant studies marks a significant advancement in systematic review methodology, offering a more robust and efficient approach to keyword selection.
    Keywords:  Bibliometrics; Classification; Data Mining; Medical Subject Headings; Search Engine
    DOI:  https://doi.org/10.4258/hir.2025.31.1.48
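The abstract does not give the WINK implementation details, but the general idea of ranking keywords by their co-occurrence ("networking") strength can be sketched with networkx; the keyword lists below are invented, and weighted degree is an assumed proxy for networking strength rather than the authors' exact metric.

```python
# Build a keyword co-occurrence network, weight edges by co-occurrence counts,
# and rank keywords by weighted degree; weakly connected terms could then be
# dropped from the search string.
from itertools import combinations
import networkx as nx

article_keywords = [  # hypothetical keyword lists, one per article
    ["endocrine disruptors", "bisphenol A", "thyroid function"],
    ["endocrine disruptors", "thyroid function", "phthalates"],
    ["oral health", "periodontitis", "systemic disease"],
]

G = nx.Graph()
for keywords in article_keywords:
    for a, b in combinations(sorted(set(keywords)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

strength = dict(G.degree(weight="weight"))
for term, w in sorted(strength.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{w:3d}  {term}")
```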
  10. BMC Neurol. 2025 Feb 19. 25(1): 69
       OBJECTIVE: To evaluate the potential of two large language models (LLMs), GPT-4 (OpenAI) and PaLM2 (Google), in automating migraine literature analysis by conducting sentiment analysis of migraine medications in clinical trial abstracts.
    BACKGROUND: Migraine affects over one billion individuals worldwide, significantly impacting their quality of life. A vast amount of scientific literature on novel migraine therapeutics continues to emerge, but an efficient method by which to perform ongoing analysis and integration of this information poses a challenge.
    METHODS: "Sentiment analysis" is a data science technique used to ascertain whether a text has positive, negative, or neutral emotional tone. Migraine medication names were extracted from lists of licensed biological products from the FDA, and relevant abstracts were identified using the MeSH term "migraine disorders" on PubMed and filtered for clinical trials. Standardized prompts were provided to the APIs of both GPT-4 and PaLM2 to request an article sentiment as to the efficacy of each medication found in the abstract text. The resulting sentiment outputs were classified using both a binary and a distribution-based model to determine the efficacy of a given medication.
    RESULTS: In both the binary and distribution-based models, the most favorable migraine medications identified by GPT-4 and PaLM2 aligned with evidence-based guidelines for migraine treatment.
    CONCLUSIONS: LLMs have potential as complementary tools in migraine literature analysis. Despite some inconsistencies in output and methodological limitations, the results highlight the utility of LLMs in enhancing the efficiency of literature review through sentiment analysis.
    Keywords:  Artificial intelligence; Headaches; Large language model; Literature review; Migraine
    DOI:  https://doi.org/10.1186/s12883-025-04071-1
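A hedged sketch of the per-abstract sentiment call described in the methods, using the OpenAI chat completions API; the prompt wording, model identifier, and one-word output format are assumptions for illustration rather than the authors' exact protocol, and the PaLM2 side is omitted.

```python
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def abstract_sentiment(drug: str, abstract: str) -> str:
    """Ask the model whether the abstract reads positive, negative, or neutral
    about the drug's efficacy (illustrative prompt, not the study's)."""
    prompt = (
        f"Classify the sentiment of this clinical trial abstract regarding the "
        f"efficacy of {drug}. Answer with one word: positive, negative, or neutral.\n\n"
        f"{abstract}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

# Example call (hypothetical abstract text):
# print(abstract_sentiment("erenumab", "In this randomized trial, monthly migraine days..."))
```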
  11. J Med Libr Assoc. 2025 Jan 14. 113(1): 85
      Beginning in 2012, the Virtual Projects section of the Journal of the Medical Library Association has provided an opportunity for library leaders and technology experts to share with others how new technologies are being adopted by health sciences libraries. From educational purposes to online tools that enhance library services or access to resources, the Virtual Projects section brings technology use examples to the forefront. Future Virtual Projects sections will be published in the January issue, and the call for submissions and the Virtual Projects deadline will now take place in June and July.
    DOI:  https://doi.org/10.5195/jmla.2025.2102
  12. J Med Libr Assoc. 2025 Jan 14. 113(1): 24-30
       Objective: This research project sought to identify those subject areas that leaders and researcher members of the Medical Library Association (MLA) determined to be of greatest importance for research investigation. It updates two previous studies conducted in 2008 and 2011.
    Methods: The project involved a three-step Delphi process aimed at collecting the most important and researchable questions facing the health sciences librarianship profession. First, 495 MLA leaders were asked to submit questions answerable by known research methods. Submitted questions could not exceed 50 words in length. There were 130 viable, unique questions submitted by MLA leaders. Second, the authors asked 200 eligible MLA-member researchers to select the five (5) most important and answerable questions from the list of 130 questions. Third, the same 130 MLA leaders who initially submitted questions were asked to select their top five (5) most important and answerable questions from the 36 top-ranked questions identified by the researchers.
    Results: The final 15 questions resulting from the three phases of the study will serve as the next priorities of the MLA Research Agenda. The authors will be facilitating the organization of teams of volunteers wishing to conduct research studies related to these identified top 15 research questions.
    Conclusion: The new 2024 MLA Research Agenda will enable the health information professions to allocate scarce resources toward high-yield research studies. The Agenda could be used by journal editors and annual meeting organizers to prioritize submissions for research communications. The Agenda will provide aspiring researchers with some starting points and justification for pursuing research projects on these questions.
    Keywords:  Artificial Intelligence (AI); Consensus; Delphi Method; Evidence Based Practice; Impact; Leadership; Question Formulation; Research; Research Agenda
    DOI:  https://doi.org/10.5195/jmla.2025.1955
  13. J Med Libr Assoc. 2025 Jan 14. 113(1): 1-3
      In the April 2019 issue (Vol. 106 No. 3), the Journal of the Medical Library Association (JMLA) debuted its Case Report publication category. In the years following this decision, the Case Reports category has grown into an integral component of JMLA. In this editorial, the JMLA Editorial Team highlights the value of case reports and outlines strategies authors can use to draft impactful manuscripts for this category.
    DOI:  https://doi.org/10.5195/jmla.2025.2099
  14. J Med Libr Assoc. 2025 Jan 14. 113(1): 96-97
      A librarian used a large language model (LLM) to revise a dentistry subject LibGuide. Prompts were used to identify methods for optimizing navigational structure for usability, highlight library-specific information students need additional help with, and write summaries of page content. Post-revision, LibGuide access increased, and students provided anecdotal feedback that they perceive the changes positively. LLMs may enhance LibGuide discoverability and usability without adding significant time and resource burdens for librarians.
    Keywords:  Artificial Intelligence (AI); Generative AI; Large Language Models; LibGuides
    DOI:  https://doi.org/10.5195/jmla.2025.2084
  15. J Med Libr Assoc. 2025 Jan 14. 113(1): 98-100
      Given the key role of systematic reviews in informing clinical decision making and guidelines, it is important for individuals to have equitable access to quality instructional materials on how to design, conduct, report, and evaluate systematic reviews. In response to this need, Vanderbilt University Medical Center's Center for Knowledge Management (CKM) created an open-access systematic review instructional video series. The educational content was created by experienced CKM information scientists, who worked together to adapt an internal training series that they had developed into a format that could be widely shared with the public. Brief videos, averaging 10 minutes in length, were created addressing essential concepts related to systematic reviews, including distinguishing between literature review types, understanding reasons for conducting a systematic review, designing a systematic review protocol, steps in conducting a systematic review, web-based tools to aid with the systematic review process, publishing a systematic review, and critically evaluating systematic reviews. Quiz questions were developed for each instructional video to allow learners to check their understanding of the material. The systematic review instructional video series launched on CKM's Scholarly Publishing Information Hub (SPI-Hub™) website in Fall 2023. From January through August 2024, there were 1,662 international accesses to the SPI-Hub™ systematic review website, representing 41 countries. Initial feedback, while primarily anecdotal, has been positive. By adapting its internal systematic review training into an online video series format suitable for asynchronous instruction, CKM has been able to widely disseminate its educational materials.
    Keywords:  Asynchronous learning; Systematic Reviews as Topic; online learning
    DOI:  https://doi.org/10.5195/jmla.2025.2078
  16. J Med Libr Assoc. 2025 Jan 14. 113(1): 94-95
      The Ascension Nurse Author Index is an example of how resource-limited clinical libraries can provide value to their organization by creating a database of peer-reviewed journal article publications authored by their nursing associates. In 2024, Ascension launched a database index to highlight its nurse authors, bring attention to subject matter expertise, foster collaboration among authors, and recognize impact within the profession. To minimize expenses, the index uses an open-access platform: software intended for reference management that offers a public-facing cloud option. This unconventional use of the platform allowed us to capitalize on the software's bibliographic database management capabilities while letting us input institution-specific metadata. Through creative use of the open-access platform, librarians can partner to create value for their organization by highlighting the work of its nurses.
    Keywords:  Authorship; Bibliographic Management Software; Clinical Librarians; Collaboration; Hospital Librarians; Nurses; Organizational Value
    DOI:  https://doi.org/10.5195/jmla.2025.2086
  17. J Med Libr Assoc. 2025 Jan 14. 113(1): 90-91
      Prompted by increasing requests for assistance with research evaluation from faculty researchers and university leadership, faculty librarians at the University of Tennessee Health Science Center (UTHSC) launched an innovative Research Impact Challenge in 2023. This Challenge was inspired by the University of Michigan's model and tailored to the needs of health sciences researchers. This asynchronous event aimed to empower early-career researchers and faculty seeking promotion and tenure by enhancing their online scholarly presence and understanding of how scholarship is tracked and evaluated. A team of diverse experts crafted an engaging learning experience through the strategic use of technology and design. Scribe slideshows and videos offered dynamic instruction, while written content and worksheets facilitated engagement and reflection. The Research Impact Challenge LibGuide, expertly designed with HTML and CSS, served as the central platform, ensuring intuitive navigation and easy access to resources (https://libguides.uthsc.edu/impactchallenge). User interface design prioritized simplicity and accessibility, accommodating diverse learning preferences and technical skills. This innovative project addressed common challenges faced by researchers and demonstrated the impactful use of technology in creating an adaptable and inclusive educational experience. The Research Impact Challenge exemplifies how academic libraries can harness technology to foster scholarly growth and support research impact in the health sciences.
    Keywords:  Education Technology; Health Science Libraries; LibGuides; Library Instructions; Outreach; Research Data Management; Research Metrics
    DOI:  https://doi.org/10.5195/jmla.2025.2085
  18. J Med Libr Assoc. 2025 Jan 14. 113(1): 78-84
       Background: Librarians have relied on resource lists for developing nursing collections, but these lists are usually in static or subscription-based formats. An example of this is the 26th edition of the Essential Nursing Resources last published in 2012. The Nursing and Allied Health Resources and Services (NAHRS) Caucus Nursing Essential Resources List (NNERL) Task Force has been working on a new list since Fall 2020. The goal of the Task Force is to create a nursing resource list that represents current materials and formats, uses a selection process that is transparent and reproducible, and will be available to a broad audience.
    Case Presentation: Working from the Essential Nursing Resources 26th edition, the NNERL Task Force updated the purpose statement and then began reviewing the resources on the list. Two working groups were formed: 1) an evaluation rubric working group developed a tool to evaluate the resources and 2) a tagging working group developed guidelines for creating metadata and "tags." Volunteers were recruited from the NAHRS Caucus to tag the resources. Lastly, the Task Force finalized the list of resources in the NNERL then cleaned and reconciled the data.
    Conclusions: The final version of the NNERL will be published in Airtable, a cloud-based project management product, and will include metadata for every item on the list. The NNERL will be copyrighted to the NAHRS NNERL Task Force and made available through the Open Science Framework (OSF) under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons License.
    Keywords:  Case Reports; Libraries; Library Collection Development; Nursing; Nursing and Allied Health Resources and Services (NAHRS)
    DOI:  https://doi.org/10.5195/jmla.2025.1964
  19. Med Ref Serv Q. 2025 Feb 17. 1-13
      As an engaging and understandable visual medium, comics can facilitate discussions around difficult topics, including aging and death, and be a useful educational tool for medical students. To achieve this end, a geriatrics clerkship program director implemented a health humanities curriculum that included a partnership with the health science library. The resulting book club gave medical students a place to discuss the clerkship and helped them draw connections between their experiences and a graphic memoir on the perspective of a caregiver to elderly parents. The librarian's background using comics for instruction and the director's expertise in geriatric medicine created an innovative new educational method.
    Keywords:  Geriatric medicine; graphic medicine; medical students
    DOI:  https://doi.org/10.1080/02763869.2025.2463891
  20. Mo Med. 2025 Jan-Feb;122(1): 67-71
       Introduction: There are barriers that exist for individuals to adhere to cardiovascular rehabilitation programs. A key driver to patient adherence is appropriately educating patients. A growing education tool is using large language models to answer patient questions.
    Methods: The primary objective of this study was to evaluate the readability of educational responses provided by large language models for questions regarding cardiac rehabilitation using Gunning Fog, Flesch-Kincaid, and Flesch Reading Ease scores.
    Results: The findings of this study demonstrate that the mean Gunning Fog, Flesch Kincaid, and Flesch Reading Ease scores do not meet US grade reading level recommendations across three models: ChatGPT 3.5, Copilot, and Gemini. The Gemini and Copilot models demonstrated greater ease of readability compared to ChatGPT 3.5.
    Conclusions: Large language models could serve as educational tools on cardiovascular rehabilitation, but there remains a need to improve the text readability for these to effectively educate patients.
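For reference, the readability indices used in this and several of the following abstracts are computed from simple text statistics; in their standard published forms:

```latex
\text{Flesch Reading Ease} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}

\text{Flesch-Kincaid Grade Level} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59

\text{Gunning Fog Index} = 0.4\left(\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right)
```

Higher Flesch Reading Ease scores indicate easier text, while the Flesch-Kincaid and Gunning Fog results approximate a US school grade level.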
  21. Otol Neurotol. 2025 Feb 04.
       OBJECTIVE: To examine the quality of information provided by artificial intelligence platforms ChatGPT-4 and Claude 2 surrounding the management of vestibular schwannomas.
    STUDY DESIGN: Cross-sectional.
    SETTING: Skull base surgeons were involved from different centers and countries.
    INTERVENTION: Thirty-six questions regarding vestibular schwannoma management were tested. Artificial intelligence responses were subsequently evaluated by 19 lateral skull base surgeons using the Quality Assessment of Medical Artificial Intelligence (QAMAI) questionnaire, assessing "Accuracy," "Clarity," "Relevance," "Completeness," "Sources," and "Usefulness."
    MAIN OUTCOME MEASURE: The scores of the answers from both chatbots were collected and analyzed using the Student t test. Analysis of responses grouped by stakeholders was performed with McNemar test. Stuart-Maxwell test was used to compare reading level among chatbots. Intraclass correlation coefficient was calculated.
    RESULTS: ChatGPT-4 demonstrated significantly improved quality over Claude 2 in 14 of 36 (38.9%) questions, whereas higher-quality scores for Claude 2 were only observed in 2 (5.6%) answers. Chatbots exhibited variation across the dimensions of "Accuracy," "Clarity," "Completeness," "Relevance," and "Usefulness," with ChatGPT-4 demonstrating a statistically significant superior performance. However, no statistically significant difference was found in the assessment of "Sources." Additionally, ChatGPT-4 provided information at a significantly lower reading grade level.
    CONCLUSIONS: Artificial intelligence platforms failed to consistently provide accurate information surrounding the management of vestibular schwannoma, although ChatGPT-4 achieved significantly higher scores in most analyzed parameters. These findings demonstrate the potential for significant misinformation for patients seeking information through these platforms.
    DOI:  https://doi.org/10.1097/MAO.0000000000004410
  22. J Med Libr Assoc. 2025 Jan 14. 113(1): 65-77
       Objective: This study investigated the performance of a generative artificial intelligence (AI) tool using GPT-4 in answering clinical questions in comparison with medical librarians' gold-standard evidence syntheses.
    Methods: Questions were extracted from an in-house database of clinical evidence requests previously answered by medical librarians. Questions with multiple parts were subdivided into individual topics. A standardized prompt was developed using the COSTAR framework. Librarians submitted each question into aiChat, an internally managed chat tool using GPT-4, and recorded the responses. The summaries generated by aiChat were evaluated on whether they contained the critical elements used in the established gold-standard summary of the librarian. A subset of questions was randomly selected for verification of references provided by aiChat.
    Results: Of the 216 evaluated questions, aiChat's response was assessed as "correct" for 180 (83.3%) questions, "partially correct" for 35 (16.2%) questions, and "incorrect" for 1 (0.5%) question. No significant differences were observed in question ratings by question category (p=0.73). For a subset of 30% (n=66) of questions, 162 references were provided in the aiChat summaries, and 60 (37%) were confirmed as nonfabricated.
    Conclusions: Overall, the performance of a generative AI tool was promising. However, many included references could not be independently verified, and attempts were not made to assess whether any additional concepts introduced by aiChat were factually accurate. Thus, we envision this being the first of a series of investigations designed to further our understanding of how current and future versions of generative AI can be used and integrated into medical librarians' workflow.
    Keywords:  Artificial Intelligence; Biomedical Informatics; Evidence Synthesis; Generative AI; Information Science; LLMs; Large Language Models; Library Science
    DOI:  https://doi.org/10.5195/jmla.2025.1985
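The abstract names the COSTAR prompt framework (commonly expanded as Context, Objective, Style, Tone, Audience, Response) without reproducing the prompt itself; the template below is therefore an assumed illustration of what such a standardized prompt could look like, not the study's actual wording.

```python
# Illustrative COSTAR-style prompt builder; every field's wording is assumed.
def build_costar_prompt(clinical_question: str) -> str:
    return "\n".join([
        "CONTEXT: You are assisting medical librarians who answer clinical evidence requests.",
        "OBJECTIVE: Summarize the best available evidence for the question below.",
        "STYLE: Concise evidence synthesis with key findings and citations.",
        "TONE: Neutral and professional.",
        "AUDIENCE: Clinicians and medical librarians.",
        "RESPONSE: A short structured summary followed by a numbered reference list.",
        "",
        f"Question: {clinical_question}",
    ])

print(build_costar_prompt("Does early mobilization reduce ICU delirium?"))
```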
  23. J Pediatr Ophthalmol Strabismus. 2025 Feb 19. 1-8
       PURPOSE: To assess the medical accuracy and readability of responses provided by ChatGPT (OpenAI), the most widely used artificial intelligence-powered chat-bot, regarding questions about strabismus.
    METHODS: Thirty-four questions were input into ChatGPT 3.5 (free version) and 4.0 (paid version) at three time intervals (day 0, 1 week, and 1 month) in two distinct geographic locations (California and Florida) in March 2024. Two pediatric ophthalmologists rated responses as "acceptable," "accurate but missing key information or minor inaccuracies," or "inaccurate and potentially harmful." The online tool, Readable, measured the Flesch-Kincaid Grade Level and Flesch Reading Ease Score to assess readability.
    RESULTS: Overall, 64% of responses by ChatGPT were "acceptable;" but the proportion of "acceptable" responses differed by version (47% for ChatGPT 3.5 vs 53% for 4.0, P < .05) and state (77% of California vs 51% of Florida, P < .001). Responses in Florida were more likely to be "inaccurate and potentially harmful" compared to those in California (6.9% vs. 1.5%, P < .001). Over 1 month, the overall percentage of "acceptable" responses increased (60% at day 0, 64% at 1 week, and 67% at 1 month, P > .05), whereas "inaccurate and potentially harmful" responses decreased (5% at day 0, 5% at 1 week, and 3% at 1 month, P > .05). On average, responses scored a Flesch-Kincaid Grade Level score of 15, equating to a higher than high school grade reading level.
    CONCLUSIONS: Although most of ChatGPT's responses to strabismus questions were clinically acceptable, there were variations in responses across time and geographic regions. The average reading level exceeded a high school level and demonstrated low readability. Although ChatGPT demonstrates potential as a supplementary resource for parents and patients with strabismus, improving the accuracy and readability of free versions of ChatGPT may increase its utility.
    DOI:  https://doi.org/10.3928/01913913-20250110-02
  24. Eur J Ophthalmol. 2025 Feb 19. 11206721251321197
       PURPOSE: To evaluate the appropriateness and readability of the responses generated by ChatGPT-4 and Bing Chat to frequently asked questions about glaucoma.
    METHOD: Thirty-four questions were generated for this study. Each question was directed three times to a fresh ChatGPT-4 and Bing Chat interface. The obtained responses were categorised by two glaucoma specialists in terms of their appropriateness. Accuracy of the responses was evaluated using the Structure of the Observed Learning Outcome (SOLO) taxonomy. Readability of the responses was assessed using Flesch Reading Ease (FRE), Flesch Kincaid Grade Level (FKGL), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI).
    RESULTS: The percentage of appropriate responses was 88.2% (30/34) and 79.2% (27/34) in ChatGPT-4 and Bing Chat, respectively. Both the ChatGPT-4 and Bing Chat interfaces provided at least one inappropriate response to 1 of the 34 questions. The SOLO test results for ChatGPT-3.5 and Bing Chat were 3.86 ± 0.41 and 3.70 ± 0.52, respectively. No statistically significant difference in performance was observed between both LLMs (p = 0.101). The mean count of words used when generating responses was 316.5 (± 85.1) and 61.6 (± 25.8) in ChatGPT-4 and Bing Chat, respectively (p < 0.05). According to FRE scores, the generated responses were suitable for only 4.5% and 33% of U.S. adults in ChatGPT-4 and Bing Chat, respectively (p < 0.05).
    CONCLUSIONS: ChatGPT-4 and Bing Chat consistently provided appropriate responses to the questions. Both LLMs had low readability scores, but ChatGPT-4 provided more difficult responses in terms of readability.
    Keywords:  Artificial intelligence; Bing Chat; ChatGPT; frequently asked questions; glaucoma; readability tests
    DOI:  https://doi.org/10.1177/11206721251321197
  25. J ISAKOS. 2025 Feb 12. pii: S2059-7754(25)00458-4. [Epub ahead of print] 100841
     INTRODUCTION: With over 61% of Americans seeking health information online, the accuracy and readability of this content are critical. AI tools, like ChatGPT, have gained popularity in providing medical information, but concerns remain about their accessibility, especially for individuals with lower literacy levels. This study compares the readability and accuracy of ChatGPT-generated content with information from the American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo website, focusing on rotator cuff injuries.
    METHODS: We formulated seven frequently asked questions about rotator cuff injuries, based on the OrthoInfo website, and gathered responses from both ChatGPT-4 and OrthoInfo. Readability was assessed using multiple readability metrics (Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG Readability Formula, FORCAST Readability Formula, Fry Graph, Raygor Readability Estimate), while accuracy was evaluated by three independent reviewers. Statistical analysis included t-tests and correlation analysis.
    RESULTS: ChatGPT responses required a higher education level to comprehend, with an average grade level of 14.7, compared to OrthoInfo's 11.9 (p < 0.01). The Flesch Reading Ease Index indicated that OrthoInfo's content (52.5) was more readable than ChatGPT's (25.9, p < 0.01). Both sources had high accuracy, with ChatGPT slightly lower in accuracy for the question about further damage to the rotator cuff (p < 0.05).
    CONCLUSION: ChatGPT shows promise in delivering accurate health information but may not be suitable for all patients due to its higher complexity. A combination of AI and expert-reviewed, accessible content may enhance patient understanding and health literacy. Future developments should focus on improving AI's adaptability to different literacy levels.
    LEVEL OF EVIDENCE: IV.
    Keywords:  Accuracy; Artificial Intelligence; ChatGPT; Health Literacy; Readability; Rotator Cuff Injury
    DOI:  https://doi.org/10.1016/j.jisako.2025.100841
  26. J Shoulder Elbow Surg. 2025 Feb 17. pii: S1058-2746(25)00139-9. [Epub ahead of print]
     PURPOSE: This study aims to analyze and compare the quality, accuracy, and readability of information regarding anatomic Total Shoulder Arthroplasty (aTSA) and reverse Total Shoulder Arthroplasty (rTSA) provided by two AI interfaces (OpenAI's ChatGPT and Microsoft's Copilot).
    METHODS: Thirty commonly asked questions (categorized by Rothwell criteria into Fact, Policy, and Value) by patients were inputted into ChatGPT 3.5 and Copilot. Responses were assessed with the DISCERN scale, JAMA benchmark criteria, Flesch-Kincaid Reading Ease Score (FRES), and Grade Level (FKGL). The sources of citations provided by CoPilot were further analyzed.
    RESULTS: Both AI interfaces generated DISCERN scores >50 (aTSA and rTSA ChatGPT: 57 (Fact), 61 (Policy), 58 (Value); aTSA and rTSA Copilot: 68 (Fact), 72 (Policy), 70 (Value)), demonstrating "good" quality of information, except for the Policy questions answered by Copilot, which scored as "excellent" (>70). Copilot's higher JAMA score (3 vs. 0) and FRES scores above 30 indicated more reliable, accessible responses, though these still required a minimum of a 12th-grade education to read. In comparison, ChatGPT generated more complex texts, with most FRES scores <20 and FKGL scores signifying academic-level complexity. Finally, Copilot provided citations, with the highest percentage of its sources being academic (31.1% for rTSA and 26.7% for aTSA), suggesting reliable sources of information.
    CONCLUSION: Overall, the information provided by both AI interfaces, ChatGPT and Copilot, was scored as a "good" source of information for commonly asked patient questions regarding shoulder arthroplasty. However, Copilot's answers proved to be more reliable (p=0.0061), less complex, and easier to read (p=0.0031), and they referenced information from reliable resources, including academic sources, journal articles, and medical sites. Although Copilot's answers were "easier" to read, they still required a 12th-grade education, which may be too complex for most patients and poses a challenge for comprehension. A substantial number of non-medical media sites and commercial sources were also cited by Copilot for both aTSA and rTSA questions. Critically, answers from both AI interfaces should serve as supplementary resources rather than primary sources on perioperative conditions pertaining to shoulder arthroplasty.
    Keywords:  Artificial Intelligence; Patient Education; Reverse Shoulder Arthroplasty; Technology in Shoulder Arthroplasty; Total Shoulder Arthroplasty
    DOI:  https://doi.org/10.1016/j.jse.2024.12.048
  27. J Endod. 2025 Feb 12. pii: S0099-2399(25)00067-6. [Epub ahead of print]
       INTRODUCTION: ChatGPT is an artificial intelligence (AI) chatbot, developed by OpenAI, that uses Deep Learning (DL) technology for information processing. The chatbot uses natural language processing (NLP) and machine learning (ML) algorithms to respond to users' questions. The purpose of this study was to review ChatGPT responses to determine if they were a reliable source of scientific information regarding local anesthesia for endodontics.
    MATERIALS AND METHODS: Sixteen representative questions pertaining to local anesthesia for endodontics were selected. ChatGPT was asked to answer the 16 questions and provide supporting references. Each ChatGPT-provided reference was evaluated for accuracy using NLM's NIH.gov (PubMed), Google Scholar, journal citations, and author citations. Peer-reviewed, evidence-based literature citations related to the initial questions were collected by the authors. The two authors independently compared the answers of ChatGPT to the peer-reviewed, evidence-based literature using a 5-answer Likert-type scale.
    RESULTS: ChatGPT was reliable 50% of the time when compared to the peer-reviewed, evidence-based literature. That is, ChatGPT gave the same literature-based response as our peer-reviewed, evidence-based literature in 16 of the 32 questions. Of the 51 total references provided by ChatGPT, 59% (30 of 51) were the wrong reference, 12% (6 of 51) could not be retrieved, and 18% (9 of 51) were hallucinations (made-up references).
    CONCLUSIONS: AI needs further training in our field before it can be trusted to provide accurate information in the field of endodontic anesthesia. ChatGPT should continue to improve to provide reliable information for providers and patients alike.
    Keywords:  AI; ChatGPT; Endodontics; Anesthesia
    DOI:  https://doi.org/10.1016/j.joen.2025.02.002
  28. World Neurosurg. 2025 Feb 12. pii: S1878-8750(25)00111-1. [Epub ahead of print] 123755
     BACKGROUND: Artificial intelligence (AI) tools like ChatGPT have gained attention for their potential to support patient education by providing accessible, evidence-based information. This study compares the performance of ChatGPT 3.5 and ChatGPT 4.0 in answering common patient questions about low back pain (LBP), focusing on response quality, readability, and adherence to clinical guidelines, while also addressing the models' limitations in managing psychosocial concerns.
    METHODS: Thirty frequently asked patient questions about LBP were categorized into four groups: Diagnosis, Treatment, Psychosocial Factors, and Management Approaches. Responses generated by ChatGPT 3.5 and 4.0 were evaluated on three key metrics: 1) response quality, rated on a scale of 1 (excellent) to 4 (unsatisfactory); 2) DISCERN criteria, evaluating reliability and adherence to clinical guidelines, with scores ranging from 1 (low reliability) to 5 (high reliability); and 3) readability, assessed using seven readability formulas, including the Flesch-Kincaid and Gunning Fog Index.
    RESULTS: ChatGPT 4.0 significantly outperformed ChatGPT 3.5 in response quality across all categories, with a mean score of 1.03 compared to 2.07 for ChatGPT 3.5 (p < 0.001). ChatGPT 4.0 also demonstrated higher DISCERN scores (4.93 vs. 4.00, p < 0.001). However, both versions struggled with psychosocial factor questions, where responses were rated lower than for Diagnosis, Treatment, and Management questions (p = 0.04).
    CONCLUSION: The limitations of ChatGPT 3.5 and 4.0 in addressing psychosocial concerns highlight the need for clinician oversight, particularly for emotionally sensitive issues. Enhancing AI's capability in managing psychosocial aspects of patient care should be a priority in future iterations.
    Keywords:  Artificial intelligence; ChatGPT 3.5; ChatGPT 4.0; Low Back Pain
    DOI:  https://doi.org/10.1016/j.wneu.2025.123755
  29. J Med Internet Res. 2025 Feb 20. 27 e53087
       BACKGROUND: Patients and families who have experienced delirium may seek information about delirium online, but the quality and reliability of online delirium-related websites are unknown.
    OBJECTIVE: This study aimed to identify and evaluate online delirium-related websites that could be used for patient and family education.
    METHODS: We searched Microsoft Bing, Google, and Yahoo using the keywords "delirium" and the misspelled "delerium" to identify delirium-related websites created to inform patients, families, and members of the public about delirium. The quality of identified delirium-related website content was evaluated by 2 authors using the validated DISCERN tool and the JAMA (Journal of the American Medical Association) benchmark criteria. Readability was assessed with the Simple Measure of Gobbledygook, the Flesch Reading Ease score, and the Flesch Kincaid grade level. Each piece of website content was assessed for its delirium-related information using a checklist of items co-designed by a working group, which included patients, families, researchers, and clinicians.
    RESULTS: We identified 106 websites targeted toward patients and families, with most being hospital-affiliated (21/106, 20%), followed by commercial websites (20/106, 19%), government-affiliated organizations (19/106, 18%), and foundations or advocacy groups (16/106, 15%). The median time since the last content update was 3 (IQR 2-5) years. Most websites' content (101/106, 95%) was written at a reading level higher than the recommended grade 6 level. The median DISCERN total score was 42 (IQR 33-50), with scores ranging from 20 (very poor quality) to 78 (excellent quality). The median delirium-related content score was 8 (IQR 6-9), with scores ranging from 1 to 12. Many websites lacked information on the short- and long-term outcomes of delirium as well as how common it is. The median JAMA benchmark score was 1 (IQR 1-3), indicating the quality of the websites' content had poor transparency.
    CONCLUSIONS: We identified high-quality websites that could be used to educate patients, families, or the public about delirium. While most delirium-related website content generally meets quality standards based on DISCERN and JAMA benchmark criteria, high scores do not always ensure patient and family-friendliness. Much of the top-rated delirium content was text-heavy and complex in layout, which could be overwhelming for users seeking clear, concise information. Future efforts should prioritize the development of websites with patients and families, considering usability, accessibility, and cultural relevance to ensure they are truly effective for delirium education.
    Keywords:  accessibility; brain lesions; caregiver; confusion; delirium; disorientation; education; family education; health information; high-quality websites; inattentiveness; information seeking; internet; patient; readability
    DOI:  https://doi.org/10.2196/53087
  30. Pediatr Res. 2025 Feb 22.
     BACKGROUND: Information leaflets in research studies should be age-appropriate to be understood; however, the formal readability of children's participant information leaflets (PILs) for research studies has not been assessed.
    METHODS: A single-centre cross-sectional study assessing paediatric PILs. Six readability tests were applied: Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), Flesch Kincaid Grade Level (FKGL), Coleman-Liau Index (CLI), Automated Readability Index (ARI), and Flesch Reading Ease score (FRE). Results were compared between age groups and by whether the PIL was from a commercially sponsored or an investigator-led study.
    RESULTS: 191 paediatric PILs were included. Age categories of <10 years (n = 65), ≤12 (n = 73), ≤15 (n = 73), and ≥16 (n = 61) were used for analysis. There were 39 commercial PILs and 226 non-commercial PILs. For the ≤10 and ≤12 age bands, all 6 median readability scores exceeded the target age group (thus hard to read, p < 0.005), and there was no difference in readability scores between these two age bands. Four scores from the readability tests were considered age-appropriate in the ≤15 year category, and all median scores were age-appropriate in the ≥16 years age groups. Readability scores for children's PILs were significantly higher in commercially sponsored versus non-commercial studies (P < 0.005).
    CONCLUSION: Improvements are required to make children's PILs readable for the target audience, particularly in commercially sponsored research studies.
    IMPACT: Paediatric participant information leaflets may not be readable in research studies, especially in younger age groups. PILs for children participating in commercially sponsored studies were less readable than non-commercial studies. Research teams writing PILs for a paediatric study need to consider the use of readability tools to ensure that the information they are providing is readable by the target audience.
    DOI:  https://doi.org/10.1038/s41390-025-03943-z
  31. Laryngoscope Investig Otolaryngol. 2025 Feb;10(1): e70101
       Introduction: This study evaluates the readability of online patient education materials (OPEMs) across otolaryngology subspecialties, hospital characteristics, and national otolaryngology organizations, while assessing AI alternatives.
    Methods: Hospitals from the US News Best ENT list were queried for OPEMs describing a chosen surgery per subspecialty; the American Academy of Otolaryngology-Head and Neck Surgery (AAO), American Laryngological Association (ALA), Ear, Nose, and Throat United Kingdom (ENTUK), and the Canadian Society of Otolaryngology-Head and Neck Surgery (CSOHNS) were similarly queried. Google was queried for the top 10 links from hospitals per procedure. Ownership (private/public), presence of respective otolaryngology fellowships, region, and median household income (zip code) were collected. Readability was assessed using seven indices and averaged: Automated Readability Index (ARI), Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Readability (GFR), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Readability Index (CLRI), and Linsear Write Readability Formula (LWRF). AI-generated materials from ChatGPT were compared for readability, accuracy, content, and tone. Analyses were conducted between subspecialties, against national organizations, NIH standard, and across demographic variables.
    Results: Across 144 hospitals, OPEMs exceeded NIH readability standards, averaging an 8th- to 12th-grade level across subspecialties. In rhinology, facial plastics, and sleep medicine, hospital OPEMs had higher readability scores than ENTUK's materials (11.4 vs. 9.1, 10.4 vs. 7.2, 11.5 vs. 9.2, respectively; all p < 0.05), but lower than AAO (p = 0.005). ChatGPT-generated materials averaged a 6.8 grade level, demonstrating improved readability, especially with specialized prompting, compared to all hospital and organization OPEMs.
    Conclusion: OPEMs from all sources exceed the NIH readability standard. ENTUK serves as a benchmark for accessible language, while ChatGPT demonstrates the feasibility of producing more readable content. Otolaryngologists might consider using ChatGPT to generate patient-friendly materials, with caution, and advocate for national-level improvements in patient education readability.
    Keywords:  otolaryngology; patient education; readability
    DOI:  https://doi.org/10.1002/lio2.70101
  32. J Spinal Cord Med. 2025 Feb 18. 1-7
       CONTEXT: Autonomic dysreflexia (AD), a lethal condition of which patients with spinal cord injury (SCI) are at risk, is under-identified in these patient populations. Research literature is limited on AD-related educational resources provided to patients with SCI.
    OBJECTIVE: The American Medical Association and National Institutes of Health recommend that healthcare material for patients be written at a sixth- to eighth-grade reading level. In this study, the authors compared the readability of AD-related materials provided to patients with SCI through Commission on Accreditation of Rehabilitation Facilities (CARF)-accredited websites versus those obtained via Google search.
    METHODS: Online, free materials were obtained from CARF-accredited institutions. These data were compared with top Google search results for the term "autonomic dysreflexia." Materials were assessed using 4 different validated readability scales. The average reading grade level was recorded for each readability index between the two groups.
    RESULTS: For CARF-accredited institutions (n = 21), the mean readability score was at a 10th grade level. For Google search (n = 84), the mean readability score was at a 13th grade level. Further analysis demonstrated a statistically significant difference between the readability of the CARF-accredited and non-accredited websites (P < 0.01). One-way ANOVA demonstrated no significant differences among the four readability calculators for CARF-accredited sites and, separately, for Google websites.
    CONCLUSION: Online information provided to patients with SCI on AD through CARF-accredited institutions is two to four reading grade levels higher than recommended. Efforts should be made to modify the readability of CARF-accredited and non-academic website materials to improve patient education.
    Keywords:  Autonomic dysreflexia; Education; Readability; Spinal cord injury
    DOI:  https://doi.org/10.1080/10790268.2024.2448040
  33. J Back Musculoskelet Rehabil. 2025 Feb 19. 10538127251317661
       PURPOSE: This study aimed to evaluate the reliability and quality of Turkish-language videos about frozen shoulder exercises published on the YouTube platform.
    MATERIAL-METHOD: Fifty-four videos identified through searches on the YouTube platform using the keywords "frozen shoulder exercises" and "adhesive capsulitis exercises" were included in the study. The included videos were evaluated by two independent observers and a final, independent third observer, and the resulting modified DISCERN (mDISCERN), Global Quality Scale (GQS), and JAMA scores were compared with other video parameters. Differences between groups were examined.
    RESULT: Exercises in 10 different categories were identified in the 54 included videos. GQS quality grouping was significantly associated with viewing rate, mDISCERN and JAMA scores, and the number of exercise types included in the video (p < 0.05). In terms of reliability, mDISCERN scores were significantly associated with the number of exercise types included in the videos and with the JAMA and GQS scores (p < 0.05).
    CONCLUSION: In our study, more than half of the frozen shoulder exercise videos on the YouTube platform were found to be low in reliability and quality. Given the variety of content in the videos and the lack of individualized exercises, YouTube does not appear to be a suitable resource for frozen shoulder exercises. Considering all of this, we believe that informative videos should be created by physicians or should be audited.
    Keywords:  Frozen shoulder; adhesive capsulitis; exercise therapy
    DOI:  https://doi.org/10.1177/10538127251317661
  34. Dis Colon Rectum. 2025 Feb 14.
       BACKGROUND: Hemorrhoidal disease is highly prevalent in the United States and frequently queried online. Unfortunately, health education webpages often lack reliable information.
    OBJECTIVE: To evaluate whether online hemorrhoid education materials in English and Spanish meet national recommendations for readability, actionability, and accessibility, and provide critical clinical guidance on when to seek medical care.
    DESIGN: Using three search engines (Bing, Google, Yahoo), we selected the top 30 results for formal medical and colloquial English and Spanish search terms regarding hemorrhoids. We assessed readability using validated English and Spanish scoring systems to report median reading levels, and assessed Health Literacy Performance with a six-point checklist in three categories: accessibility, actionability, and critical clinical guidance.
    SETTINGS: University of California Los Angeles.
    MAIN OUTCOME MEASURES: Readability and health literacy performance.
    RESULTS: After removing duplicates, 90-95 webpages generated from the formal English, Spanish, and colloquial English terms remained. There was minimal overlap between the results of the formal and colloquial English searches. Median reading levels were first-year university for formal and colloquial English webpages and eleventh grade for Spanish webpages. Minimal Health Literacy Performance was found in 43.2%, 48.4%, and 18.2% of formal English, Spanish, and colloquial English websites, respectively. The Health Literacy Performance criteria met least often were printability and providing specific, actionable goals for patients to implement.
    LIMITATIONS: Our study represents searches completed at one point in time utilizing specific terms. Colloquial search terms were generated via survey with convenience sampling and may not be representative of all possible searches used by patients seeking information on hemorrhoidal disease.
    CONCLUSIONS: Most English and Spanish hemorrhoid-focused webpages failed to provide appropriate patient education, as they exceeded the recommended sixth-grade reading level, lacked actionable recommendations, were not accessible, and failed to provide critical clinical guidance. Online resources are essential for patients of all health literacy levels; improvement is critical to reduce healthcare disparities. See Video Abstract.
    DOI:  https://doi.org/10.1097/DCR.0000000000003691
  35. Plast Reconstr Surg Glob Open. 2025 Feb;13(2): e6541
       Background: Current recommendations suggest that patient education materials (PEMs) be written at or below the sixth-grade reading level. In a 2010 study, the average readability of PEMs on the American Society of Plastic Surgeons (ASPS) and The Aesthetic Society (AS) websites was found to be at the 11th-grade level or higher. We sought to assess progress made toward providing accessible PEMs.
    Methods: PEMs were obtained from the ASPS and AS websites. The PEMs were entered into an online scoring tool. PEMs were scored on 3 common readability indices: Flesch-Kincaid, Simple Measure of Gobbledygook, and Flesch Reading Ease (FRE).
    Results: The average scores of ASPS PEMs calculated using the Flesch-Kincaid, Simple Measure of Gobbledygook, and FRE readability models were 9.7 ± 1.1, 12.6 ± 0.7, and 47.6 ± 6.2, respectively; this FRE score corresponds to approximately a grade 13-16 reading level. The corresponding averages for AS PEMs were 9.3 ± 0.5, 12.3 ± 0.3, and 51.3 ± 3.9; this FRE corresponds to a grade 10-12 reading level. No PEMs written at or below the recommended sixth-grade reading level were found on the ASPS or AS websites.
    Conclusions: Despite increasing awareness of the need for equitable access to healthcare, PEMs continue to be written at a reading level well above the recommendation. Over the past 14 years, we have seen only modest improvement in readability indices. In addition to advocating for more accessible PEMs, we must gather a deeper understanding of how patients seek information about plastic surgery.
    DOI:  https://doi.org/10.1097/GOX.0000000000006541
  36. Int J Obstet Anesth. 2024 Dec 14. pii: S0959-289X(24)00322-4. [Epub ahead of print]62 104310
       BACKGROUND: Neuraxial labor analgesia is the most effective method of pain relief during childbirth. Despite its proven efficacy and safety, misconceptions about neuraxial analgesia persist. This cross-sectional study aimed to evaluate the accuracy and quality of TikTok videos on neuraxial labor analgesia, hypothesizing that many would contain inaccurate or low-quality information.
    METHODS: Using the "Top" search function in TikTok, we identified the first 150 videos using the following keywords: "epidural," "epidural for labor," "epidural for pregnancy," "epidural experience," "getting an epidural," and "epidural risks." Primary outcomes included the proportion of videos containing inaccurate information and overall quality of videos based on modified DISCERN (mDISCERN) scores.
    RESULTS: Twenty-six (10%) of the 266 included videos contained inaccurate information. The median mDISCERN score for all included videos was 1.0 (interquartile range [IQR] = 2.0), indicating poor quality. Videos from medical sources scored higher in quality (median = 2.0, IQR = 1.0) than those from non-medical sources (median = 0.0, IQR = 2.0; P < 0.001); however, both scores were below the mDISCERN threshold for high video quality.
    CONCLUSION: This study highlights the presence of inaccurate information on popular social media platforms such as TikTok regarding neuraxial labor analgesia. Many videos are of low quality and lack comprehensive, balanced, and unbiased information. This poses a significant risk to patient understanding and informed decision-making. Medical professionals and organizations should actively engage on platforms such as TikTok to disseminate accurate, high-quality information, thereby helping to combat the spread of misleading information.
    Keywords:  Anesthesia; Communication; Epidural; Information dissemination; Internet; Labor analgesia; Neuraxial analgesia; Social media
    DOI:  https://doi.org/10.1016/j.ijoa.2024.104310
  37. BMC Public Health. 2025 Feb 19. 25(1): 684
       BACKGROUND: YouTube™ ( http://www.youtube.com ), the most widely used video website worldwide, is becoming a competitive platform for patients to gain health information and knowledge. This study aims to evaluate if YouTube™ is a useful source of information on home parenteral nutrition (HPN) for the public.
    METHODS: According to MeSH (Medical Subject Headings), combinations of search terms related to parenteral (intravenous) and nutrition (feeding) were searched on YouTube™. In total, 131 videos were evaluated and cataloged into three categories: Education, News & Politics, and People & Blogs. A usefulness score was then devised to assess video quality and to classify all videos as Slightly useful, Useful, or Very useful.
    RESULTS: The majority of the included videos were in the Education category (n = 92, 70.23%). Six videos, all in the Education category, were identified as Very useful; 27 videos were identified as Useful; and the remaining 98 were identified as Slightly useful. The number of likes, number of views, views per day, and duration of the Very useful videos were significantly higher than those of the Slightly useful videos.
    CONCLUSION: YouTube™ is a good source of information on home parenteral nutrition. In this study, videos categorized under Education were rated highest in usefulness. Because HPN content is highly technical and much low-credibility information exists on the internet, patients and professional staff should also consult other reliable videos in the field of healthcare information.
    Keywords:  Home parenteral nutrition; Intravenous feeding; Parenteral feeding; YouTube
    DOI:  https://doi.org/10.1186/s12889-025-21929-8
  38. Laryngoscope Investig Otolaryngol. 2025 Feb;10(1): e70079
       Objective: The ease of access of online videos and the popularity of visual learning have made YouTube a popular educational resource. We analyzed the utility of YouTube videos for graduate medical education about free flap surgery using a cross-sectional study design.
    Methods: Using the phrases "free flap surgery" and "free flap head and neck," YouTube videos for inclusion were identified. Videos were analyzed by free flap surgeons using Modified DISCERN, Global Quality Score (GQS), and JAMA Benchmark metrics of video quality, educational value, and transparency, respectively. Statistical analysis of video metadata and expert-determined scores was performed.
    Results: In total, 44 videos with 517,227 combined views were analyzed. Most videos were intra-operative (63.6%), published by physicians (34.1%) or medical institutions (22.7%), and had health professional target audiences (95.5%). The mean Modified DISCERN score was 15.4/25, with most videos classified as "fair" (54.6%). The mean GQS was 4.17/5 and the mean JAMA Benchmark was 2.7/4. Higher Modified DISCERN scores were significantly associated with health professional target audiences (p = 0.04) and webinars (p = 0.03). Higher GQS was also significantly associated with a health professional target audience (p < 0.01), and higher JAMA scores with YouTube verification (p = 0.04).
    Conclusion: Routine YouTube searches may not yield results ideal for resident education in head and neck free flap surgery. While many videos are of good educational value, lower transparency and reliability scores raise concerns of biased information. It is important to consider vetted educational or health care sources for resident surgical education.
    Level of Evidence: Level IV (cross-sectional study).
    Keywords:  YouTube; free flap surgery; graduate medical education; head and neck cancer; otolaryngology
    DOI:  https://doi.org/10.1002/lio2.70079
  39. Transl Cancer Res. 2025 Jan 31. 14(1): 102-111
       Background: Vulvar cancer is a relatively rare malignant tumor that receives less attention than other gynecological malignancies. Short-video apps are playing an important role in promoting health. This study evaluated the quality of videos about vulvar cancer on YouTube with the aim of making facts-based recommendations and promoting public health engagement.
    Methods: On May 15, 2024, the term "vulvar cancer" was searched on YouTube, and the top 100 videos identified in the search were chosen for our research. We evaluated the completeness of each video using six dimensions. The video quality was evaluated using the DISCERN instrument (Quality Criteria for Consumer Health Information), the Journal of the American Medical Association (JAMA) benchmark criteria, the Patient Education Materials Assessment Tool (PEMAT), and the Global Quality Scale (GQS). Correlations between video data, DISCERN, JAMA, PEMAT, and GQS scores were analyzed.
    Results: Among the 65 videos that were included in this study, the majority (64.6%) were posted by educational and training institutes. The quality of the videos submitted by physicians was comparatively good, as indicated by the GQS scores (P=0.045). The relationships between the DISCERN categorization score and duration (P<0.001), views per day (P=0.02), likes per day (P<0.001), comments per day (P=0.03), PEMAT actionability (P<0.001), and GQS scores (P<0.001) were statistically significant. Additionally, there was a strong positive correlation between the GQS score and video length.
    Conclusions: The quality of videos about vulvar cancer on YouTube is unsatisfactory. However, several measures can be adopted in the future to make YouTube a more practical tool for promoting the prevention and cure of vulvar cancer.
    Keywords:  Video quality; YouTube; vulvar cancer
    DOI:  https://doi.org/10.21037/tcr-24-1411
  40. Int J Occup Saf Ergon. 2025 Feb 17. 1-9
      Companies provide employees with occupational health and safety (OHS) training through videos on YouTube. In this study, the reliability of 118 YouTube videos related to OHS was evaluated by two experts using the Journal of the American Medical Association (JAMA) and global quality score (GQS) scales. Six video variables - video duration, number of subscribers, likes, views, publication time and comments - were evaluated based on video source (five groups) and type (seven groups). Correlation analysis found a significant positive relationship between all pairs of variables except publication time and number of comments. Across video sources and types, scores were 1.9 out of 4 on the JAMA scale and 2.3 out of 5 on the GQS scale, indicating that the videos are inadequate and of poor quality. Video duration differed significantly by video type, and video sources differed significantly in the number of subscribers and comments.
    Keywords:  YouTube; occupational health and safety; video analysis
    DOI:  https://doi.org/10.1080/10803548.2025.2455284
  41. JMIR Cancer. 2025 Feb 19. 11 e59483
       Background: Breast cancer is the most common malignant tumor and the fifth leading cause of cancer death worldwide, imposing a significant disease burden in China. Mammography is a key method for breast cancer screening, particularly for early diagnosis. Douyin, a popular social media platform, is increasingly used for sharing health information, but the quality and reliability of mammography-related videos remain unexamined.
    Objective: This study aimed to evaluate the information quality and reliability of mammography videos on Douyin.
    Methods: In October 2023, a search using the Chinese keywords for "mammography" and "mammography screening" was conducted on Douyin. From 200 retrieved videos, 136 mammography-related videos were selected for analysis. Basic video information, content, and sources were extracted. Video content was assessed for comprehensiveness across 7 categories: conception, examination process, applicable objects, precautions, combined examinations, advantages, and report. Completeness was evaluated using a researcher-developed checklist, while reliability and quality were measured using the modified DISCERN (mDISCERN) tool and the Global Quality Score (GQS). Correlations between video quality and characteristics were also examined.
    Results: Among the video sources, 82.4% (112/136) were attributed to health professionals, and 17.6% (24/136) were attributed to nonprofessionals. Among health professionals, only 1 was a radiologist. Overall, 77.2% (105/136) of the videos had useful information about mammography. Among the useful videos, the advantages of mammography were the most frequently covered topic (53/105, 50.5%). Median values for the mDISCERN and GQS evaluations across all videos stood at 2.5 (IQR 1.63-3) and 2 (IQR 1-2), respectively. Within the subgroup assessment, the median mDISCERN score among the useful and professional groups stood at 2 (IQR 2-3) and 3 (IQR 2-3), respectively, surpassing the corresponding score for the unhelpful and nonprofessional groups at 0 (IQR 0-0) and 0 (IQR 0-0.75; P<.001). Likewise, the median GQS among the useful and professional groups was evaluated at 2 (IQR 1.5-2) and 2 (IQR 1-2), respectively, eclipsing that of the unhelpful and nonprofessional groups at 1 (IQR 1-1) and 1 (IQR 1-1.37; P<.001). The GQS was weakly and negatively correlated with the number of likes (r=-0.24; P=.004), comments (r=-0.29; P<.001), and saves (r=-0.20; P=.02). The mDISCERN score was weakly and negatively correlated with the number of likes (r=-0.26; P=.002), comments (r=-0.36; P<.001), saves (r=-0.22; P=.009), and shares (r=-0.18; P=.03).
    Conclusions: The overall quality of mammography videos on Douyin is suboptimal, with most content uploaded by clinicians rather than radiologists. Radiologists should be encouraged to create accurate and informative videos to better educate patients. As Douyin grows as a health information platform, stricter publishing standards are needed to enhance the quality of medical content.
    Keywords:  DISCERN; Douyin; Global Quality Score; breast cancer; cancer screening; health information; information quality; mammography; medical content; social media; video; web-based education
    DOI:  https://doi.org/10.2196/59483
  42. BMC Public Health. 2025 Feb 18. 25(1): 656
       BACKGROUND: Migraine is an extremely prevalent and disabling primary neurological disease worldwide. Although multiple forms of patient education for migraine management have been employed in the past decades, the quality and reliability of headache-related online videos targeting migraine patients remained unclear, particularly for videos in China. Therefore, in this study, our research team aimed to explore the overall quality and credibility of online videos concerning patient education on migraine treatment in mainland China.
    METHODS: A total of 182 online videos concerning migraine treatment were retrieved from the four most popular Chinese-language online video platforms: Douyin, BiliBili, Haokan Video, and Xigua Video. Our research team collected the producer identity and basic information for these videos and then used two major scoring instruments, the Global Quality Score (GQS) scale and the DISCERN questionnaire, to evaluate the quality and reliability of their content. Subsequently, an overall descriptive analysis and detailed comparisons among specific video platforms and producers were performed. Finally, using the Spearman correlation coefficient, we also explored potential correlations between general video information and video quality and reliability (see the brief sketch after this entry).
    RESULTS: The overall quality and reliability of the migraine-related information provided by the online videos were poor, although those uploaded to Douyin were relatively more satisfactory. Among all studied videos, 10 encouraged patients to keep a headache diary, 12 warned about the risk of medication overuse, and 32 emphasized the preventive treatment of chronic migraine. However, the treatment recommendations proposed by video creators were highly heterogeneous, with the most frequently mentioned pharmacological, non-pharmacological, and traditional Chinese medicine measures being triptans (n = 57, 31.3%), massage (n = 40, 22.0%), and acupuncture (n = 31, 17.0%), respectively. We also observed slight positive correlations between video quality and the numbers of likes and comments received.
    CONCLUSIONS: Our results revealed that the quality and reliability of Chinese-language online videos focused on patient education for migraine treatment were inadequate, suggesting that viewers should treat this content with caution. However, the prospects for video-based patient education remain promising. Implementing appropriate strategies, such as strengthening regulations on health-related videos and instituting a review process conducted by medical professionals, may elevate the overall quality and trustworthiness of medical information shared through online video platforms.
    Keywords:  Headache; Migraine; Online Video; Patient Education; Treatment
    DOI:  https://doi.org/10.1186/s12889-025-21861-x
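    Several of the video-quality studies above, including the Douyin mammography study and this migraine study, relate engagement metrics to mDISCERN or GQS ratings using rank correlation. A minimal sketch of that step with SciPy, using invented like counts and GQS ratings purely for illustration (not the authors' data or code):

from scipy.stats import spearmanr

# Hypothetical engagement metric and quality ratings for eight videos.
likes = [120, 4300, 89, 15000, 560, 2300, 47, 980]  # invented like counts
gqs = [3, 1, 4, 1, 2, 1, 4, 2]                      # invented GQS ratings (1-5)

# Spearman's rho works on ranks, so it tolerates the heavy skew typical of
# view and like counts better than Pearson's r.
rho, p_value = spearmanr(likes, gqs)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")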
  43. Arthroplast Today. 2024 Dec;30 101486
       Background: The utilization of social media for health-related purposes has surged, especially during the COVID-19 pandemic. TikTok, a short-form video platform, has seen substantial growth, becoming a prominent medium for health information dissemination. However, the lack of regulation poses challenges in evaluating the validity of TikTok content.
    Methods: This cross-sectional study assesses TikTok videos related to total knee arthroplasty rehabilitation exercises. Search terms identified 84 videos, with 64 meeting the inclusion criteria. Engagement metrics and quality scores were analyzed, utilizing the DISCERN tool and the Total Knee Replacement Exercises Education Score.
    Results: The analyzed videos accumulated nearly 6 million views, with a median of 10,293.5 (interquartile range = 4139.3-26,100.0). Health-care professionals contributed 48% of the content. Despite higher engagement metrics for health-care professional videos, the overall quality, as indicated by DISCERN and Total Knee Replacement Exercises Education scores, remained poor. No videos achieved an "excellent" rating, with the majority categorized as "poor."
    Conclusions: This study underscores TikTok's substantial role in total knee arthroplasty rehabilitation information dissemination but reveals a critical deficit in content quality and reliability. Health-care professionals marginally outperformed general users but displayed overall inadequacy. The study emphasizes the necessity for improving the quality of health-related content on emerging social media platforms, especially within the realm of orthopaedic surgery.
    Level of Evidence: Level III, Cross-sectional study.
    Keywords:  COVID-19 pandemic; Social media; TikTok; Total knee arthroplasty; Total knee arthroplasty rehabilitation
    DOI:  https://doi.org/10.1016/j.artd.2024.101486
  44. JMIR Cancer. 2025 Feb 13.
       BACKGROUND: Lack of information, awareness, and misconceptions about clinical trials are major barriers to cancer clinical trial participation. Digital and social media are dominant sources of health information and offer optimal opportunities to improve public medical awareness and education by providing accurate and trustworthy health information from reliable sources. Infotainment, material intended to both entertain and inform, is an effective strategy for engaging and educating audiences that can be easily disseminated using social media and may be a novel way to improve awareness of and recruitment in clinical trials.
    OBJECTIVE: The purpose of this study was to evaluate whether an infotainment video promoting a clinical trial, disseminated using social media, could drive health information seeking behaviors.
    METHODS: As part of a video series, we created an infotainment video focused on promotion of a specific cancer clinical trial. We instituted a dissemination and marketing process on Facebook to measure video engagement and health information seeking behaviors among targeted audiences who expressed interest in breast cancer research and organizations. To evaluate video engagement, we measured reach, retention, outbound clicks, and outbound click-through rate. Frequencies and descriptive statistics were used to summarize each measure.
    RESULTS: The video substantially increased health information-seeking behavior: the clinical trial webpage, which had received 1 visitor in the month prior to launch, received 414 outbound clicks from the video during the 21-day social media campaign period.
    CONCLUSIONS: Our study shows that digital and social media tools can be tailored for specific target audiences, are scalable, and can be disseminated at low cost, making them an accessible educational, recruitment, and retention strategy focused on improving awareness of clinical trials.
    CLINICALTRIAL: ClinicalTrials.gov NCT03418961.
    DOI:  https://doi.org/10.2196/56098
  45. J Med Libr Assoc. 2025 Jan 14. 113(1): 86-87
      Digital Object Identifiers (DOIs) are a key persistent identifier in the publishing landscape to ensure the discoverability and citation of research products. Minting DOIs can be a time-consuming task for repository librarians. This process can be automated because the metadata needed for a DOI is already in the repository record, and both DataCite, a DOI minting organization, and Open Repository, a DSpace repository platform, offer application programming interfaces (APIs). Existing software enables bulk DOI minting. However, the institutional repository at UMass Chan Medical School contains a mixture of original materials that need DOIs (dissertations, reports, data, etc.) and previously published materials, such as journal articles, that already have DOIs. An institutional repository librarian and her librarian colleague with Python experience embarked on a pair programming project to create a script to mint DOIs on demand in DataCite for individual items in the institution's Open Repository instance. The pair met for one hour each week to develop and test the script using combined skills in institutional repositories, metadata, DOI minting, coding in Python, APIs, and data cleaning. The project was a great learning opportunity for both librarians to improve their Python coding skills. The new script makes the DOI minting process more efficient, enhances metadata in DataCite, and improves accuracy. Future script enhancements, such as automatically updating repository metadata with the new DOI, are planned after the repository upgrade to DSpace 7; a sketch of the kind of DataCite API call involved follows this entry.
    Keywords:  DSpace; DataCite; Institutional Repositories; Open Repositories; Python
    DOI:  https://doi.org/10.5195/jmla.2025.2076
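    As a rough sketch of the kind of request such a script makes (not the UMass Chan script itself), the public DataCite REST API accepts a JSON:API payload to mint and register a DOI. All credentials, the prefix, and the metadata values below are placeholders:

import requests

# DataCite's test endpoint; production would be https://api.datacite.org/dois.
DATACITE_API = "https://api.test.datacite.org/dois"
REPO_ID, REPO_PASSWORD = "EXAMPLE.REPO", "change-me"  # placeholder repository credentials

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "event": "publish",            # register the DOI immediately
            "prefix": "10.99999",          # placeholder prefix; DataCite generates the suffix
            "creators": [{"name": "Doe, Jane"}],
            "titles": [{"title": "Example dissertation title"}],
            "publisher": "Example Medical School",
            "publicationYear": 2025,
            "types": {"resourceTypeGeneral": "Text"},
            # Landing page for the item in the institutional repository.
            "url": "https://repository.example.edu/handle/12345/678",
        },
    }
}

response = requests.post(
    DATACITE_API,
    json=payload,
    auth=(REPO_ID, REPO_PASSWORD),
    headers={"Content-Type": "application/vnd.api+json"},
)
response.raise_for_status()
print("Minted DOI:", response.json()["data"]["id"])

    In a workflow like the one described, the metadata values would be pulled from the Open Repository (DSpace) record via its API rather than hard-coded, and writing the returned DOI back to the repository record is the kind of enhancement the abstract describes as planned.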