Cureus. 2025 Dec;17(12):e98901
Introduction
Stroke is a major cause of global morbidity and mortality. The readability of educational material is critical for rapid clinical decision-making among healthcare professionals. UpToDate (UpToDate, Inc., Waltham, MA) is a widely used, peer-reviewed point-of-care clinical resource, while ChatGPT (OpenAI, San Francisco, CA) is an emerging AI-based educational support tool. However, a formal comparison of their linguistic accessibility has not been performed.

Objective
To compare the readability and linguistic complexity of educational material on stroke generated by ChatGPT (GPT-4o) versus content retrieved from UpToDate, using validated readability metrics.

Design, setting, and participants
This cross-sectional study was conducted between May 27 and June 4, 2025. ChatGPT (GPT-4o, accessed May 27, 2025) was prompted to generate educational content on stroke, and the corresponding section was extracted from UpToDate (accessed May 27, 2025). Only prose content was analyzed. Readability parameters assessed were total word count, sentence count, average words per sentence, Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) Index, difficult word count, and difficult word percentage (illustrative computations of these metrics are sketched below). Data were analyzed using IBM SPSS v25 (IBM Corp., Armonk, NY) and R v4.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The Mann-Whitney U test was used for between-group comparisons, with p < 0.05 considered statistically significant.

Results
UpToDate content was substantially longer than ChatGPT's (median = 2772 vs. 304 words; p = 0.008) and contained more sentences (median = 134 vs. 23; p = 0.032) and more difficult words (median = 857 vs. 88; p = 0.008). Its average sentence length was also higher (21.7 vs. 13.2 words per sentence; p = 0.008). However, no statistically significant differences were observed for FRE (p = 1.000), FKGL (p = 0.222), SMOG Index (p = 0.151), or difficult word percentage (p = 0.690).

Conclusions
ChatGPT produces shorter, more concise educational content on stroke while maintaining readability comparable to UpToDate. The lower linguistic density may help trainees orient quickly; however, the reduced depth indicates that ChatGPT should supplement, not replace, established peer-reviewed resources. Future research should cover multiple medical topics and additional AI models, and should assess the clinical applicability and accuracy of AI-generated content.
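The readability formulas named in the Design section are standard and straightforward to reproduce. The following is a minimal Python sketch, not the authors' SPSS/R pipeline: the syllable counter is a crude vowel-group heuristic, and "difficult words" are approximated as words of three or more syllables, since the abstract does not state the study's operational definition (some tools instead use the Dale-Chall familiar-word list). Function names and thresholds here are illustrative assumptions.

import re
import math

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, floor of 1.
    Real readability tools use dictionary lookups; this is an approximation."""
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and n > 1:  # crude silent-e adjustment
        n -= 1
    return max(n, 1)

def readability_metrics(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    # Assumption: "difficult" = polysyllabic (3+ syllables); the study's
    # exact definition is not given in the abstract.
    difficult = sum(1 for s in syllables if s >= 3)

    n_words, n_sents, n_syll = len(words), len(sentences), sum(syllables)
    wps = n_words / n_sents   # average words per sentence
    spw = n_syll / n_words    # average syllables per word

    return {
        "words": n_words,
        "sentences": n_sents,
        "words_per_sentence": round(wps, 1),
        # Flesch Reading Ease: higher scores indicate easier text
        "FRE": round(206.835 - 1.015 * wps - 84.6 * spw, 1),
        # Flesch-Kincaid Grade Level: approximate US school grade
        "FKGL": round(0.39 * wps + 11.8 * spw - 15.59, 1),
        # SMOG Index (strictly defined for samples of 30+ sentences)
        "SMOG": round(1.0430 * math.sqrt(difficult * 30 / n_sents) + 3.1291, 1),
        "difficult_words": difficult,
        "difficult_word_pct": round(100 * difficult / n_words, 1),
    }

print(readability_metrics(
    "Stroke is a leading cause of death. Early reperfusion therapy improves outcomes."
))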
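For the between-group comparison, a minimal sketch of the Mann-Whitney U test using SciPy follows; the study itself used SPSS and R, and the word counts below are placeholder values for illustration only, not the study data.

from scipy.stats import mannwhitneyu

# Illustrative per-section word counts (placeholder values, NOT the study data).
uptodate_words = [2650, 2772, 2901, 2540, 3050]
chatgpt_words = [298, 304, 311, 287, 330]

# Two-sided Mann-Whitney U test, suited to small samples where
# normality cannot be assumed, as in section-level counts like these.
u_stat, p_value = mannwhitneyu(uptodate_words, chatgpt_words, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")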
Keywords: artificial intelligence; chatgpt; medical education; readability score; stroke; uptodate