Cureus. 2025 Jan;17(1): e78313
INTRODUCTION: The rise of artificial intelligence (AI), including generative chatbots like ChatGPT (OpenAI, San Francisco, CA, USA), has revolutionized many fields, including healthcare. Patients have gained the ability to prompt chatbots to generate purportedly accurate and individualized healthcare content. This study analyzed the readability and quality of answers to Achilles tendon rupture questions from six generative AI chatbots to evaluate and distinguish their potential as patient education resources.
METHODS: The six AI models used were ChatGPT 3.5, ChatGPT 4, Gemini 1.0 (previously Bard; Google, Mountain View, CA, USA), Gemini 1.5 Pro, Claude (Anthropic, San Francisco, CA, USA) and Grok (xAI, Palo Alto, CA, USA) without prior prompting. Each was asked 10 common patient questions about Achilles tendon rupture, determined by five orthopaedic surgeons. The readability of generative responses was measured using Flesch-Kincaid Reading Grade Level, Gunning Fog, and SMOG (Simple Measure of Gobbledygook). The response quality was subsequently graded using the DISCERN criteria by five blinded orthopaedic surgeons.
RESULTS: Gemini 1.0 generated statistically significant differences in ease of readability (closest to average American reading level) than responses from ChatGPT 3.5, ChatGPT 4, and Claude. Additionally, mean DISCERN scores demonstrated significantly higher quality of responses from Gemini 1.0 (63.0±5.1) and ChatGPT 4 (63.8±6.2) than ChatGPT 3.5 (53.8±3.8), Claude (55.0±3.8), and Grok (54.2±4.8). However, the overall quality (question 16, DISCERN) of each model was averaged and graded at an above-average level (range, 3.4-4.4).
DISCUSSION AND CONCLUSION: Our results indicate that generative chatbots can potentially serve as patient education resources alongside physicians. Although some models lacked sufficient content, each performed above average in overall quality. With the lowest readability and highest DISCERN scores, Gemini 1.0 outperformed ChatGPT, Claude, and Grok and potentially emerged as the simplest and most reliable generative chatbot regarding management of Achilles tendon rupture.
Keywords: achilles tear; artificial intelligence; deep learning; orthopedics; sports medicine