J Craniomaxillofac Surg. 2026 Jan 13. pii: S1010-5182(26)00025-9. [Epub ahead of print] 54(3): 104468
OBJECTIVE: As generative AI tools like ChatGPT-4 gain traction in academic writing, questions arise regarding their credibility, scientific depth, and detectability. This study aimed to evaluate whether experienced oral and maxillofacial surgeons (OMFS) can reliably distinguish between AI- and human-authored manuscripts, and to compare both in terms of coherence, scientific rigor, citation accuracy, and overall quality.
MATERIALS AND METHODS: Three core OMFS topics (impacted third molar surgery, cyst enucleation, and TMJ arthroscopy) were selected. For each topic, two manuscripts (∼2500 words each) were written independently: one by ChatGPT-4 and one by senior OMFS clinicians. Twenty board-certified OMFS reviewers, blinded to authorship, evaluated these manuscripts using a validated 25-item questionnaire assessing five domains: readability, scientific depth, reference accuracy, writing quality, and methodological rigor. Reviewers also attempted to identify the authorship source. Citation accuracy was verified through manual PubMed cross-checking. Statistical analysis included paired t-tests, chi-square tests, and ANOVA.
RESULTS: Human-authored manuscripts outperformed AI-generated ones in scientific depth (4.5 ± 0.4 vs. 3.9 ± 0.6, p < 0.01), reference accuracy (4.9 ± 0.1 vs. 4.4 ± 0.7, p < 0.001), and overall writing quality (4.7 ± 0.4 vs. 4.1 ± 0.5, p < 0.01). Coherence and readability scores were comparable (human: 4.8 ± 0.4; AI: 4.6 ± 0.5; p = 0.07). Reviewers correctly identified manuscript authorship only 54% of the time (p = 0.68), suggesting AI-generated texts are often indistinguishable from human ones in surface fluency.
CONCLUSION: ChatGPT-4 is capable of producing readable and structurally sound OMFS manuscripts. However, deficiencies in scientific reasoning and citation fidelity underscore the need for expert oversight. As AI tools integrate into academic workflows, transparent disclosure and editorial safeguards are imperative to uphold scientific integrity.
Keywords: Artificial intelligence; ChatGPT; Double-blind evaluation; Generative AI; Oral and maxillofacial surgery; Scientific writing
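
The authorship-identification result can be read as a comparison against chance (50%). The following is a minimal Python sketch of one way to run such a check, an exact two-sided binomial test. The counts are hypothetical: the abstract does not report the number of judgments, so 120 (20 reviewers × 6 manuscripts) with 65 correct (about 54%) is assumed here for illustration only. The abstract also does not say which test or unit of analysis produced p = 0.68, so this sketch is not expected to reproduce that value.

```python
from math import comb


def binomial_two_sided_p(correct: int, total: int) -> float:
    """Exact two-sided binomial p-value against chance (success probability 0.5)."""
    # Under p = 0.5 the distribution is symmetric, so doubling the
    # smaller tail (capped at 1.0) gives the exact two-sided p-value.
    upper = sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total
    lower = sum(comb(total, k) for k in range(0, correct + 1)) / 2 ** total
    return min(1.0, 2 * min(upper, lower))


# Hypothetical counts (assumptions, not reported in the abstract):
# 20 reviewers x 6 manuscripts = 120 judgments, 65 of them correct.
total_judgments = 20 * 6
correct_judgments = 65

accuracy = correct_judgments / total_judgments
p_value = binomial_two_sided_p(correct_judgments, total_judgments)
print(f"accuracy = {accuracy:.1%}, two-sided p = {p_value:.3f}")
```

A large p-value in a check of this kind indicates that the observed accuracy is statistically compatible with guessing, which matches the abstract's interpretation of the identification result.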