Orthop Traumatol Surg Res. 2023 Oct 12. pii: S1877-0568(23)00224-4. [Epub ahead of print] 103706
BACKGROUND: Artificial intelligence (AI) tools, although beneficial for data collection and analysis, can also facilitate scientific fraud. AI detectors can help address this problem, but their effectiveness depends on their ability to keep pace with advances in AI. In addition, many methods of evading AI detection exist, and their growing sophistication makes the task more difficult. Starting from an AI-generated text, we therefore aimed to 1) evaluate AI detection sites on a text generated entirely by AI, 2) test the methods described for evading AI detection, and 3) evaluate how effectively these methods evade detection on the same sites.
HYPOTHESIS: Not all AI detection tools are equally effective in detecting AI-generated text, and some techniques used to evade AI detection can make an AI-produced text almost undetectable.
MATERIALS AND METHODS: We created a text with ChatGPT-4 (Chat Generative Pre-trained Transformer) and submitted it to 11 AI detection web tools (Originality, ZeroGPT, Writer, Copyleaks, Crossplag, GPTZero, Sapling, Content at Scale, Corrector, Writefull and Quill), before and after applying strategies to minimize AI detection. These strategies were improving the prompts (command messages) given to ChatGPT, introducing minor grammatical errors such as comma deletion, paraphrasing, and substituting Latin letters with similar-looking Cyrillic letters (a and o), a method also used elsewhere to evade plagiarism detection. We also tested how well these tools correctly identified a scientific text written by a human in 1960.
RESULTS: For the initial AI-generated text, 7 of the 11 detectors concluded that the text was mainly written by a human. Simple modifications, such as removing commas or paraphrasing, can then effectively reduce AI detection and make the text appear human to all detectors. In addition, replacing certain Latin letters with Cyrillic letters can make an AI text completely undetectable. Finally, and paradoxically, certain sites detected a significant proportion of AI in a text written by a human in 1960.
DISCUSSION: AI detectors have low effectiveness, and simple modifications allow even the most robust detectors to be bypassed easily. The rapid development of generative AI raises questions not only about the future of scientific writing but also about the detection of scientific fraud, such as data fabrication.
LEVEL OF EVIDENCE: III; case-control study.
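The Latin-to-Cyrillic substitution described in the methods can be illustrated from the reviewer's side. The following is a minimal sketch, not the authors' procedure or any detector's actual implementation: it flags words that mix Latin and Cyrillic letters and maps a few lookalikes back to Latin. The function names and the `HOMOGLYPHS` table are hypothetical, and the table is deliberately partial.

```python
import unicodedata

# Hypothetical, deliberately partial map of Cyrillic lookalikes to Latin letters.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
})


def script_of(ch: str) -> str:
    """Return a coarse script label for one character, based on its Unicode name."""
    if not ch.isalpha():
        return "OTHER"
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "LATIN"
    if name.startswith("CYRILLIC"):
        return "CYRILLIC"
    return "OTHER"


def mixed_script_words(text: str) -> list[str]:
    """List whitespace-separated words containing both Latin and Cyrillic letters."""
    flagged = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word}
        if {"LATIN", "CYRILLIC"} <= scripts:
            flagged.append(word)
    return flagged


def normalize(text: str) -> str:
    """Replace the lookalike Cyrillic letters in HOMOGLYPHS with Latin ones."""
    return text.translate(HOMOGLYPHS)
```

A text could be passed through `mixed_script_words` before it is submitted to a detector, and any flagged words inspected or normalized first. The check is only a heuristic: a word written entirely in lookalike Cyrillic letters would not be flagged, and legitimately Cyrillic text should not be normalized.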