Res Integr Peer Rev. 2025 Oct 27. 10(1): 23
BACKGROUND: Gender and geographical disparities have been widely reported in the peer-review process of biomedical journals. Artificial intelligence (AI) is increasingly transforming the publishing system; however, its potential to identify suitable reviewers, and whether it might reduce, replicate, or reinforce existing biases in peer review, has never been comprehensively investigated. This study sought to determine the usefulness of AI in identifying expert scientists in medicine, taking into consideration gender and geographical diversity, equity, and inclusion (DEI).
METHODS: The titles and abstracts of 50 research articles published in high-impact biomedical journals between November 2023 and September 2024 were fed into a large language model (GPT-4o), which was prompted to identify 20 distinguished scientists in each study's field. For each article, two trials were performed in random order, with and without a gender and geographical DEI prompt. The identified scientists were classified by gender, geographical location, and income level of their country of affiliation. In addition, their number of peer-reviewed publications, Google Scholar-derived total citations, and h-index were computed.
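To illustrate the procedure described above, the sketch below shows how the two prompting conditions (with and without a DEI instruction) could be implemented with the OpenAI Python client. Only the model name (GPT-4o) comes from the abstract; the prompt wording, function name, and client usage are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch of the two prompting conditions, assuming the OpenAI Python
# client (openai >= 1.0); the exact prompt wording used in the study is not
# reported in the abstract and is approximated here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BASE_PROMPT = (
    "Based on the following title and abstract, identify 20 distinguished "
    "scientists in this study's field.\n\nTitle: {title}\n\nAbstract: {abstract}"
)
DEI_PROMPT = (
    BASE_PROMPT
    + "\n\nEnsure the list is gender-balanced and geographically diverse, "
      "reflecting diversity, equity, and inclusion (DEI)."
)

def identify_reviewers(title: str, abstract: str, use_dei_prompt: bool) -> str:
    """Query GPT-4o for 20 expert scientists, with or without a DEI prompt."""
    prompt = (DEI_PROMPT if use_dei_prompt else BASE_PROMPT).format(
        title=title, abstract=abstract
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In this sketch, each article would be run once per condition and the returned lists then coded manually for gender, geography, and bibliometric indicators, as described in the METHODS.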
RESULTS: Without a DEI prompt, GPT-4o primarily identified male scientists (68%) and scientists affiliated with high-income countries (95.3%). Conversely, when DEI was explicitly prompted, GPT-4o generated a gender-balanced (51% female) and geographically diverse list of scientists. Specifically, the proportion of scientists from high-income countries decreased to 42.3%, while representation from upper-middle-income (3.2% to 26.2%), lower-middle-income (1.2% to 26.1%), and low-income (0.2% to 5.4%) countries significantly increased. The number of publications (without vs. with DEI: 284 ± 237 vs. 281 ± 245, P = 0.77), citations (48,445 ± 60,270 vs. 53,792 ± 71,903, P = 0.13), and h-index (79 ± 43 vs. 76 ± 43, P = 0.15) did not differ between groups.
CONCLUSIONS: When not prompted to consider DEI, GPT-4o successfully identified expert scientists, but primarily males and scientists from high-income countries. When DEI was explicitly prompted, however, GPT-4o generated a gender-balanced and geographically diverse list of scientists. Academic productivity was high and comparable between groups, suggesting that GPT-4o identified skilled scientists who could reasonably serve as reviewers for scientific journals. These findings provide evidence that AI can be an ally in closing gender and geographical gaps in peer review, provided that DEI is explicitly prompted; conversely, AI could perpetuate existing biases if not carefully managed.
Keywords: AI in medicine; ChatGPT; Diversity; Equity; Gender disparities; Geographical disparities; Inclusion; Scientific review