This study aimed to assess the medical accuracy and readability of ChatGPT responses to patient/parent queries about strabismus. The authors compared responses from the free version (ChatGPT 3.5) with those from the subscription service (version 4.0). They also investigated geographical variation (Florida versus California) and the reproducibility of responses over time. Thirty-four patient-centric questions were generated by two ophthalmologists, covering diagnosis, epidemiology, clinical signs and symptoms, prognosis, treatment and research. These were based on the most common questions asked by patients and carers. The same questions were entered at the same time in both locations at day 0, 1 week and 1 month in March 2024, using incognito mode.

At day 0, responses were rated as (1) acceptable in 59.9%, (2) accurate but missing key information or containing a minor inaccuracy in 35.3%, and (3) inaccurate and potentially harmful in 4.8%. At 1 week these figures were 63.6%, 31.3% and 5.1%, respectively, and at 1 month 66.9%, 30.5% and 2.6%, respectively. This indicates a potential AI learning component, with accuracy improving over time. There were more acceptable responses in California than in Florida (76.5% vs 50.5%), and more responses with minor inaccuracies in Florida (42.6% vs 22.1%). This is potentially influenced by differences in educational attainment, socioeconomic status, healthcare access and cultural factors.

Readability scores equated to higher than high-school grade level, i.e. poor reading ease. Intra- and inter-rater reliability analysis showed a fair to moderate level of consistency. Acceptable responses were more frequent for questions on diagnosis, signs and symptoms, and epidemiology. The authors conclude that ChatGPT and other similar platforms are important in supporting patients to obtain accurate, first-line information regarding strabismus.
However, information can be inaccurate, and it is important for healthcare providers to guide patients in the cautious interpretation of AI-generated information. Using pre-prompts requesting a 6th-grade reading level can significantly improve readability.
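The review does not state which readability formula the study used. As an illustration only, the widely used Flesch-Kincaid grade level (0.39 × words per sentence + 11.8 × syllables per word − 15.59) can be approximated as follows; the syllable counter is a crude vowel-group heuristic, not the study's method:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per group of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score above 12 corresponds to reading material harder than high-school level, which is the range the study reports for unprompted ChatGPT responses.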
Accuracy of ChatGPT responses for questions about strabismus
Reviewed by Fiona Rowe
Accuracy and readability of ChatGPT responses to patient-centric strabismus questions.
CONTRIBUTOR
Fiona Rowe (Prof)
Institute of Population Health, University of Liverpool, UK.
