In a new study published in JAMA Pediatrics on January 2, researchers from Cohen Children’s Medical Center in New York shed light on ChatGPT’s shortcomings in accurately diagnosing pediatric cases. The analysis revealed an alarming 83 percent error rate in the chatbot’s responses to hypothetical child illnesses.
The research focused on pediatric case challenges: cases originally presented to groups of physicians as diagnostic learning exercises. These challenges often involve limited or unusual information, simulating real-life diagnostic complexity. The team examined 100 such challenges published between 2013 and 2023 in JAMA Pediatrics and NEJM.
ChatGPT’s diagnostic accuracy proved dismal: the chatbot gave outright incorrect diagnoses for 72 of the 100 cases. A further 11 responses were deemed “clinically related” to the correct diagnosis but too broad to count as accurate, which together account for the 83 percent error rate.
One major shortcoming identified in the study was the AI’s inability to recognize relationships between medical conditions and external or preexisting circumstances. For example, it failed to connect neuropsychiatric conditions such as autism to the vitamin deficiencies that can result from the restrictive diets commonly seen in those patients.
The study underscores the need for continued training and collaboration with medical professionals to improve ChatGPT’s diagnostic capabilities. The authors emphasize drawing on vetted medical literature and expertise rather than unvetted internet content, warning that the misinformation cycles prevalent online could compromise the AI’s reliability.
While AI-based chatbots built on large language models (LLMs) have been tested on a range of medical tasks, the study reveals a significant gap in their diagnostic ability in pediatric settings. Despite successes such as passing medical exams, these models’ failure to recognize nuanced relationships in pediatric cases highlights the irreplaceable nature of human expertise.
The report concludes by acknowledging the ongoing debate over AI’s role in clinical diagnostics. While AI has shown promise in administrative and communicative tasks, such as drafting text for patients or explaining diagnoses, the study makes clear that the current state of AI, as exemplified by ChatGPT, falls short of replacing human medical professionals in the nuanced work of pediatric diagnosis.
