Launched in November 2022, ChatGPT is a chatbot that can not only engage in human-like conversations but also provide accurate answers to a wide range of knowledge questions. The chatbot, created by the company OpenAI, is based on a family of "large language models" – algorithms that can recognize, predict, and generate text based on data sets containing hundreds of millions of words.
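To make "predict and generate" concrete, here is a minimal sketch of the core operation such a model repeats to produce text: scoring every possible next token. It is illustrative only, using the small, openly available GPT-2 model via the Hugging Face transformers library as a stand-in, since the models behind ChatGPT are not publicly released; the prompt is an arbitrary example.

```python
# Minimal sketch: a language model assigns a probability to every
# possible next token, and text is generated by repeating this step.
# GPT-2 is an open stand-in for the (non-public) models behind ChatGPT.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The patient was treated with"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the next token
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```

Generation is just this step in a loop: pick a token from the distribution, append it to the prompt, and score the next one. Everything the model "knows" comes from the statistics of its training text – which is why the provenance of that text, discussed below, matters so much.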
In a study appearing this week in PLOS Digital Health, researchers report that ChatGPT performed at or near the passing threshold for the US Medical Licensing Examination (USMLE), a comprehensive, three-part exam that doctors must pass before practicing medicine in the US.

In an editorial accompanying the paper, Leo Anthony Celi, a principal research scientist at MIT's Institute for Medical Engineering and Science, a practicing physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, and his co-authors argue that ChatGPT's success on this exam should be a wake-up call to the medical community.
Q: What do you think the success of ChatGPT on the USMLE reveals about the nature of student medical education and assessment?
A: Framing medical knowledge as something that can be encapsulated in multiple-choice questions creates a cognitive framework of false certainty. Medical knowledge is often taught as a fixed model of health and disease. Treatment effects are presented as stable over time despite ever-changing practice patterns. Mechanistic models are passed from teachers to students with little emphasis on how robust those models are, the uncertainties that persist around them, and how they must be recalibrated to reflect advances worthy of adoption in practice.
ChatGPT passed an exam that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it is built, and how it is maintained. Its success highlights some of the flaws in how we train and assess medical students. Critical thinking requires an appreciation that fundamental truths in medicine are constantly changing and, more importantly, an understanding of how and why they change.
Q: What steps do you think the medical community should take to change the way students are taught and evaluated?
A: To learn is to draw on the body of current knowledge, to understand its gaps, and to seek to fill those gaps. It requires being comfortable with uncertainties and being able to probe them. We fail as teachers when we do not teach students how to recognize the gaps in existing knowledge. We fail them when we preach confidence over curiosity and arrogance over humility.
Medical education also requires an awareness of the biases in how medical knowledge is created and validated. These biases are best addressed by optimizing the cognitive diversity of a community. More than ever, there is a need to inspire interdisciplinary, collaborative learning and problem-solving. Medical students need data science skills that will enable every health care professional to contribute to, continuously assess, and recalibrate medical knowledge.
Q: Do you see any upside to ChatGPT’s success in this exam? Are there beneficial ways in which ChatGPT and other forms of AI can contribute to the practice of medicine?
A: There is no doubt that large language models (LLMs) such as ChatGPT are very powerful tools for sifting through content and extracting knowledge at a scale beyond the capabilities of experts, or even groups of experts. However, we need to address the problem of data bias before we can rely on LLMs and other AI technologies. The body of knowledge on which LLMs are trained, in medicine and beyond, is dominated by content and research from well-funded institutions in high-income countries. It is not representative of most of the world.
We have also learned that even mechanistic models of health and disease can be biased. These inputs are fed into encoders and transformers that are oblivious to those biases. Fundamental truths in medicine are constantly changing, and there is currently no way to determine when those truths have drifted. LLMs do not assess the quality or the bias of the content they are trained on. Nor do they convey any level of uncertainty around their output.

But the perfect should not be the enemy of the good. There is a huge opportunity to improve the way health care providers currently make clinical decisions, which we know are tainted by unconscious bias. I have no doubt that AI will deliver on its promise once we optimize the data input.
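One way to see the uncertainty limitation concretely: per-token probabilities are the closest thing a plain LLM exposes to a confidence score, and they measure how typical the phrasing is, not whether the claim is true. Below is a minimal sketch, again assuming the open GPT-2 model as a stand-in and using two made-up example sentences.

```python
# Minimal sketch: average per-token log-probability is the closest
# thing a plain LLM offers to "confidence" in a statement. It scores
# fluency, not truth, so it is not a calibrated uncertainty estimate.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def mean_token_logprob(text: str) -> float:
    """Average log-probability the model assigns to each token of
    `text`, given the tokens that precede it."""
    ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits               # (1, seq_len, vocab)
    # Score token i+1 using the model's prediction at position i
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

# A true claim and a fluent falsehood both receive a score; the number
# reflects how typical the wording is in the training text, not
# whether the statement is medically correct.
print(mean_token_logprob("The human heart has four chambers."))
print(mean_token_logprob("The human heart has five chambers."))
```

The model may well score the true sentence higher here, but only because "four chambers" is more common in its training text; nothing in the score distinguishes a well-supported fact from a frequently repeated error, and no calibrated error bar accompanies the answer.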