A team of computer scientists at Lawrence Livermore National Laboratory (LLNL), a federal research facility founded by the University of California, Berkeley, has proposed using confidence calibration to build a more reliable classifier for predicting disease types from diagnostic images. In a preprint made available through Cornell University’s open-access repository arXiv, the team expresses hope that this novel deep learning approach will help clinicians better interpret medical images without sacrificing accuracy.
Overcoming the challenge of quantifying AI’s reliability
Jay Thiagarajan, an LLNL computational scientist and the study’s principal investigator, said that as artificial intelligence (AI) is assimilated into high-risk tools and increasingly influences clinical decision-making, reliability becomes a crucial measure by which users can examine a model’s behavior and anticipate undesirable consequences should something go wrong.
“If something as simple as changing the diversity of the population can break your system, you need to know that, rather than deploy it and then find out,” Thiagarajan told the LLNL news portal. Creating a yardstick for how reliable a model is when it is used in real-world settings thus became the motivation behind the study. However, quantifying the reliability of machine learning models can be a daunting task, especially since researchers do not wish to undermine the interpretability of the model in the process.
To overcome the challenge, the researchers first created a “reliability plot”, involving human experts in a so-called “inference loop” to determine the trade-off between accuracy and the amount of autonomy given to the model. The team then evaluated the model’s reliability by preventing it from making predictions when its confidence was low. They found that models trained with this calibration-driven learning were more accurate (80% accuracy) and more reliable than existing deep learning solutions (74% accuracy) when detecting disease states in a series of dermoscopy images of lesions used for skin cancer screening.
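The core mechanism described here, abstaining from a prediction when confidence is low and deferring to a human expert, can be sketched in a few lines. The function and threshold names below are illustrative assumptions, not code from the LLNL study; the sketch only shows the general accuracy-versus-coverage trade-off that a reliability plot visualizes.

```python
import numpy as np

def predict_with_abstention(probs, threshold=0.8):
    """Predict the top class, or return -1 (defer to a human expert)
    when the model's top-class confidence falls below the threshold."""
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)
    preds[probs.max(axis=1) < threshold] = -1  # defer these cases
    return preds

def accuracy_coverage_curve(probs, labels, thresholds):
    """Sweep confidence thresholds to trace accuracy vs. coverage:
    stricter thresholds answer fewer cases, but more accurately."""
    labels = np.asarray(labels)
    curve = []
    for t in thresholds:
        preds = predict_with_abstention(probs, t)
        answered = preds != -1
        coverage = answered.mean()
        acc = (preds[answered] == labels[answered]).mean() if answered.any() else float("nan")
        curve.append((t, coverage, acc))
    return curve
```

On toy probabilities such as `[[0.9, 0.1], [0.55, 0.45], [0.3, 0.7]]`, raising the threshold from 0.5 to 0.8 drops coverage from 100% to one case in three while the accuracy on the answered cases rises, which is the trade-off the article describes.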
Confidence Calibration to improve AI’s reliability
Thiagarajan believes confidence calibration offers a new way of building interpretability tools for scientific and medical problems. The model is designed in an introspective manner: users can enter a hypothesis about a patient (e.g., the patient is probably experiencing the onset of a medical condition because of these symptoms), and the model will generate evidence that supports the hypothesis as much as possible. This “what-if” analysis not only enables the model to extract insights from a sea of data but also highlights its strengths and weaknesses.
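For readers unfamiliar with the term, confidence calibration in general means adjusting a model’s predicted probabilities so they match its empirical accuracy. One standard illustration of the idea, temperature scaling, is sketched below; it is a common technique in the literature and is not necessarily the method used in the LLNL study, and all names in the sketch are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; T > 1 softens (lowers) confidences."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Grid-search the single scalar T that minimizes negative
    log-likelihood on held-out data, so that predicted confidences
    better track how often the model is actually right."""
    logits, labels = np.asarray(logits, dtype=float), np.asarray(labels)
    def nll(T):
        p = softmax(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

A single scalar is fitted rather than retraining the network, which is why this family of post-hoc calibration methods preserves the model’s accuracy while changing only how confident it claims to be.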
Recently, Thiagarajan has also applied these models to chest X-ray images of COVID-19-positive patients to better understand the roles of demographics, social habits and medical interventions in these cases. He noted that to be considered helpful, AI needs to handle amounts of data that human beings cannot and generate results that human experts can understand. Interpretability and introspection techniques will thus make AI more powerful while also providing opportunities for physicians to devise new hypotheses about certain diseases and for policymakers to make decisions that impact public health.
“We were exploring how to make a tool that can potentially support more sophisticated reasoning or inferencing. These AI models systematically provide ways to gain new insights by placing your hypothesis in a prediction space. The question is ‘How should the image look if a person has been diagnosed with a condition A versus condition B?’ Our method can provide the most plausible or meaningful evidence for that hypothesis. We can even obtain a continuous transition of a patient from state A to state B, where the expert or a doctor defines what those states are,” Thiagarajan explains.
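One common way such a continuous transition between two expert-defined states is realized in the deep learning literature is by interpolating between latent representations of the two states and decoding each intermediate point. The minimal sketch below is an assumption for illustration only, not the study’s actual procedure, and the names `z_a`, `z_b` and `interpolate_states` are hypothetical.

```python
import numpy as np

def interpolate_states(z_a, z_b, steps=5):
    """Linearly interpolate between two latent representations
    (e.g. encodings of 'state A' and 'state B' images); a decoder
    could render each step as an intermediate image."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    return [(1 - a) * z_a + a * z_b for a in np.linspace(0.0, 1.0, steps)]
```

Each element of the returned list sits proportionally between the two endpoints, so sweeping through the list corresponds to the patient moving continuously from state A to state B.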