Case Study 1: Repurposing an old tool using artificial intelligence (AI)
Background: The electrocardiogram (ECG) was invented in 1895 and has changed little since, including the way results are reported and printed by the machine. Traditionally, the ECG is used to diagnose various cardiac problems. AI-enhanced ECG interpretation could streamline the process or even go beyond human capacity: it provides cardiologists not only with normal and abnormal classifications but also with finer insights that are not necessarily noticeable, which in turn adds value to their triage work.
What has been done: According to Dr. Francisco Lopez-Jimenez, Co-Director of AI in Cardiovascular Medicine at Mayo Clinic, three convolutional neural networks (CNNs) were created, targeting:
– Hypertrophic cardiomyopathy (HC) (i.e., a condition that can cause sudden death in relatively healthy individuals, usually under the age of 40 and often after playing sports; it is difficult to diagnose). The algorithm was used on a 25-year-old female patient whose ECG appeared to be completely normal but who had been diagnosed with HC.
In fact, the patient had just undergone septal myectomy, a surgery to remove the thickened wall tissue obstructing blood flow. A few days after the surgery, her ECG became very abnormal, but the algorithm indicated a low probability (2.5%) of HC.
– Atrial fibrillation (AF) (i.e., often paroxysmal and sometimes asymptomatic; hard to diagnose because an episode may last only a few minutes and recur at intervals of days. Often, the first manifestation is a stroke, so it is urgent to find these patients in time). In this case, the patient started having ECGs taken in 2013. When he had his first stroke in 2014, his ECG was still considered normal. It was not until his second stroke in 2020 that his ECG went above the threshold and he was considered at high risk for AF.
So, could AI have prevented all this had it been used years before the onset of stroke? Dr. Lopez-Jimenez said this will be an interesting question to answer in the next few years, especially how an ECG can appear completely normal for years without any telltale sign of an increased risk of AF.
– Low ejection fraction (EF) (i.e., an EF below 55% indicates that the heart is not functioning as well as it should). A 35-year-old male patient presented at the clinic after the unexpected death of his sister. The algorithm gave a 76% probability of low EF, and an echocardiogram showed an 18% reduction in EF although, again, the ECG was seemingly normal.
In another case, the patient was flagged as positive for low EF, but the measured EF was borderline at around 50%, so the result was officially labeled a false positive. However, a sub-group analysis followed individuals who had been labeled as false positives to assess whether they had an increased probability of developing low EF over time. Dr. Lopez-Jimenez and his team found that although this patient’s initial EF was not low, five years later it had fallen to 31%. So, the question is what makes a “real” false positive. The assumption made was that a low EF suggested by the ECG indicates either a “real” low EF or a probability of developing a “real” low EF over time.
Case Study 2: Using machine learning to automate Ventricular Contouring (VC) in Congenital Heart Disease (CHD)
Background: Dr. Animesh Tandon, Assistant Professor of Pediatrics with a joint appointment in Radiology at University of Texas Southwestern Medical School and Children’s Medical Center Dallas, said Cardiac Magnetic Resonance Imaging (CMR) is considered the gold standard for measuring ventricular volumes and function, particularly for the right ventricle. It is also one of the better ways of showing 3D anatomy.
To get these values, the surfaces of the ventricles need to be outlined (i.e., contoured), and software then adds up the sizes of these contours to give a volume and an ejection fraction. Normally, it takes about 20 minutes to contour a structurally normal heart, and more time is needed for hearts with CHD. Automating the process will not only improve the clinical CMR (including CT and echocardiogram) workflow and reduce intra-observer variability but also, in the words of Dr. Tandon, reduce boredom.
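As a rough illustration of how contours become a volume and an ejection fraction, the sketch below assumes each short-axis slice has one closed contour given as (x, y) points in millimetres and that all slices share the same thickness. It sums contour area times slice thickness and then applies EF = (EDV − ESV) / EDV. The function names and the square toy contours are hypothetical and are not taken from any software mentioned in the talk.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in millimetres


def contour_area(points: Sequence[Point]) -> float:
    """Area enclosed by a closed polygon (shoelace formula), in mm^2."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the contour
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ventricular_volume_ml(contours: List[Sequence[Point]],
                          slice_thickness_mm: float) -> float:
    """Sum of (slice area x slice thickness), converted from mm^3 to mL."""
    volume_mm3 = sum(contour_area(c) for c in contours) * slice_thickness_mm
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Fraction of the end-diastolic volume ejected per beat."""
    return (edv_ml - esv_ml) / edv_ml


def square(side_mm: float) -> List[Point]:
    """Toy stand-in for a real ventricular contour."""
    return [(0.0, 0.0), (side_mm, 0.0), (side_mm, side_mm), (0.0, side_mm)]


# Five slices, 10 mm thick: larger contours at end-diastole than end-systole.
edv = ventricular_volume_ml([square(40.0)] * 5, slice_thickness_mm=10.0)
esv = ventricular_volume_ml([square(30.0)] * 5, slice_thickness_mm=10.0)
ef = ejection_fraction(edv, esv)
print(f"EDV={edv:.1f} mL, ESV={esv:.1f} mL, EF={ef:.1%}")
```

The manual effort lies in drawing the contours; the arithmetic afterwards is simple, which is why automating the contouring step is where the time saving comes from.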
These values help cardiologists decide when CHD patients will need a new pulmonary valve. Automated methods already exist, such as motion-based and atlas-based methods, iterative methods, deep learning algorithms specifically for measuring right ventricle (RV) volume metrics, other CNNs for adult CMR, and so on. Some were developed commercially by companies, but not many have published their underlying data.
What has been done: Dr. Tandon’s approach was to use a CNN that had already been trained on mostly structurally normal (MSN) adult hearts and to team up with a commercial partner to bring the results to clinical practice at a faster pace. This was a two-part study.
For the first part, the algorithm was trained mostly on MSN hearts and tested on rTOF (i.e., repaired Tetralogy of Fallot; TOF is a form of CHD). It was found that the MSN-trained contouring was worse for the RV than for the left ventricle (LV). For the second part, the algorithm was trained on the 5000 MSN cases plus data from an added 57 rTOF patients and tested on a 30-case rTOF testing dataset. The improved algorithm showed better spatial characteristics, better correlation with the manual RV end-diastolic volume (EDV), and decreased bias and percentage error in RV EDV.
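The agreement measures named above can be sketched as follows. The exact definitions used in the study are not given here, so this assumes that bias is the mean of (automated − manual), that percentage error is the mean absolute difference relative to the manual value, and that correlation is Pearson's r. The volumes are made-up numbers for illustration only.

```python
from math import sqrt
from statistics import mean


def bias(auto, manual):
    """Mean signed difference (automated minus manual), in mL."""
    return mean(a - m for a, m in zip(auto, manual))


def mean_abs_pct_error(auto, manual):
    """Mean absolute difference relative to the manual value, in percent."""
    return 100.0 * mean(abs(a - m) / m for a, m in zip(auto, manual))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical RV EDV values (mL) for four patients.
manual_edv = [100.0, 150.0, 200.0, 250.0]
auto_edv = [110.0, 150.0, 190.0, 260.0]

edv_bias = bias(auto_edv, manual_edv)
edv_pct_error = mean_abs_pct_error(auto_edv, manual_edv)
edv_r = pearson_r(auto_edv, manual_edv)
print(f"bias={edv_bias:.2f} mL, error={edv_pct_error:.2f}%, r={edv_r:.3f}")
```

Comparing these numbers before and after adding the rTOF cases to training is one way to express the kind of improvement reported for the second part of the study.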
Nevertheless, the algorithm did contour up into the atrium and did not recognize that this part is not the ventricle. This differs from the CMR found in the UK Biobank, which forms the training dataset. Thus, the algorithm will still need to be refined over time. The tool is now available for clinical use, opening up the possibility of a more in-depth understanding of cardiac mechanics.
Dr. Tandon urged the community to collate data so that data-driven approaches can be more meaningful and easier to implement. In general, it would be good to see data from commercial systems, without revealing intellectual property, to uncover sources of error and ways to fix them.
Case Study 3: Predicting cardiovascular disease (CVD) using machine learning
Background: Wasif Bokhari, PhD Candidate at Arizona State University and Senior Software Engineer at Citizen Bank, said CVD is the biggest cause of death worldwide. Lives can be saved if it is identified early. As such, Bokhari’s goal is to improve on the prediction of 10-year CVD risk and the risk scores that are currently used by cardiologists.
Currently, the two main challenges are an imbalanced dataset (i.e., the number of positive cases is small relative to the number of individuals tested) and a much higher cost of a false negative (i.e., failing to inform someone with CVD) compared with a false positive (i.e., labeling a non-CVD individual as having CVD), because of the potential loss of life. A common solution to an imbalanced dataset is sampling. However, under-sampling could remove important data, and over-sampling could result in overfitting (i.e., training on the noise rather than the actual signal) and an increase in training time.
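The trade-off between the two sampling strategies can be seen in a minimal sketch of random under-sampling and random over-sampling. It assumes binary labels (1 for CVD, 0 for no CVD) and returns row indices rather than data; the helper names and the ten-row toy label list are hypothetical.

```python
import random


def _minority_majority(labels):
    """Return (minority_indices, majority_indices) for binary labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(labels, seed=0):
    """Keep every minority row and an equally sized random subset of the
    majority rows. The remaining majority rows are discarded."""
    minority, majority = _minority_majority(labels)
    rng = random.Random(seed)
    return sorted(minority + rng.sample(majority, len(minority)))


def random_oversample(labels, seed=0):
    """Keep every row and repeat randomly chosen minority rows until the
    classes are balanced. The repeats are exact copies."""
    minority, majority = _minority_majority(labels)
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return sorted(minority + majority + extra)


labels = [1, 1] + [0] * 8  # 2 CVD cases, 8 non-CVD cases
under_idx = random_undersample(labels)
over_idx = random_oversample(labels)
under_labels = [labels[i] for i in under_idx]
over_labels = [labels[i] for i in over_idx]
print(len(under_idx), sum(under_labels))  # rows kept, positives among them
print(len(over_idx), sum(over_labels))    # rows after duplication, positives
```

Under-sampling here throws away six of the eight non-CVD rows, while over-sampling trains on the same two CVD rows many times over, which is how it can encourage overfitting and lengthen training.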
What has been done: Bokhari created a CVD decision tree ensemble classifier and proposed the use of the Hellinger distance (i.e., a measure of the similarity between two probability distributions) as the tree-splitting criterion, because the method has been shown to perform well on imbalanced datasets without the need for extensive sampling. To control the number of false negatives (i.e., asymmetric error control, AEC), Bokhari used the Neyman-Pearson lemma (i.e., a result identifying the hypothesis test with the greatest statistical power at a given error level).
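To make the splitting criterion concrete, the sketch below scores a candidate binary split by the Hellinger distance between how the positive cases and how the negative cases are distributed across the two branches. This follows the commonly described form of the criterion, normalized here to the range 0 to 1; whether Bokhari's classifier uses exactly this form is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


def split_score(labels, goes_left):
    """Score a binary split: compare the left/right distribution of the
    positive class with the left/right distribution of the negative class."""
    pos_total = sum(1 for y in labels if y == 1)
    neg_total = len(labels) - pos_total
    pos_left = sum(1 for y, left in zip(labels, goes_left) if y == 1 and left)
    neg_left = sum(1 for y, left in zip(labels, goes_left) if y == 0 and left)
    p = [pos_left / pos_total, 1 - pos_left / pos_total]
    q = [neg_left / neg_total, 1 - neg_left / neg_total]
    return hellinger(p, q)


labels = [1, 1] + [0] * 8  # imbalanced toy data: 2 positives, 8 negatives

# A split that sends all positives left and all negatives right.
perfect = split_score(labels, [True, True] + [False] * 8)

# A split that sends half of each class left: no separation at all.
useless = split_score(labels, [True, False] + [True] * 4 + [False] * 4)

print(perfect, useless)
```

Because the score uses only within-class proportions, it does not change if the negative class is made ten times larger with the same left/right proportions, which is the property that makes it attractive for imbalanced data.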
For a traditional machine learning classifier, only the scoring function needs to be constructed from the training data, as the threshold is typically kept at 0.5. For the asymmetric error control CVD classifier created by Bokhari, the training data are used to calculate both the scoring function and the threshold. The steps include sample splitting, where zero represents negative CVD and one represents positive CVD. There were three classes of samples: a mix of zeros and ones, the left-out zeros, and the left-out ones.
The AEC classifier was trained on the first class of samples to obtain the scoring function. This was then applied to the second class of samples to obtain classification scores. The third class of samples was evaluated to calculate the upper and lower bounds of the Type II error. After the sample splitting is done, order statistics are used to search for a threshold such that the classifier's false negatives stay below a specified value.
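One common way to pick a threshold from order statistics is sketched below. It is not necessarily the exact procedure from the talk. It assumes the scoring function has already been applied to a left-out set of CVD-positive samples, that a person is flagged when the score is at or above the threshold, and that the goal is a false negative rate of at most alpha with probability at least 1 − delta. The names np_threshold, alpha and delta are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def np_threshold(positive_scores, alpha, delta):
    """Choose the largest order statistic of the left-out positive scores
    such that the rule 'flag if score >= threshold' has a false negative
    rate above alpha with probability at most delta."""
    scores = sorted(positive_scores)
    n = len(scores)
    best_rank = None
    for rank in range(1, n + 1):
        # Bound on P(false negative rate > alpha) when using the
        # rank-th smallest positive score as the threshold.
        if binom_cdf(rank - 1, n, alpha) <= delta:
            best_rank = rank
        else:
            break  # the bound only grows with rank
    if best_rank is None:
        raise ValueError("too few left-out positives for this alpha and delta")
    return scores[best_rank - 1]


# Toy scores for 100 left-out CVD-positive samples: 0.01, 0.02, ..., 1.00.
positive_scores = [i / 100 for i in range(1, 101)]
threshold = np_threshold(positive_scores, alpha=0.10, delta=0.05)
missed = sum(1 for s in positive_scores if s < threshold) / len(positive_scores)
print(threshold, missed)
```

The threshold is learned from data instead of being fixed at 0.5, and it sits lower than the naive 10th percentile of positive scores because of the safety margin delta. With too few left-out positives, no threshold can meet the guarantee, which the sketch reports as an error.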
The new classifier was found to have better accuracy and a lower false negative rate than the present 10-year CVD risk prediction scores.