Myura Nagendran, Yang Chen, Christopher A Lovejoy, Anthony C Gordon, Matthieu Komorowski, Hugh Harvey, Eric J Topol, John P A Ioannidis, Gary S Collins, Mahiben Maruthappu


This is a good systematic review of medical imaging studies that use diagnostic deep learning algorithms, with a focus on comparing the performance of these algorithms against that of expert clinicians. The results show that very few prospective randomized controlled studies in medical imaging compare deep learning with human experts, and that the human comparator groups are usually small. The majority of these studies are non-randomized trials that are not prospective and are at risk of bias; in addition, they did not adhere to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) standard. Lastly, data and code are often not available for these studies. Future studies in this category need more clinical relevance and methodological transparency, and probably also a much higher level of collaboration among centers. We, as a community of AI enthusiasts for clinical medicine and health care, need to encourage human and machine synergy while tempering the exuberant conclusions sometimes seen in both publications and the media. Of note, because these studies are designed by human investigators, humans (more than machines) are accountable for their many deficiencies.