A study from Google Health, the first to look at the impact of a deep learning tool in real clinical settings, reveals that even the most accurate AI systems can make things worse if they are not tailored to the clinical environments in which they will operate.

Standards set by the US Food and Drug Administration (FDA) and the CE marking regime in Europe focus mainly on accuracy and safety; improved patient outcomes are not an explicit requirement.

Google Health reached this conclusion after testing the company's AI tool in Thailand, where the government has an annual goal of screening 60% of people with diabetes for diabetic retinopathy, a condition that can cause blindness if not treated early. But with around 4.5 million patients and only 200 retinal specialists, clinics are struggling to meet the target. The AI developed by Google Health can identify signs of diabetic retinopathy from an eye scan with more than 90% accuracy, and it was deployed in 11 clinics. However, the feedback from those clinics was not entirely positive.

The deep learning tool was trained on high-quality scans, but in actual clinical settings many images were poorly taken because of inadequate lighting, and more than a fifth of them were rejected by the system. Furthermore, because the system had to upload images to the cloud for processing, poor internet connections in several clinics also caused delays.

Google Health is now working with local medical facilities to design new workflows in an effort to overcome these logistical challenges.