The continued increase in diagnostic imaging studies, including 3D imaging studies such as computed tomography (CT), means that radiologists are looking at thousands of images each day, searching for tiny abnormalities that can signal life-threatening emergencies. The number of images from each brain scan can be so large that on a busy day, radiologists may opt to scroll through some large 3D stacks of images using mice with frictionless wheels, almost like viewing a movie. But it could be much more efficient—and potentially more accurate—if AI technology could pick out the images with significant abnormalities, so radiologists could examine them more closely. “We wanted something that was practical, and for this technology to be useful clinically, the accuracy level needs to be close to perfect,” said Esther Yuh, MD, Ph.D., associate professor of radiology at UCSF and co-corresponding author of the study. “The performance bar is high for this application, due to the potential consequences of a missed abnormality, and people won’t tolerate less than human performance or accuracy.”
The algorithm the team developed took just one second to determine whether an entire head scan contained any signs of hemorrhage. It also traced the detailed outlines of the abnormalities it found—demonstrating their location within the brain’s three-dimensional structure. Some spots may be on the order of 100 pixels in size, in a 3D stack of images containing over a million of them, and even expert radiologists sometimes miss them, with potentially grave consequences.
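The article describes one model that both decides whether the whole scan shows hemorrhage and outlines where it is. It does not say how those two outputs are linked. The sketch below assumes one simple, hypothetical rule. It thresholds a per-pixel probability map, which is assumed to be the network's output, to get the outline mask. It then calls the exam positive if at least `min_pixels` pixels pass the threshold. A lesion of about 100 pixels in a stack of over a million pixels is at most about 0.01 percent of the volume. For that reason, a rule like this counts positive pixels directly rather than averaging a score over the whole scan, which would dilute a tiny finding.

```python
from typing import List

# Hypothetical input format: [slice][row][col] per-pixel hemorrhage probabilities.
ProbMap = List[List[List[float]]]

def segment_and_flag(prob: ProbMap, threshold: float = 0.5, min_pixels: int = 1):
    """Turn a per-pixel probability map into a binary outline mask plus a
    single exam-level call.

    threshold and min_pixels are illustrative parameters, not values from
    the study; the study's actual decision rule is not described here.
    """
    # Pixel-level output: True wherever the probability reaches the threshold.
    mask = [[[p >= threshold for p in row] for row in sl] for sl in prob]
    # Count positive pixels across the whole 3D stack (True counts as 1).
    n_positive = sum(v for sl in mask for row in sl for v in row)
    # Exam-level output: positive if enough pixels were flagged anywhere.
    exam_positive = n_positive >= min_pixels
    return mask, n_positive, exam_positive
```

For example, a toy two-slice volume with just two pixels above 0.5 is flagged as positive with the default `min_pixels=1` but not with `min_pixels=3`. Raising that hypothetical parameter trades sensitivity for fewer false positives.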
The algorithm found some small abnormalities that the experts missed. It also noted their location within the brain, and classified them according to subtype, information that physicians need to determine the best treatment. And the algorithm provided all of this information with an acceptable level of false positives—minimizing the amount of time that physicians would need to spend reviewing its results.
Yuh said one of the hardest things to achieve with the AI technology was the ability to determine whether an entire exam, consisting of a 3D “stack” of approximately 30 images, was normal. “Achieving 95 percent accuracy on a single image, or even 99 percent, is not OK, because in a series of 30 images, you’ll make an incorrect call on one of every 2 or 3 scans,” she said. “To make this clinically useful, you have to get all 30 images correct: what we call exam-level accuracy. If a computer is pointing out a lot of false positives, it will slow the radiologist down, and may lead to more errors.”
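The arithmetic behind Yuh's point can be sketched as follows. This model is a simplification assumed for illustration and is not taken from the study. Each of the 30 images is judged independently with the same per-image accuracy `p`, so an exam is entirely correct with probability `p ** 30`.

```python
# Sketch: how per-image accuracy compounds into exam-level accuracy.
# Simplifying assumption (not from the study): each of the n images is
# judged independently, with the same per-image accuracy p.

def exam_level_accuracy(p: float, n_images: int = 30) -> float:
    """Probability that all n_images in one exam are called correctly."""
    return p ** n_images

def exams_per_error(p: float, n_images: int = 30) -> float:
    """Average number of exams per exam that contains at least one wrong call."""
    return 1.0 / (1.0 - exam_level_accuracy(p, n_images))

for p in (0.95, 0.99, 0.999):
    acc = exam_level_accuracy(p)
    print(f"per-image {p:.3f} -> exam-level {acc:.3f}, "
          f"one flawed exam every {exams_per_error(p):.1f} exams")
```

Under these assumptions, 99 percent per-image accuracy gives about 74 percent exam-level accuracy, which means an error in roughly one exam in four. At 95 percent per-image accuracy, exam-level accuracy falls to only about 21 percent. Real errors are unlikely to be independent, so the exact figures are illustrative. The underlying point holds regardless: per-image errors compound quickly across a stack.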
The radiology experts said the algorithm’s ability to find very small abnormalities and demonstrate their location in the brain was a substantial advance. “The hemorrhage can be tiny and still be significant,” said Pratik Mukherjee, MD, Ph.D., professor of radiology at UCSF. “That’s what makes a radiologist’s job so hard, and that’s why these things occasionally get missed. If a patient has an aneurysm, and it’s starting to bleed, and you send them home, they can die.”
Jitendra Malik, Ph.D., the Arthur J. Chick Professor of Electrical Engineering and Computer Sciences at Berkeley, said the key was choosing which data to feed into the model. The new study used a type of deep learning model known as a fully convolutional neural network, or FCN, which was trained on a relatively small dataset, in this case 4,396 CT exams. But the training images used by the researchers were packed with information, because each small abnormality was manually delineated at the pixel level. The richness of this data, along with other steps that prevented the model from misinterpreting random variations or “noise” as meaningful, created an extremely accurate algorithm.
The scientists could have chosen to feed an entire stack of images, or one complete image, into the network all at once. Instead, they fed it only a portion, or “patch,” of an image at a time, contextualizing that patch with the images that directly preceded and followed it in the stack. People also take in an image piece by piece when they read text or look at a computer screen. This approach enabled the network to learn from the relevant information in the data without “overfitting,” that is, drawing conclusions from insignificant variations that were also present in the data. They called their model PatchFCN.
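Here is a minimal sketch of the patch-plus-neighbors input described above. It is written in plain Python so it runs without extra libraries. Several details are hypothetical choices for illustration: the patch size, the stride, the edge handling (reusing the first or last slice when a neighbor is missing), and the function names `extract_context_patch` and `iter_patches`. The article does not specify PatchFCN's actual preprocessing, and the network itself is not shown.

```python
from typing import List

Slice = List[List[float]]  # one 2D image: rows of pixel values

def extract_context_patch(volume: List[Slice], z: int, top: int, left: int,
                          size: int) -> List[Slice]:
    """Return the size x size patch at (top, left) from slice z, plus the
    same region from the slices directly before and after it.

    Hypothetical details: at the first/last slice the missing neighbor is
    replaced by the edge slice itself (clamping), and the three patches are
    returned as a 3-channel stack [previous, current, next].
    """
    depth = len(volume)
    height, width = len(volume[0]), len(volume[0][0])
    if not (0 <= top and top + size <= height and 0 <= left and left + size <= width):
        raise ValueError("patch falls outside the image")
    channels = []
    for dz in (-1, 0, 1):
        zz = min(max(z + dz, 0), depth - 1)  # clamp at the ends of the stack
        patch = [row[left:left + size] for row in volume[zz][top:top + size]]
        channels.append(patch)
    return channels

def iter_patches(volume: List[Slice], z: int, size: int, stride: int):
    """Yield (top, left, patch) for every full patch position on slice z."""
    height, width = len(volume[0]), len(volume[0][0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield top, left, extract_context_patch(volume, z, top, left, size)
```

On a toy volume of three 4 x 4 slices, `extract_context_patch(volume, 1, 1, 2, 2)` returns three 2 x 2 patches taken from the same location on slices 0, 1, and 2. The article credits this design choice, feeding small patches rather than whole stacks, with limiting overfitting.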