Designing deep learning studies in cancer diagnostics.
Source: Oslo University Hospital


Researchers at Oslo University Hospital have analysed whether better-designed deep learning studies can speed up the transformation of medical practice.

"We propose several protocol items that should be defined before evaluating the external cohort," says first author Andreas Kleppe at the Institute for Cancer Diagnostics and Informatics at Oslo University Hospital. "In this way, the evaluation becomes rigorous and more reliable. Such evaluations would make it much clearer which systems are likely to work well in clinical practice, and these systems should be further assessed in phase III randomized clinical trials."

The slow implementation of deep learning systems in medical practice is partly a natural consequence of the time needed to evaluate and adapt systems that affect patient treatment. However, many studies assessing well-functioning systems are at high risk of bias.

According to Kleppe, even among the seemingly best studies that evaluate external cohorts, few predefine the primary analysis. When the analysis is not fixed in advance, adaptations of the deep learning system, the patient selection or the analysis methodology can make the reported results over-optimistic.
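What predefining the primary analysis could look like in software can be sketched briefly. The example below is an illustration under stated assumptions, not the procedure proposed by the researchers: it freezes a protocol record and stores a fingerprint of it before the external cohort is analysed, so that any later change becomes detectable. The class name, the field names and all values are hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class EvaluationProtocol:
    """Hypothetical record of what is fixed before external evaluation."""
    system_version: str
    external_cohort: str
    inclusion_criteria: tuple
    primary_metric: str
    decision_threshold: float

    def fingerprint(self) -> str:
        """Return a SHA-256 digest of the protocol, stable across runs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder values for illustration only.
protocol = EvaluationProtocol(
    system_version="v1.0-locked",
    external_cohort="external-cohort-1",
    inclusion_criteria=("criterion A", "criterion B"),
    primary_metric="sensitivity",
    decision_threshold=0.5,
)

# Record this digest before the external cohort is analysed.
registered = protocol.fingerprint()

# A later change, such as tuning the threshold after seeing results,
# produces a different digest and is therefore detectable.
tweaked = replace(protocol, decision_threshold=0.4)
print(protocol.fingerprint() == registered)  # unchanged protocol matches
print(tweaked.fingerprint() == registered)   # altered protocol does not
```

A digest does not prevent changes; it only makes them visible. In practice the record would be deposited somewhere the study team cannot silently edit.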

The frequent lack of stringent evaluation on external data is of particular concern. Some systems are developed or evaluated on data that are too narrow or otherwise inappropriate for the intended medical setting. The lack of a well-established sequence of evaluation steps for converting promising prototypes into properly evaluated medical systems limits the medical use of deep learning systems.

Millions of adjustable parameters

Deep learning facilitates utilization of large data sets through direct learning of correlations between raw input data and target output, providing systems that may use intricate structures in high-dimensional input data to model the association with the target output accurately. Whereas supervised machine learning techniques traditionally utilized carefully selected representations of the input data to predict the target output, modern deep learning techniques use highly flexible artificial neural networks to correlate input data directly to the target outputs.

The relations learnt through such direct correlation will often be real, but some may be spurious patterns that exist only in the data used for learning. The millions of adjustable parameters make deep neural networks capable of performing correctly on training sets even when the target outputs are randomly generated and, therefore, utterly meaningless.
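The effect of fitting meaningless targets can be reproduced on a small scale. The sketch below is an illustration only and does not use a neural network: as a stand-in for a high-capacity model it uses a 1-nearest-neighbour classifier, which simply memorises its training set. Inputs and labels are random numbers generated for the demonstration, and all names and sizes are assumptions.

```python
import random


def nearest_label(x, train_x, train_y):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])),
    )
    return train_y[best]


def accuracy(xs, ys, train_x, train_y):
    """Fraction of points in (xs, ys) whose label the memorising model reproduces."""
    hits = sum(nearest_label(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def random_label_experiment(n_train=200, n_test=500, dim=10, seed=0):
    """Fit random labels, then score the model on training data and on new data."""
    rng = random.Random(seed)

    def make(n):
        xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
        ys = [rng.randint(0, 1) for _ in range(n)]  # labels are pure noise
        return xs, ys

    train_x, train_y = make(n_train)
    test_x, test_y = make(n_test)
    return (
        accuracy(train_x, train_y, train_x, train_y),
        accuracy(test_x, test_y, train_x, train_y),
    )


train_acc, test_acc = random_label_experiment()
print(f"training accuracy: {train_acc:.2f}, new-data accuracy: {test_acc:.2f}")
```

The model reproduces every training label because each training point is its own nearest neighbour, while its accuracy on new data stays near chance. Perfect training performance therefore says nothing about whether anything meaningful was learnt.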

Design and evaluation challenges

The high capacity of neural networks induces severe challenges for designing and developing deep learning systems and validating their performance in the intended medical setting. An adequate clinical performance will only be possible if the system has good generalisability to subjects not included in the training data.

The design challenges involve selecting appropriate training data, for example data that are representative of the target population. They also include modeling questions, such as how the variation of the training data may be artificially increased without jeopardizing the relationship between input data and target outputs.
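One common way of increasing variation artificially is to apply transformations that change the input but should leave the target output untouched. The sketch below is illustrative and is not taken from the study. It assumes a small greyscale image stored as a list of rows with pixel values from 0 to 255, and it assumes that orientation and modest brightness changes carry no diagnostic meaning for the task, which would have to be checked for any real application. All function names are hypothetical.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_intensity(img, factor):
    """Multiply pixel values by factor, clipped to the valid range 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, label, rng):
    """Return a randomly transformed copy of img together with the unchanged label."""
    out = img
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    out = scale_intensity(out, rng.uniform(0.9, 1.1))  # mild brightness change
    return out, label


rng = random.Random(0)
image = [[0, 50], [100, 255]]  # toy 2x2 image, made up for the example
augmented, label = augment(image, "positive", rng)
print(augmented, label)
```

The label is passed through untouched on purpose: an augmentation is only safe if it cannot alter the correct target output. A transformation that could change the diagnosis, such as a crop that removes the relevant region, would break the relationship the paragraph above warns about.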


The validation challenge includes verifying that the system generalizes well. For example, does it perform satisfactorily when evaluated on relevant patient populations at new locations and when input data are obtained using differing laboratory procedures or alternative equipment? Moreover, deep learning systems are typically developed iteratively, with repeated testing and various selection processes that may bias results. Similar selection issues have been recognized as a general concern in the medical literature for many years.
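One simple way to make such differences visible is to report performance separately for each cohort rather than as a single pooled figure. The following sketch assumes a system that outputs a score per patient and a decision threshold that was fixed beforehand. The cohort names, scores and labels are made up for illustration and are not results from any study.

```python
from collections import defaultdict


def sensitivity_specificity(records, threshold):
    """Compute (sensitivity, specificity) for (score, has_disease) pairs."""
    tp = fn = tn = fp = 0
    for score, has_disease in records:
        predicted_positive = score >= threshold
        if has_disease:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def evaluate_by_cohort(records, threshold):
    """Group (cohort, score, has_disease) records and score each cohort separately."""
    groups = defaultdict(list)
    for cohort, score, has_disease in records:
        groups[cohort].append((score, has_disease))
    return {c: sensitivity_specificity(g, threshold) for c, g in groups.items()}


# Toy records, invented for the example: (cohort, score, has_disease).
records = [
    ("site_A", 0.9, True), ("site_A", 0.8, True),
    ("site_A", 0.2, False), ("site_A", 0.1, False),
    ("site_B", 0.7, True), ("site_B", 0.4, True),
    ("site_B", 0.6, False), ("site_B", 0.3, False),
]

results = evaluate_by_cohort(records, threshold=0.5)
for cohort, (sens, spec) in results.items():
    print(f"{cohort}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data the pooled figures would hide that the system performs perfectly at one site and at chance level at the other. Reporting per cohort, with the threshold fixed in advance, exposes that kind of gap.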

Thus, when selecting design and validation processes for diagnostic deep learning systems, one should focus on the generalization challenges and avoid the more classical pitfalls in data analysis. "To achieve good performance for new patients, it is crucial to use various training data. Natural variation is always essential, but so is introducing artificial variation. These types of variation complement each other and facilitate good generalisability," says Kleppe.

The research was published in Nature Reviews Cancer.
