In recent years, easy access to massive sets of labelled data, increased computing power provided by GPUs, and pretrained models built by experts have made deep learning dominant in many computer vision and pattern recognition tasks. Before the breakthrough of AlexNet, computers were trained on handcrafted features, i.e., example data extracted by human researchers. Deep learning allowed machines to learn the features that optimally represent the data for a specific problem. In medical applications, the transition from systems that use handcrafted features to systems that learn features from the data has been gradual. The number of deep learning applications in medical image analysis grew rapidly in 2015 and 2016, and deep learning is now dominant at major conferences and competitions.
Deep learning methods are highly effective when the number of available training samples is large. Thus, for natural images very few people train an entire convolutional network from scratch with random weight initialization; instead, it is common to pre-train a ConvNet on a very large dataset, such as ImageNet, which contains 1.2 million images in 1,000 categories, and then use it either as a fixed feature extractor or as an initialization, fine-tuning one or more of its layers on the new dataset.
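The "fixed feature extractor" idea above can be sketched in miniature. The following is a hypothetical numpy-only illustration (not a real pre-trained network): the first layer stands in for frozen pre-trained weights, and only the final classifier layer is trained on a small toy dataset.

```python
import numpy as np

# Hypothetical sketch: a tiny two-layer network where the first layer plays
# the role of a pre-trained feature extractor (frozen), and only the final
# classifier layer is fine-tuned on the new, smaller dataset.
rng = np.random.default_rng(0)

# "Pre-trained" feature-extractor weights (frozen, assumed given).
W_frozen = rng.normal(size=(16, 8))          # maps 16-dim input -> 8-dim features

# Trainable classifier head, randomly initialized.
W_head = rng.normal(size=(8, 1)) * 0.1

# Toy dataset standing in for the small target-domain dataset.
X = rng.normal(size=(100, 16))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(200):
    feats = np.maximum(X @ W_frozen, 0.0)    # frozen ReLU features (never updated)
    pred = sigmoid(feats @ W_head)
    grad = feats.T @ (pred - y) / len(X)     # gradient w.r.t. the head only
    W_head -= lr * grad                      # fine-tune only the last layer

feats = np.maximum(X @ W_frozen, 0.0)
accuracy = float((((sigmoid(feats @ W_head)) > 0.5) == (y > 0.5)).mean())
print(f"training accuracy: {accuracy:.2f}")
```

In a real pipeline the frozen layers would come from a network trained on ImageNet, and fine-tuning would optionally unfreeze some of them; the mechanics of keeping certain weights fixed while updating others are the same.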
However, in medical applications we usually have a very limited number of images, often fewer than 1,000. Therefore, one of the main challenges in applying deep learning to medical images is building deep models from the small number of available training samples without suffering from overfitting. One strategy is to reuse the same pre-trained models derived from natural images, but this is not always possible. For example, 3D information cannot be used directly in these network architectures, so only slices of a 3D scan can be used, individually or combined. Furthermore, transfer learning only succeeds when the training samples resemble natural images, so that similar features can be extracted and still represent the original data.
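One common workaround for feeding 3D scans into a 2D ImageNet-style network is sketched below; the shapes are illustrative, and stacking each axial slice with its two neighbours as pseudo-RGB channels is one of several possible combinations.

```python
import numpy as np

# Sketch: adapt a 3D scan to a 2D ImageNet-style network by taking axial
# slices and stacking each slice with its two neighbours as a 3-channel
# "RGB" image. The volume below is random stand-in data.
volume = np.random.rand(40, 224, 224)  # hypothetical scan: 40 axial slices

def slices_as_rgb(vol):
    """Yield (3, H, W) arrays: each slice plus its neighbours as channels."""
    for i in range(1, vol.shape[0] - 1):
        yield np.stack([vol[i - 1], vol[i], vol[i + 1]])

pseudo_rgb = list(slices_as_rgb(volume))
print(len(pseudo_rgb), pseudo_rgb[0].shape)   # 38 slices, each (3, 224, 224)
```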
We propose a novel transfer learning strategy in which pre-trained models are generated by training on medical images, making these models more compatible with medical applications. A large dataset will be created by collecting medical image analysis datasets from various applications, such as classifying or detecting structures in 3D medical data. The network architectures of these models will be inspired by both ImageNet CNNs (such as AlexNet, VGG, GoogLeNet, and ResNet) and medical 3D CNNs (such as U-Net and the networks proposed by CuMedVis). For a specific medical application, one of the pre-trained models can then be fine-tuned using a smaller annotated dataset and data augmentation techniques.
Unsupervised training techniques such as auto-encoders or Restricted Boltzmann Machines (RBMs) can also be employed to generate generic deep learning models. These algorithms process unlabelled data and are trained to find structure in it, such as latent subspaces.
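The auto-encoder idea can be illustrated with a minimal numpy sketch: a linear auto-encoder trained by gradient descent to compress unlabelled data into a low-dimensional latent subspace. All sizes and hyperparameters here are illustrative, not a recommended configuration.

```python
import numpy as np

# Minimal sketch of unsupervised feature learning: a linear auto-encoder
# compresses unlabelled 10-D data into a 2-D latent subspace by minimizing
# reconstruction error. No labels are used anywhere.
rng = np.random.default_rng(1)

# Unlabelled data that actually lies near a 2-D subspace of 10-D space.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

W_enc = rng.normal(size=(10, 2)) * 0.1   # encoder: 10 -> 2
W_dec = rng.normal(size=(2, 10)) * 0.1   # decoder: 2 -> 10

lr = 0.01
for _ in range(500):
    Z = X @ W_enc                         # encode into the latent subspace
    X_hat = Z @ W_dec                     # decode back to input space
    err = X_hat - X                       # reconstruction error
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"reconstruction MSE: {mse:.4f}")
```

Deep auto-encoders used in practice add nonlinearities and more layers, but the training signal is the same: reconstruct the input, so that the bottleneck layer is forced to learn compact, generic features.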
Data augmentation techniques such as translation and rotation will also be used to further increase the size of the training set and avoid overfitting. Using ReLUs as the activation function, together with batch normalization, dropout, and momentum, has also been shown to help deep models converge better without overfitting.
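The augmentations mentioned above can be sketched as follows. This toy example uses only axis-aligned, dependency-free operations (wrap-around translation, 90-degree rotations, flips); real pipelines would also interpolate arbitrary angles and sub-pixel shifts.

```python
import numpy as np

# Sketch of simple label-preserving augmentations (translation, rotation,
# flipping) applied to a toy 2-D image.
image = np.arange(16.0).reshape(4, 4)     # stand-in for a training image

def augment(img, shift=(0, 0), k=0, flip=False):
    """Return a translated, rotated, and optionally flipped copy."""
    out = np.roll(img, shift=shift, axis=(0, 1))  # translation (wrap-around)
    out = np.rot90(out, k=k)                      # rotation by k * 90 degrees
    if flip:
        out = np.fliplr(out)                      # horizontal flip
    return out

augmented = [augment(image, shift=(dy, dx), k=k, flip=f)
             for dy in (0, 1) for dx in (0, 1)
             for k in range(4) for f in (False, True)]
print(len(augmented))   # 2 * 2 * 4 * 2 = 32 variants from one image
```

Even this small set of transforms multiplies one image into 32 training variants, which is exactly the effect needed when annotated medical images are scarce.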