In recent years there has been a trend towards high-dimensional datasets, which result from simultaneously measuring a large number of characteristics on a small number of samples. In computer-assisted radiology, Radiomics is an emerging field that involves the extraction and analysis of large numbers of quantitative features from medical images. These features have the potential to improve diagnosis and the prediction of outcomes and response to therapy, thus paving the way towards personalized treatment. Hundreds of features can be extracted per patient but, because the number of samples is limited, the resulting datasets suffer from the curse of dimensionality. Feature selection is hence necessary to identify the relevant attributes, avoid overfitting, improve model performance, and provide faster and more cost-effective models. In addition, a good classification system depends not only on the feature selection method but also on classifier parameters that are not always easy to tune; for example, in designing an Artificial Neural Network (ANN), the network topology, which plays a crucial role in its performance, is usually selected by trial-and-error experiments that constitute a bottleneck in the development process.
Neuroevolution is a machine learning method that can simultaneously optimize the topology and the weights of ANNs by means of genetic algorithms, and hence it tackles the latter problem. Ideally, feature selection should also be performed simultaneously with topology and weight learning, as the three processes are not independent. The goal of this PhD is to develop an embedded feature selection method based on Neuroevolution for application to high-dimensional datasets by tackling important issues in the fields of evolutionary computation and machine learning.
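To make the idea of evolving feature selection, topology and weights together more concrete, the following is a minimal illustrative sketch, not the method developed in this work. It assumes a toy dataset in which only the first feature is informative, a single hidden layer whose size stands in for the topology, and a mutation-only evolutionary loop with truncation selection instead of a full genetic algorithm with crossover. All names (`make_data`, `random_genome`, `mutate`, `evolve`) and the size penalty in `fitness` are hypothetical choices made for illustration.

```python
import math
import random

def make_data(rng, n=60, n_features=8):
    """Toy data (assumption): only feature 0 determines the label, the rest is noise."""
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y

def random_genome(rng, n_features, max_hidden=4):
    """A genome encodes a feature mask, a hidden-layer size and the weights."""
    hidden = rng.randint(1, max_hidden)
    return {
        "mask": [rng.random() < 0.5 for _ in range(n_features)],
        "w_in": [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(hidden)],
        "w_out": [rng.gauss(0, 1) for _ in range(hidden)],
    }

def predict(g, row):
    # Only features whose mask bit is set reach the hidden units.
    hidden = [math.tanh(sum(w * x for w, x, m in zip(ws, row, g["mask"]) if m))
              for ws in g["w_in"]]
    return 1 if sum(w * h for w, h in zip(g["w_out"], hidden)) > 0 else 0

def fitness(g, X, y, penalty=0.01):
    # Accuracy minus a small cost per selected feature (hypothetical penalty).
    acc = sum(predict(g, r) == t for r, t in zip(X, y)) / len(y)
    return acc - penalty * sum(g["mask"])

def mutate(g, rng, max_hidden=4):
    """Return a mutated copy; the parent genome is left untouched."""
    child = {"mask": list(g["mask"]),
             "w_in": [list(ws) for ws in g["w_in"]],
             "w_out": list(g["w_out"])}
    n_features = len(child["mask"])
    op = rng.choice(["flip", "weights", "add", "remove"])
    if op == "flip":  # feature selection: toggle one input feature
        i = rng.randrange(n_features)
        child["mask"][i] = not child["mask"][i]
    elif op == "add" and len(child["w_out"]) < max_hidden:  # topology: add a unit
        child["w_in"].append([rng.gauss(0, 1) for _ in range(n_features)])
        child["w_out"].append(rng.gauss(0, 1))
    elif op == "remove" and len(child["w_out"]) > 1:  # topology: remove a unit
        j = rng.randrange(len(child["w_out"]))
        del child["w_in"][j]
        del child["w_out"][j]
    else:  # weights: Gaussian perturbation (also the fallback for blocked add/remove)
        child["w_in"] = [[w + rng.gauss(0, 0.3) for w in ws] for ws in child["w_in"]]
        child["w_out"] = [w + rng.gauss(0, 0.3) for w in child["w_out"]]
    return child

def evolve(X, y, rng, pop_size=30, generations=40):
    """Keep the best third each generation and refill with mutated copies."""
    pop = [random_genome(rng, len(X[0])) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, X, y), reverse=True)
        history.append(fitness(pop[0], X, y))
        elite = pop[: pop_size // 3]
        pop = elite + [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda g: fitness(g, X, y)), history

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, history = evolve(X, y, rng)
    print("selected features:", [i for i, m in enumerate(best["mask"]) if m])
    print("hidden units:", len(best["w_out"]))
```

Because the feature mask is part of the same genome as the hidden-layer size and the weights, a single fitness evaluation scores all three at once, which is what makes the selection embedded rather than a separate preprocessing step.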
Evaluation of data balancing techniques. Application to CAD of lung nodules using the LUNA16 framework. Revista de Ingeniería Electrónica, Automática y Comunicaciones, 39(3):57-67, 2018.