

Feature selection is an important data preprocessing step in data mining and machine learning tasks, especially for high-dimensional data. In this paper, we present a novel feature selection method based on complex weighted networks that describe the strongest correlations among features. The method relies on community detection techniques to identify cohesive groups of features. From each identified community, a subset of features exhibiting a strong association with the class feature is selected, taking into account the size of the community and the connections within it. The proposed method is evaluated on a high-dimensional dataset containing signaling protein features related to the diagnosis of Alzheimer’s disease. We compared the performance of seven widely used classifiers trained in three settings: without feature selection, with correlation-based feature selection by a state-of-the-art method provided by the WEKA tool, and with feature selection by four variants of our method, each determined by a different community detection technique. The results of the evaluation indicate that our method improves the classification accuracy of several classification models while drastically reducing the dimensionality of the dataset. Additionally, one variant of our method outperforms the correlation-based feature selection method implemented in WEKA. © 2017, Springer International Publishing AG.
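The pipeline outlined in the abstract (correlation network, community detection, per-community selection) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: connected components stand in for a real community detection technique, Pearson correlation is used for both feature-feature and feature-class association, and the correlation threshold, the logarithmic rule for how many features to keep per community, the function names, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feature_network(features, threshold):
    """Weighted graph: nodes are features, edges join pairs with |r| >= threshold."""
    graph = {name: {} for name in features}
    for a, b in combinations(features, 2):
        weight = abs(pearson(features[a], features[b]))
        if weight >= threshold:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def communities(graph):
    """Assumed stand-in for community detection: connected components via DFS."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups


def select_features(features, labels, threshold=0.8):
    """Keep the features most associated with the class from each community."""
    selected = []
    for group in communities(feature_network(features, threshold)):
        # Hypothetical size rule: larger communities contribute more features.
        k = max(1, int(math.log2(len(group))))
        ranked = sorted(group,
                        key=lambda f: abs(pearson(features[f], labels)),
                        reverse=True)
        selected.extend(ranked[:k])
    return selected


# Toy data (invented): a1/a2 form one correlated group, b1/b2 another.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "a1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "a2": [1.1, 1.0, 1.2, 2.0, 3.5, 2.6],
    "b1": [5, 1, 4, 2, 5, 1],
    "b2": [10, 2, 8, 4, 10, 2.5],
}
chosen = select_features(features, labels)
print(chosen)
```

In this sketch every community contributes at least one feature, so redundant features within a group are collapsed while each group stays represented. A faithful variant would replace `communities` with an actual community detection algorithm and could also weigh the edge weights inside each community when ranking.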
| Engineering controlled terms: | Clustering algorithms; Data mining; Diagnosis; Feature extraction; Learning systems; Population dynamics |
|---|---|
| Engineering uncontrolled terms: | Alzheimer; Classification accuracy; Community detection; Feature correlation; Feature selection methods; High dimensional data; High-dimensional dataset; State-of-the-art methods |
| Engineering main heading: | Classification (of information) |
| Funding sponsor | Funding number | Acronym |
|---|---|---|
| Javna Agencija za Raziskovalno Dejavnost RS | OI174023 | ARRS |
| Ministarstvo Prosvete, Nauke i Tehnološkog Razvoja | | MPNTR |
Acknowledgments. This work is supported by the bilateral project “Intelligent computer techniques for improving medical detection, analysis and explanation of human cognition and behavior disorders” between the Ministry of Education, Science and Technological Development of the Republic of Serbia and the Slovenian Research Agency. M. Savić, V. Kurbalija and M. Ivanović also thank the Ministry of Education, Science and Technological Development of the Republic of Serbia for additional support through project no. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support.”
Savić, M.; Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia