Data dimensionality reduction based on genetic selection of feature subsets
Main author: | Faraoun, K. M. |
---|---|
Publication date: | 2007 |
Other authors: | Rabhi, A. |
Document type: | Article |
Language: | eng |
Source title: | INFOCOMP: Jornal de Ciência da Computação |
Full text: | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/184 |
Abstract: | In this paper, we show that a multi-class classification process can be significantly enhanced by selecting an optimal subset of the features used as input for training. Selecting such a subset reduces the dimensionality of the data samples and eliminates the redundancy and ambiguity introduced by some attributes. The classifier can then operate only on the selected features during learning. A genetic search is used to explore the set of all possible feature subsets, whose size grows exponentially with the number of features. A new measure is proposed to compute the information gain provided by each feature subset, and it is used as the fitness function of the genetic search. Experiments are performed on the KDD99 dataset to classify DoS network intrusions using the 41 existing features. The optimality of the obtained feature subset is then tested using a multi-layered neural network. The results show that the proposed approach can improve both the classification rate and the learning runtime. |
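
The search described in the abstract can be illustrated with a minimal sketch. With 41 features there are 2^41 (about 2.2 × 10^12) candidate subsets, so exhaustive enumeration is impractical and a genetic search over bit masks is a natural fit. The abstract does not define the paper's new information-gain measure, so the fitness below is a stand-in assumption: the plain information gain of the class label given the selected discrete features, minus a small per-feature penalty. The helper names (`make_toy_data`, `genetic_search`, `size_penalty`) and the toy data are hypothetical and are not taken from the paper or from KDD99.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, mask):
    """H(labels) minus H(labels | joint value of the features selected by mask)."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(tuple(row[i] for i in selected), []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def fitness(rows, labels, mask, size_penalty=0.01):
    """Stand-in fitness (an assumption, not the paper's measure):
    information gain minus a small cost per selected feature."""
    return information_gain(rows, labels, mask) - size_penalty * sum(mask)


def genetic_search(rows, labels, n_features, pop_size=30, generations=40,
                   mutation_rate=0.05, seed=0):
    """Evolve bit masks (1 = feature kept) and return the fittest one found."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(rows, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = [population[0][:], population[1][:]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


def make_toy_data(n_rows=200, n_features=6, seed=1):
    """Hypothetical data: the 4-class label depends only on features 0 and 2."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [2 * row[0] + row[2] for row in rows]
    return rows, labels


rows, labels = make_toy_data()
best_mask = genetic_search(rows, labels, n_features=6)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

In this sketch the per-feature penalty is what pushes the search toward small subsets: without it, adding irrelevant features never lowers the empirical information gain. A classifier such as the multi-layered neural network mentioned in the abstract would then be trained only on the columns kept by the returned mask.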
id | UFLA-5_8ffbb146c6ce5a97fb209a523182694a |
---|---|
oai_identifier_str | oai:infocomp.dcc.ufla.br:article/184 |
network_acronym_str | UFLA-5 |
network_name_str | INFOCOMP: Jornal de Ciência da Computação |
repository_id_str | |
dc.title.none.fl_str_mv | Data dimensionality reduction based on genetic selection of feature subsets |
author | Faraoun, K. M. |
author_facet | Faraoun, K. M.; Rabhi, A. |
author_role | author |
author2 | Rabhi, A. |
author2_role | author |
dc.contributor.author.fl_str_mv | Faraoun, K. M.; Rabhi, A. |
dc.subject.por.fl_str_mv | Features selection; genetic algorithms; patterns classification |
topic | Features selection; genetic algorithms; patterns classification |
publishDate | 2007 |
dc.date.none.fl_str_mv | 2007-09-01 |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/184 |
url | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/184 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/184/169 |
dc.rights.driver.fl_str_mv | Copyright (c) 2016 INFOCOMP Journal of Computer Science; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv | Copyright (c) 2016 INFOCOMP Journal of Computer Science |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Editora da UFLA |
publisher.none.fl_str_mv | Editora da UFLA |
dc.source.none.fl_str_mv | INFOCOMP Journal of Computer Science; Vol. 6 No. 3 (2007): September, 2007; 36-46; 1982-3363; 1807-4545; reponame:INFOCOMP: Jornal de Ciência da Computação; instname:Universidade Federal de Lavras (UFLA); instacron:UFLA |
instname_str | Universidade Federal de Lavras (UFLA) |
instacron_str | UFLA |
institution | UFLA |
reponame_str | INFOCOMP: Jornal de Ciência da Computação |
collection | INFOCOMP: Jornal de Ciência da Computação |
repository.name.fl_str_mv | INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv | infocomp@dcc.ufla.br; apfreire@dcc.ufla.br |
_version_ | 1799874740447346688 |