Data dimensionality reduction based on genetic selection of feature subsets

Faraoun, K. M.; Rabhi, A.


Bibliographic details
Main author:	Faraoun, K. M.
Publication date:	2007
Other authors:	Rabhi, A.
Document type:	Article
Language:	English
Source title:	INFOCOMP: Jornal de Ciência da Computação
Full text:	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/169
Abstract:	In this paper, we show that a multi-class classification process can be significantly improved by selecting an optimal subset of the features used as input for training. Selecting such a subset reduces the dimensionality of the data samples and eliminates the redundancy and ambiguity introduced by some attributes; the classifier then performs the learning process on the selected features only. A genetic search is used to explore the space of all possible feature subsets, whose size grows exponentially with the number of features (2^n subsets for n features). A new measure is proposed to compute the information gain provided by each feature subset, and it is used as the fitness function of the genetic search. Experiments are performed on the KDD99 dataset to classify DoS network intrusions using its 41 features. The optimality of the obtained feature subset is then tested with a multi-layer neural network. The results show that the proposed approach can improve both the classification rate and the learning runtime.
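The abstract does not specify the genetic operators or the exact form of the proposed measure, so the following is only a minimal Python sketch of genetic feature-subset selection under stated assumptions. Each chromosome is a 0/1 mask over the features; fitness is plain empirical information gain IG(Y; X_S) = H(Y) - H(Y | X_S) minus a per-feature penalty, which is a hypothetical stand-in and not the authors' new measure. Without some size cost, empirical information gain never decreases when a feature is added, so the full feature set would always score highest. Tournament selection, one-point crossover, bit-flip mutation and elitism are likewise assumed choices, features are assumed discrete, and all function and parameter names (`entropy`, `information_gain`, `fitness`, `genetic_feature_selection`, `penalty`) are hypothetical. With 41 features, as in KDD99, there are 2^41 (about 2.2 × 10^12) subsets, which is why a heuristic search is used instead of exhaustive enumeration.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(X, y, mask):
    """Empirical IG(Y; X_S) = H(Y) - H(Y | X_S) for the features with mask[j] == 1.

    Assumes discrete (categorical or pre-binned) feature values.
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    groups = defaultdict(list)  # joint value of the selected features -> labels
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in selected)].append(label)
    n = len(y)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - h_cond


def fitness(X, y, mask, penalty=0.05):
    """Hypothetical stand-in for the paper's measure: IG minus a per-feature cost."""
    return information_gain(X, y, mask) - penalty * sum(mask)


def genetic_feature_selection(X, y, pop_size=30, generations=50,
                              p_mut=None, elite=2, seed=0):
    """Search 0/1 feature masks with a simple generational GA; returns (mask, fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    p_mut = 1.0 / n if p_mut is None else p_mut
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament(scored):
        # Return the fittest of three randomly drawn individuals.
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    for _ in range(generations):
        scored = sorted(((fitness(X, y, m), m) for m in pop),
                        key=lambda s: s[0], reverse=True)
        next_pop = [m[:] for _, m in scored[:elite]]  # elitism: keep the best masks
        while len(next_pop) < pop_size:
            a, b = tournament(scored), tournament(scored)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene is flipped with probability p_mut.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=lambda m: fitness(X, y, m))
    return best, fitness(X, y, best)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic toy data (not KDD99): 6 binary features,
    # label = f0 XOR f1, features f2..f5 are pure noise.
    X = [[data_rng.randint(0, 1) for _ in range(6)] for _ in range(200)]
    y = [row[0] ^ row[1] for row in X]
    mask, score = genetic_feature_selection(X, y)
    selected = [j for j, bit in enumerate(mask) if bit]
    print("selected features:", selected, "fitness:", round(score, 3))
```

In the synthetic example the label is the XOR of features 0 and 1, so neither feature is informative on its own while the pair determines the class; this is why the sketch scores whole subsets rather than ranking features individually. Evaluating the selected subset with a multi-layer neural network, as the paper does, is not shown here.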

Item metadata

OAI identifier:	oai:infocomp.dcc.ufla.br:article/169
Keywords:	Feature selection; genetic algorithms; pattern classification
Publication date:	2007-06-01
Document type:	Article (published version)
Full text (PDF):	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/169/154
Rights:	Copyright (c) 2016 INFOCOMP Journal of Computer Science; open access
Format:	application/pdf
Publisher:	Editora da UFLA
Source:	INFOCOMP Journal of Computer Science, Vol. 6, No. 2 (June 2007), pp. 9-19; ISSN 1982-3363, 1807-4545
Institution:	Universidade Federal de Lavras (UFLA)
Repository:	INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
Repository contact:	infocomp@dcc.ufla.br; apfreire@dcc.ufla.br

