An unsupervised approach to feature discretization and selection

J. Ferreira, Artur; Figueiredo, Mário A. T.

Bibliographic details
Main author:	J. Ferreira, Artur
Publication date:	2012
Other authors:	Figueiredo, Mário A. T.
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.21/5074
Abstract:	Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. There is thus a clear need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.

Item metadata

id	RCAP_7fd1395dc971bc82cd696fc2868a7e91
oai_identifier_str	oai:repositorio.ipl.pt:10400.21/5074
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	An unsupervised approach to feature discretization and selection
title	An unsupervised approach to feature discretization and selection
author	J. Ferreira, Artur
author_facet	J. Ferreira, Artur Figueiredo, Mário A. T.
author_role	author
author2	Figueiredo, Mário A. T.
author2_role	author
dc.contributor.none.fl_str_mv	RCIPL
dc.contributor.author.fl_str_mv	J. Ferreira, Artur Figueiredo, Mário A. T.
dc.subject.por.fl_str_mv	Feature discretization Feature quantization Feature selection Linde-Buzo-Gray algorithm Sparse data Support vector machines Naive Bayes K-Nearest neighbor
topic	Feature discretization Feature quantization Feature selection Linde-Buzo-Gray algorithm Sparse data Support vector machines Naive Bayes K-Nearest neighbor
publishDate	2012
dc.date.none.fl_str_mv	2012-09 2012-09-01T00:00:00Z 2015-09-07T11:17:36Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.21/5074
url	http://hdl.handle.net/10400.21/5074
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	FERREIRA, Artur J.; FIGUEIREDO, Mário A. T. – An unsupervised approach to feature discretization and selection. Pattern Recognition. ISSN: 0031-3203. Vol 45, nr. 9 (2012), pp. 3048-3060 0031-3203 10.1016/j.patcog.2011.12.008
dc.rights.driver.fl_str_mv	metadata only access info:eu-repo/semantics/openAccess
rights_invalid_str_mv	metadata only access
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799133401630900224
