An unsupervised approach to feature discretization and selection
Main author: | Ferreira, Artur J. |
---|---|
Publication date: | 2012 |
Other authors: | Figueiredo, Mário A. T. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/5074 |
Abstract: | Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques. |
id |
RCAP_7fd1395dc971bc82cd696fc2868a7e91 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/5074 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
An unsupervised approach to feature discretization and selection |
author |
Ferreira, Artur J. |
author_role |
author |
author2 |
Figueiredo, Mário A. T. |
author2_role |
author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
Ferreira, Artur J.; Figueiredo, Mário A. T. |
dc.subject.por.fl_str_mv |
Feature discretization; Feature quantization; Feature selection; Linde-Buzo-Gray algorithm; Sparse data; Support vector machines; Naive Bayes; K-Nearest neighbor |
dc.date.none.fl_str_mv |
2012-09; 2012-09-01T00:00:00Z; 2015-09-07T11:17:36Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/5074 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
FERREIRA, Artur J.; FIGUEIREDO, Mário A. T. – An unsupervised approach to feature discretization and selection. Pattern Recognition. ISSN: 0031-3203. Vol. 45, no. 9 (2012), pp. 3048-3060. DOI: 10.1016/j.patcog.2011.12.008 |
dc.rights.driver.fl_str_mv |
metadata only access; info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |