An unsupervised approach to feature discretization and selection

Bibliographic details
Main author: J. Ferreira, Artur
Publication date: 2012
Other authors: Figueiredo, Mário A. T.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.21/8569
Abstract: Many learning problems require handling high-dimensional data sets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional data sets. The experimental results on several standard data sets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
id RCAP_aa97465db818b41a7638002423b0e4a4
oai_identifier_str oai:repositorio.ipl.pt:10400.21/8569
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv An unsupervised approach to feature discretization and selection
author J. Ferreira, Artur
author_role author
author2 Figueiredo, Mário A. T.
author2_role author
dc.contributor.none.fl_str_mv RCIPL
dc.contributor.author.fl_str_mv J. Ferreira, Artur
Figueiredo, Mário A. T.
dc.subject.por.fl_str_mv Feature discretization
Feature quantization
Feature selection
Linde–Buzo–Gray algorithm
Sparse data
Support vector machines
Naïve Bayes
K-nearest neighbor
publishDate 2012
dc.date.none.fl_str_mv 2012
2012-01-01T00:00:00Z
2018-06-06T08:56:51Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.21/8569
url http://hdl.handle.net/10400.21/8569
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv FERREIRA, Artur Jorge; FIGUEIREDO, Mário A. T. – An unsupervised approach to feature discretization and selection. Pattern Recognition. ISSN 0031-3203. Vol. 45, (2012), pp. 3048-3060.
0031-3203
dc.rights.driver.fl_str_mv metadata only access
info:eu-repo/semantics/openAccess
rights_invalid_str_mv metadata only access
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799133435170652160