Efficient feature selection filters for high-dimensional data
Main author: | J. Ferreira, Artur |
---|---|
Publication date: | 2012 |
Other authors: | Figueiredo, Mário A. T. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/5081 |
Abstract: | Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster. |
id |
RCAP_4a0b4841018cf4bfc89c4d73f82a1634 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/5081 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Efficient feature selection filters for high-dimensional data |
title |
Efficient feature selection filters for high-dimensional data |
title_short |
Efficient feature selection filters for high-dimensional data |
title_full |
Efficient feature selection filters for high-dimensional data |
title_fullStr |
Efficient feature selection filters for high-dimensional data |
title_full_unstemmed |
Efficient feature selection filters for high-dimensional data |
title_sort |
Efficient feature selection filters for high-dimensional data |
author |
J. Ferreira, Artur |
author_facet |
J. Ferreira, Artur; Figueiredo, Mário A. T. |
author_role |
author |
author2 |
Figueiredo, Mário A. T. |
author2_role |
author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
J. Ferreira, Artur; Figueiredo, Mário A. T. |
dc.subject.por.fl_str_mv |
Feature selection; Filters; Dispersion measures; Similarity measures; High-dimensional data |
topic |
Feature selection; Filters; Dispersion measures; Similarity measures; High-dimensional data |
description |
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster. |
publishDate |
2012 |
dc.date.none.fl_str_mv |
2012-10-01 2012-10-01T00:00:00Z 2015-09-07T13:27:31Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/5081 |
url |
http://hdl.handle.net/10400.21/5081 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
FERREIRA, Artur J.; FIGUEIREDO, Mário A. T. – Efficient feature selection filters for high-dimensional data. Pattern Recognition Letters. ISSN: 0167-8655. Vol. 33, N.º 13 (2012), pp. 1794-1804. 0167-8655 10.1016/j.patrec.2012.05.019 |
dc.rights.driver.fl_str_mv |
metadata only access; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
metadata only access |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier Science BV |
publisher.none.fl_str_mv |
Elsevier Science BV |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799133401636143104 |