COMPRESSED LEARNING FOR TEXT CATEGORIZATION
Main author: | Ferreira, Artur |
---|---|
Publication date: | 2013 |
Other authors: | Figueiredo, Mario |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://doi.org/10.34629/ipl.isel.i-ETC.3 |
Abstract: | In text classification based on the bag-of-words (BoW) or similar representations, we usually have a large number of features, many of which are irrelevant (or even detrimental) for classification tasks. Recent results show that compressed learning (CL), i.e., learning in a domain of reduced dimensionality obtained by random projections (RP), is possible, and theoretical bounds on the test set error rate have been established. In this work, we assess the performance of CL based on RP of BoW representations for text classification. Our experimental results show that CL significantly reduces the number of features and the training time, while simultaneously improving the classification accuracy. Rather than the mild decrease in accuracy that the theory upper-bounds, we actually find an increase in accuracy. Our approach is further compared against two techniques, namely the unsupervised random subspaces method and the supervised Fisher index. The CL approach is suited for unsupervised or semi-supervised learning, without any modification, since it does not use the class labels. |
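
The abstract's core idea, learning on random projections of BoW vectors without touching the class labels, can be illustrated with a small sketch. This is not the authors' implementation: it assumes dense term-count vectors, a Gaussian projection matrix with entries drawn from N(0, 1/m), and a nearest-centroid classifier as a simple stand-in for the support vector machines listed in the keywords. All function names and the toy documents are hypothetical. A random-subspace selector is included only to show how that baseline differs (it keeps original features rather than mixing them).

```python
import math
import random
from collections import Counter


def bow_matrix(docs):
    """Build a vocabulary and a dense term-count matrix (one row per document)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            row[index[w]] = float(c)
        rows.append(row)
    return vocab, rows


def random_projection_matrix(d, m, seed=0):
    """Return an m x d matrix with i.i.d. N(0, 1/m) entries (labels are never used)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]


def project(R, x):
    """Map a d-dimensional vector x to m dimensions by computing R @ x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def random_subspace(d, m, seed=0):
    """Baseline: indices of m of the d original features, chosen uniformly at random."""
    return sorted(random.Random(seed).sample(range(d), m))


def nearest_centroid_fit(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)),
    )


# Toy data (hypothetical), only to exercise the pipeline end to end.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "work", "work"]

vocab, X = bow_matrix(docs)
R = random_projection_matrix(len(vocab), m=4)   # 11 features -> 4 features
Z = [project(R, x) for x in X]                  # compressed representation
model = nearest_centroid_fit(Z, labels)         # training happens in the compressed domain
pred = nearest_centroid_predict(model, Z[0])
kept = random_subspace(len(vocab), m=4)         # random-subspace baseline indices
```

Because the projection matrix is drawn independently of the labels, the same `R` could be applied to unlabeled documents, which is the property the abstract points to for unsupervised or semi-supervised use.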
id |
RCAP_8ac0e759f37ff52d93cf694a48d5bc7b |
---|---|
oai_identifier_str |
oai:i-ETC.journals.isel.pt:article/3 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Ferreira, Artur |
author_facet |
Ferreira, Artur Figueiredo, Mario |
author_role |
author |
author2 |
Figueiredo, Mario |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Ferreira, Artur Figueiredo, Mario |
dc.subject.por.fl_str_mv |
Computers; Machine Learning; random projections, random subspaces, compressed learning, text classification, support vector machines |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-06-26T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.34629/ipl.isel.i-ETC.3 oai:i-ETC.journals.isel.pt:article/3 |
url |
https://doi.org/10.34629/ipl.isel.i-ETC.3 |
identifier_str_mv |
oai:i-ETC.journals.isel.pt:article/3 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
http://journals.isel.pt/index.php/i-ETC/article/view/3 https://doi.org/10.34629/ipl.isel.i-ETC.3 http://journals.isel.pt/index.php/i-ETC/article/view/3/3 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2013 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers http://creativecommons.org/licenses/by-nc/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2013 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers http://creativecommons.org/licenses/by-nc/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
ISEL - High Institute of Engineering of Lisbon |
dc.source.none.fl_str_mv |
i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 2, No 1 (2013): The CETC2011 Issue; ID-1; 2182-4010; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799130375489847296 |