Using IR techniques to improve Automated Text Classification

Bibliographic details
Main author: Gonçalves, Teresa
Publication date: 2004
Other authors: Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/2557
Abstract: This paper studies the pre-processing phase of automated text classification. We apply linear Support Vector Machines to two datasets, one in English (Reuters) and one in European Portuguese (the Portuguese Attorney General's Office dataset). The study can be seen as a search for the best document representation along three axes: feature reduction (using linguistic information), feature selection (using word frequencies) and term weighting (using information retrieval measures).
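The three axes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the stop-word list, the `min_count` threshold, the toy documents and the choice of tf-idf as the information retrieval measure are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, a crude stand-in for linguistic feature reduction.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def select_features(token_docs, min_count=2):
    """Feature selection: keep terms whose corpus frequency is at least min_count."""
    totals = Counter(t for doc in token_docs for t in doc)
    return sorted(t for t, c in totals.items() if c >= min_count)

def tfidf_vectors(token_docs, vocab):
    """Term weighting: tf * log(N / df) per kept term, then L2 normalisation."""
    n_docs = len(token_docs)
    df = Counter(t for doc in token_docs for t in set(doc))
    vectors = []
    for doc in token_docs:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec))
        vectors.append([w / norm for w in vec] if norm else vec)
    return vectors

# Toy documents (invented for this example).
docs = [
    "The court ruled on the appeal",
    "The appeal was rejected by the court",
    "Oil prices rose and oil exports fell",
]
tokens = [tokenize(d) for d in docs]
vocab = select_features(tokens, min_count=2)
vectors = tfidf_vectors(tokens, vocab)
```

The resulting vectors are the kind of document representation that could then be passed to a linear SVM implementation; the classifier itself is left out of the sketch.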
id RCAP_83b1f3f549f979cddd8fda5362525d7f
oai_identifier_str oai:dspace.uevora.pt:10174/2557
Subjects: machine learning; Text classification
Dates: 2004-01-01T00:00:00Z; 2011-02-15T10:54:06Z; 2011-02-15
Status: publishedVersion
dc.relation.none.fl_str_mv 374-379
Lecture Notes in Computer Science
3136
livre
tcg@uevora.pt
pq@uevora.pt
NLDB-04, Natural Language Processing and Information Systems
Meziane, F.
Metais, E.
498
Rights: openAccess (info:eu-repo/semantics/openAccess)
dc.format.none.fl_str_mv 129335 bytes
application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)