Enhancing a Portuguese text classifier using part-of-speech tags

Bibliographic details
Main author: Gonçalves, Teresa
Publication date: 2005
Other authors: Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10174/2562
Abstract: Support Vector Machines have been applied to text classification with great success. In this paper, we apply and evaluate the impact of using part-of-speech tags (nouns, proper nouns, adjectives and verbs) as a feature selection procedure in a European Portuguese written dataset – the Portuguese Attorney General’s Office documents. From the results, we can conclude that verbs alone don’t have enough information to produce good learners. On the other hand, we obtain learners with equivalent performance and a reduced number of features (at least half) if we use specific part-of-speech tags instead of all words.
id RCAP_8861338580123eba5bc121516cc41b36
oai_identifier_str oai:dspace.uevora.pt:10174/2562
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Enhancing a Portuguese text classifier using part-of-speech tags
author Gonçalves, Teresa
author_facet Gonçalves, Teresa
Quaresma, Paulo
author_role author
author2 Quaresma, Paulo
author2_role author
dc.contributor.author.fl_str_mv Gonçalves, Teresa
Quaresma, Paulo
dc.subject.por.fl_str_mv machine learning
Text classification
topic machine learning
Text classification
description Support Vector Machines have been applied to text classification with great success. In this paper, we apply and evaluate the impact of using part-of-speech tags (nouns, proper nouns, adjectives and verbs) as a feature selection procedure in a European Portuguese written dataset – the Portuguese Attorney General’s Office documents. From the results, we can conclude that verbs alone don’t have enough information to produce good learners. On the other hand, we obtain learners with equivalent performance and a reduced number of features (at least half) if we use specific part-of-speech tags instead of all words.
publishDate 2005
dc.date.none.fl_str_mv 2005-01-01T00:00:00Z
2011-02-15T11:39:29Z
2011-02-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/2562
http://hdl.handle.net/10174/2562
url http://hdl.handle.net/10174/2562
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 189-198
Advances in Soft Computing
free
tcg@uevora.pt
pq@uevora.pt
IIPWM-05, Intelligent Information Processing and Web Mining
Klopotek, M.
Wierzchon, S.
Trojanowski, K.
606
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 119604 bytes
application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
publisher.none.fl_str_mv Springer-Verlag
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799136465769201664