The impact of NLP techniques in the multilabel text classification problem

Bibliographic details
Main author: Gonçalves, Teresa
Publication date: 2004
Other authors: Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10174/2558
Abstract: Support Vector Machines have been used successfully to classify text documents into sets of concepts. However, typically, linguistic information is not being used in the classification process or its use has not been fully evaluated. We apply and evaluate two basic linguistic procedures (stop-word removal and stemming/lemmatization) to the multilabel text classification problem. These procedures are applied to the Reuters dataset and to the Portuguese juridical documents from Supreme Courts and Attorney General’s Office.
id RCAP_4f6e13ffb7f611e5be0ca3dc916f5273
oai_identifier_str oai:dspace.uevora.pt:10174/2558
network_acronym_str RCAP
repository_id_str 7160
dc.title.none.fl_str_mv The impact of NLP techniques in the multilabel text classification problem
dc.contributor.author.fl_str_mv Gonçalves, Teresa
Quaresma, Paulo
dc.subject.por.fl_str_mv machine learning
Text classification
publishDate 2004
dc.date.none.fl_str_mv 2004-01-01T00:00:00Z
2011-02-15T11:25:04Z
2011-02-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/2558
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 424-428
Advances in Soft Computing
free
tcg@uevora.pt
pq@uevora.pt
IIPWM-04, Intelligent Information Processing and Web Mining
Klopotek, M.
Wierzchon, S.
Trojanowski, K.
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 168602 bytes
application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP