The impact of NLP techniques in the multilabel text classification problem
Main author: | Gonçalves, Teresa |
---|---|
Publication date: | 2004 |
Other authors: | Quaresma, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/2558 |
Abstract: | Support Vector Machines have been used successfully to classify text documents into sets of concepts. However, linguistic information is typically not used in the classification process, or its use has not been fully evaluated. We apply and evaluate two basic linguistic procedures (stop-word removal and stemming/lemmatization) to the multilabel text classification problem. These procedures are applied to the Reuters dataset and to Portuguese juridical documents from the Supreme Courts and the Attorney General’s Office. |
id |
RCAP_4f6e13ffb7f611e5be0ca3dc916f5273 |
---|---|
oai_identifier_str |
oai:dspace.uevora.pt:10174/2558 |
network_acronym_str |
RCAP |
repository_id_str |
7160 |
author |
Gonçalves, Teresa |
author_role |
author |
author2 |
Quaresma, Paulo |
author2_role |
author |
dc.subject.por.fl_str_mv |
machine learning; text classification |
dc.date.none.fl_str_mv |
2004-01-01T00:00:00Z 2011-02-15T11:25:04Z 2011-02-15 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.relation.none.fl_str_mv |
424-428; Advances in Soft Computing; livre; tcg@uevora.pt; pq@uevora.pt; IIPWM-04, Intelligent Information Processing and Web Mining; Klopotek, M.; Wierzchon, S.; Trojanowski, K.; 498 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
168602 bytes application/pdf |
dc.publisher.none.fl_str_mv |
Springer-Verlag |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |