A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language.
Main author: | Sequeira, João |
---|---|
Publication date: | 2018 |
Other authors: | Gonçalves, Teresa; Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/37352 |
Abstract: | This work presents a comparative study of two approaches to building an automatic classification system for modality values in the Portuguese language. One approach uses a single multi-class classifier with the full dataset, which includes eleven modal verbs; the other builds a separate classifier for each verb. Performance is measured using precision, recall and F1. Because the dataset is unbalanced, a weighted average was calculated for each metric. We use support vector machines as our classifier and experimented with various SVM kernels to find the optimal classifier for the task at hand. We experimented with several types of features representing parse tree information and compare these complex feature representations against a simple bag-of-words representation as a baseline. The best F1 values obtained are above 0.60, and the results indicate that there is no significant difference between the two approaches. |
id | RCAP_1677a8bc38fd7b1f2e409c7db1e1d7ae |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/37352 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language. |
title |
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language. |
author |
Sequeira, João |
author_facet |
Sequeira, João; Gonçalves, Teresa; Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris |
author_role |
author |
author2 |
Gonçalves, Teresa; Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Sequeira, João; Gonçalves, Teresa; Quaresma, Paulo; Mendes, Amália; Hendrickx, Iris |
dc.subject.por.fl_str_mv |
Natural language processing; Modality; Feature selection; Support Vector Machines |
topic |
Natural language processing; Modality; Feature selection; Support Vector Machines |
description |
This work presents a comparative study of two approaches to building an automatic classification system for modality values in the Portuguese language. One approach uses a single multi-class classifier with the full dataset, which includes eleven modal verbs; the other builds a separate classifier for each verb. Performance is measured using precision, recall and F1. Because the dataset is unbalanced, a weighted average was calculated for each metric. We use support vector machines as our classifier and experimented with various SVM kernels to find the optimal classifier for the task at hand. We experimented with several types of features representing parse tree information and compare these complex feature representations against a simple bag-of-words representation as a baseline. The best F1 values obtained are above 0.60, and the results indicate that there is no significant difference between the two approaches. |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018 2018-01-01T00:00:00Z 2019-03-07T14:22:05Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/37352 |
url |
http://hdl.handle.net/10451/37352 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Sequeira, João, Teresa Gonçalves, Paulo Quaresma, Amália Mendes, Iris Hendrickx (2018) A Multi-versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language. In Proceedings of the 11th Language Resources and Evaluation Conference - LREC’2018, 7-12 May 2018, Miyazaki, Japan, pp. 1000-1005. 979-10-95546-00-9 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
European Language Resources Association |
publisher.none.fl_str_mv |
European Language Resources Association |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134447366307840 |