Modality annotation for Portuguese: from manual annotation to automatic labeling

Bibliographic details
Main author: Mendes, Amália
Publication date: 2016
Other authors: Hendrickx, Iris; Ávila, Luciana; Quaresma, Paulo; Gonçalves, Teresa; Sequeira, João
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10451/30693
Abstract: We investigate modality in Portuguese and we combine a linguistic perspective with an application-oriented perspective on modality. We design an annotation scheme reflecting theoretical linguistic concepts and apply this schema to a small corpus sample to show how the scheme deals with real world language usage. We present two schemas for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically. The modality tagger uses a machine learning classifier trained on automatically extracted features from a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity and the semantic and syntactic properties typically associated to one modal meaning in context, and also the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.
id RCAP_d9471b93a4f55c87e243eeac92b1fb35
oai_identifier_str oai:repositorio.ul.pt:10451/30693
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Modality annotation for Portuguese: from manual annotation to automatic labeling
title Modality annotation for Portuguese: from manual annotation to automatic labeling
spellingShingle Modality annotation for Portuguese: from manual annotation to automatic labeling
Mendes, Amália
Modality
Corpus annotation
Text mining
Portuguese linguistics
author Mendes, Amália
author_facet Mendes, Amália
Hendrickx, Iris
Ávila, Luciana
Quaresma, Paulo
Gonçalves, Teresa
Sequeira, João
author_role author
author2 Hendrickx, Iris
Ávila, Luciana
Quaresma, Paulo
Gonçalves, Teresa
Sequeira, João
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Mendes, Amália
Hendrickx, Iris
Ávila, Luciana
Quaresma, Paulo
Gonçalves, Teresa
Sequeira, João
dc.subject.por.fl_str_mv Modality
Corpus annotation
Text mining
Portuguese linguistics
topic Modality
Corpus annotation
Text mining
Portuguese linguistics
description We investigate modality in Portuguese and we combine a linguistic perspective with an application-oriented perspective on modality. We design an annotation scheme reflecting theoretical linguistic concepts and apply this schema to a small corpus sample to show how the scheme deals with real world language usage. We present two schemas for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically. The modality tagger uses a machine learning classifier trained on automatically extracted features from a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity and the semantic and syntactic properties typically associated to one modal meaning in context, and also the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.
publishDate 2016
dc.date.none.fl_str_mv 2016
2016-01-01T00:00:00Z
2018-01-17T16:50:56Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/30693
url http://hdl.handle.net/10451/30693
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Mendes, Amália, Iris Hendrickx, Luciana Ávila, Paulo Quaresma, Teresa Gonçalves, João Sequeira (2016) Modality Annotation for Portuguese: from manual annotation to automatic labeling, LiLT - Language Issues in Language Technology, 14 (2016), Special volume on Modality: Modes of Modality in NLP (ISSN: 1945-3604)
1945-3604
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Stanford University
publisher.none.fl_str_mv Stanford University
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134387571261440