Modality annotation for Portuguese: from manual annotation to automatic labeling
Main author: | Mendes, Amália |
---|---|
Publication date: | 2016 |
Other authors: | Hendrickx, Iris; Ávila, Luciana; Quaresma, Paulo; Gonçalves, Teresa; Sequeira, João |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10451/30693 |
Abstract: | We investigate modality in Portuguese, combining a linguistic perspective with an application-oriented one. We design an annotation scheme that reflects theoretical linguistic concepts and apply it to a small corpus sample to show how the scheme handles real-world language usage. We present two schemes for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality but also to train a practical text mining tool that detects modality in text automatically. The modality tagger uses a machine learning classifier trained on features extracted automatically from the output of a syntactic parser. As only a small annotated sample is available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that express more than one modal meaning. Finally, we discuss several insights into the complexity of modality as a semantic concept that emerge from the manual annotation of the corpus and from the analysis of the automatic labeling results: ambiguity; the semantic and syntactic properties typically associated with a given modal meaning in context; and the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to both Portuguese varieties and covers both written and spoken data. |
id |
RCAP_d9471b93a4f55c87e243eeac92b1fb35 |
oai_identifier_str |
oai:repositorio.ul.pt:10451/30693 |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Mendes, Amália; Hendrickx, Iris; Ávila, Luciana; Quaresma, Paulo; Gonçalves, Teresa; Sequeira, João |
dc.subject.por.fl_str_mv |
Modality; Corpus annotation; Text mining; Portuguese linguistics |
dc.date.none.fl_str_mv |
2016; 2016-01-01T00:00:00Z; 2018-01-17T16:50:56Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/30693 |
dc.relation.none.fl_str_mv |
Mendes, Amália, Iris Hendrickx, Luciana Ávila, Paulo Quaresma, Teresa Gonçalves, João Sequeira (2016). Modality Annotation for Portuguese: from manual annotation to automatic labeling. LiLT - Linguistic Issues in Language Technology, 14 (2016), Special volume on Modality: Modes of Modality in NLP (ISSN: 1945-3604) |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Stanford University |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |