Abstract Meaning Representation Parsing for the Brazilian Portuguese Language
Main author: | Anchiêta, Rafael Torres |
---|---|
Publication date: | 2020 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-29072020-120805/ |
Abstract: | Computational semantics studies meaning representations, that is, computationally viable semantic formalisms for representing human expressions. Such formalisms play an important role in making sense of natural language by capturing the meaning of linguistic statements. They are also the main component in developing semantic parsers, which map sentences of a natural language into a computationally tractable meaning representation. To represent and understand the semantic features of a natural language and, with that, to develop computational tools that produce results close to those of humans, several semantic formalisms have been proposed, such as Universal Networking Language (UNL), Universal Conceptual Cognitive Annotation (UCCA), and Abstract Meaning Representation (AMR), among others. In particular, AMR is a semantic formalism based on a rooted directed graph with labeled nodes and edges. The nodes are concepts (which may be the words of a sentence) and the edges are semantic relations among them; the nodes have no explicit alignment with the tokens of the sentence. Furthermore, AMR encompasses several linguistic features, such as named entities, coreference, semantic roles, and word sense disambiguation. In this work, we focused on the AMR representation for Portuguese, since it has a simpler structure to produce than other semantic formalisms. We annotated the book The Little Prince, producing the first corpus annotated with AMR information for Portuguese, and developed the first AMR parser for Portuguese. We also adapted some AMR parsing methods from English to Portuguese. In addition, we developed a new strategy for aligning the word tokens of a sentence with the nodes of its AMR graph, which improves the results of the adapted AMR parsers, and a new metric for evaluating AMR graphs, which is more robust, faster, and fairer than the traditional AMR metric. Finally, we used these resources and methods in a paraphrase detection task, combining explicit and implicit semantic features to classify whether two sentences are paraphrases of each other. |
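
The abstract describes AMR as a rooted directed graph with labeled nodes and edges, in which coreference shows up as a node reached by more than one edge. The following is a minimal Python sketch of that structure, not the thesis's implementation. The sentence "The boy wants to go", the concept labels (`want-01`, `boy`, `go-01`), and all names in the code (`Triple`, `ROOT`, `CONCEPTS`, `EDGES`, and the three helper functions) are assumptions chosen for illustration.

```python
from collections import Counter, namedtuple

# One labeled edge of the graph: source node, relation label, target node.
Triple = namedtuple("Triple", ["source", "relation", "target"])

# Illustrative graph for "The boy wants to go" (labels are assumptions):
# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
ROOT = "w"
CONCEPTS = {"w": "want-01", "b": "boy", "g": "go-01"}
EDGES = [
    Triple("w", ":ARG0", "b"),
    Triple("w", ":ARG1", "g"),
    Triple("g", ":ARG0", "b"),  # second edge into "b": the boy is also the goer
]


def reachable_from(root, edges):
    """Return every node reachable from root by following edge direction."""
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for edge in edges:
            if edge.source == node and edge.target not in seen:
                seen.add(edge.target)
                stack.append(edge.target)
    return seen


def is_rooted(root, concepts, edges):
    """A graph is rooted at root if every concept node is reachable from it."""
    return reachable_from(root, edges) == set(concepts)


def reentrant_nodes(edges):
    """Nodes with more than one incoming edge (how coreference is encoded)."""
    incoming = Counter(edge.target for edge in edges)
    return {node for node, count in incoming.items() if count > 1}
```

In this sketch, `is_rooted(ROOT, CONCEPTS, EDGES)` checks the "rooted" property, and `reentrant_nodes(EDGES)` finds the shared node `b`. Note that nothing in the structure ties a node to a token position, which is why a separate alignment step between words and nodes is needed.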
id |
USP_a3326e3c735cd4f316fd25a80cad7d3b |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-29072020-120805 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Abstract Meaning Representation Parsing for the Brazilian Portuguese Language / Analisadores para Representação Abstrata de Significado para o Português Brasileiro |
title |
Abstract Meaning Representation Parsing for the Brazilian Portuguese Language |
spellingShingle |
Abstract Meaning Representation Parsing for the Brazilian Portuguese Language; Anchiêta, Rafael Torres; Abstract meaning representation; Analisador semântico; Anotação semântica; Representação abstrata de significado; Semantic annotation; Semantic parsing |
author |
Anchiêta, Rafael Torres |
author_facet |
Anchiêta, Rafael Torres |
author_role |
author |
dc.contributor.none.fl_str_mv |
Pardo, Thiago Alexandre Salgueiro |
dc.contributor.author.fl_str_mv |
Anchiêta, Rafael Torres |
dc.subject.por.fl_str_mv |
Abstract meaning representation; Analisador semântico; Anotação semântica; Representação abstrata de significado; Semantic annotation; Semantic parsing |
topic |
Abstract meaning representation; Analisador semântico; Anotação semântica; Representação abstrata de significado; Semantic annotation; Semantic parsing |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-05-22 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-29072020-120805/ |
url |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-29072020-120805/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
|
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.coverage.none.fl_str_mv |
|
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
_version_ |
1809091179316248576 |