Resolução de correferência nominal usando semântica em língua portuguesa

Fonseca, Evandro Brasil

Resolução de correferência nominal usando semântica em língua portuguesa

Detalhes bibliográficos
Autor(a) principal:	Fonseca, Evandro Brasil
Data de Publicação:	2018
Tipo de documento:	Tese
Idioma:	por
Título da fonte:	Biblioteca Digital de Teses e Dissertações da PUC_RS
Texto Completo:	http://tede2.pucrs.br/tede2/handle/tede/8169
Resumo:	Coreference Resolution task is challenging for Natural Language Processing, considering the required linguistic knowledge and the sophistication of language processing techniques involved. Even though it is a demanding task, a motivating factor in the study of this phenomenon is its usefulness. Basically, several Natural Language Processing tasks may benefit from their results, such as named entities recognition, relation extraction between named entities, summarization, sentiment analysis, among others. Coreference Resolution is a process that consists on identifying certain terms and expressions that refer to the same entity. For example, in the sentence “ France is refusing. The country is one of the first in the ranking... ” we can say that [the country] is a coreference of [France]. By grouping these referential terms, we form coreference groups, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases for Portuguese, focusing on the use of semantic knowledge. Our proposed approach is based on syntactic-semantic linguistic rules. That is, we combine different levels of linguistic processing, using semantic relations as support, in order to infer referential relations between mentions. Models based on linguistic rules have been efficiently applied in other languages, such as: English, Spanish and Galician. In few words, these models are more efficient than machine learning approaches when we deal with less resourceful languages, since the lack of sample-rich corpora may produce a poor training. The proposed approach is the first model for Portuguese coreference resolution which uses semantic knowledge. Thus, we consider it as the main contribution of this thesis.

Metadados do item

id	P_RS_343e7a6d1062ba023a79e689e27a50b4
oai_identifier_str	oai:tede2.pucrs.br:tede/8169
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling	Vieira, RenataVanin, Aline Averhttp://lattes.cnpq.br/7639784707152839http://lattes.cnpq.br/3229974637891253Fonseca, Evandro Brasil2018-06-26T14:48:46Z2018-03-19http://tede2.pucrs.br/tede2/handle/tede/8169Coreference Resolution task is challenging for Natural Language Processing, considering the required linguistic knowledge and the sophistication of language processing techniques involved. Even though it is a demanding task, a motivating factor in the study of this phenomenon is its usefulness. Basically, several Natural Language Processing tasks may benefit from their results, such as named entities recognition, relation extraction between named entities, summarization, sentiment analysis, among others. Coreference Resolution is a process that consists on identifying certain terms and expressions that refer to the same entity. For example, in the sentence “ France is refusing. The country is one of the first in the ranking... ” we can say that [the country] is a coreference of [France]. By grouping these referential terms, we form coreference groups, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases for Portuguese, focusing on the use of semantic knowledge. Our proposed approach is based on syntactic-semantic linguistic rules. That is, we combine different levels of linguistic processing, using semantic relations as support, in order to infer referential relations between mentions. Models based on linguistic rules have been efficiently applied in other languages, such as: English, Spanish and Galician. In few words, these models are more efficient than machine learning approaches when we deal with less resourceful languages, since the lack of sample-rich corpora may produce a poor training. The proposed approach is the first model for Portuguese coreference resolution which uses semantic knowledge. Thus, we consider it as the main contribution of this thesis.A tarefa de Resolução de Correferência é um grande desafio para a área de Processamento da Linguagem Natural, tendo em vista o conhecimento linguístico exigido e a sofisticação das técnicas de processamento da língua empregados. Mesmo sendo uma tarefa desafiadora, um fator motivador do estudo deste fenômeno se dá pela sua utilidade. Basicamente, várias tarefas de Processamento da Linguagem Natural podem se beneficiar de seus resultados, como, por exemplo, o reconhecimento de entidades nomeadas, extração de relação entre entidades nomeadas, sumarização, análise de sentimentos, entre outras. A Resolução de Correferência é um processo que consiste em identificar determinados termos e expressões que remetem a uma mesma entidade. Por exemplo, na sentença “A França está resistindo. O país é um dos primeiros no ranking...” podemos dizer que [o país] é uma correferência de [A França]. Realizando o agrupamento desses termos referenciais, formamos grupos de menções correferentes, mais conhecidos como cadeias de correferência. Esta tese propõe um processo para a resolução de correferência entre sintagmas nominais para a língua portuguesa, tendo como foco a utilização do conhecimento semântico. Nossa abordagem proposta é baseada em regras linguísticas sintático-semânticas. Ou seja, combinamos diferentes níveis de processamento linguístico utilizando relações semânticas como apoio, de forma a inferir relações referenciais entre menções. Modelos baseados em regras linguísticas têm sido aplicados eficientemente em outros idiomas como o inglês, o espanhol e o galego. Esses modelos mostram-se mais eficientes que os baseados em aprendizado de máquina quando lidamos com idiomas menos providos de recursos, dado que a ausência de corpora ricos em amostras pode prejudicar o treino desses modelos. O modelo proposto nesta tese é o primeiro voltado para a resolução de correferência em português que faz uso de conhecimento semântico. Dessa forma, tomamos este fator como a principal contribuição deste trabalho.Submitted by PPG Ciência da Computação (ppgcc@pucrs.br) on 2018-06-19T11:37:24Z No. of bitstreams: 1 EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5)Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2018-06-26T14:40:39Z (GMT) No. of bitstreams: 1 EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5)Made available in DSpace on 2018-06-26T14:48:46Z (GMT). No. of bitstreams: 1 EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5) Previous issue date: 2018-03-19application/pdfhttp://tede2.pucrs.br:80/tede2/retrieve/172616/EVANDRO%20BRASIL%20FONSECA_TES.pdf.jpgporPontifícia Universidade Católica do Rio Grande do SulPrograma de Pós-Graduação em Ciência da ComputaçãoPUCRSBrasilEscola PolitécnicaResolução de CorreferênciaExtração de InformaçãoCoreference ResolutionInformation ExtractionCIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAOResolução de correferência nominal usando semântica em língua portuguesainfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisTrabalho não apresenta restrição para publicação1974996533081274470500500-862078257083325301info:eu-repo/semantics/openAccessreponame:Biblioteca Digital de Teses e Dissertações da PUC_RSinstname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)instacron:PUC_RSTHUMBNAILEVANDRO BRASIL FONSECA_TES.pdf.jpgEVANDRO BRASIL FONSECA_TES.pdf.jpgimage/jpeg4899http://tede2.pucrs.br/tede2/bitstream/tede/8169/4/EVANDRO+BRASIL+FONSECA_TES.pdf.jpgd7fa51000ab126c04f3d0dea38dd68f4MD54TEXTEVANDRO BRASIL FONSECA_TES.pdf.txtEVANDRO BRASIL FONSECA_TES.pdf.txttext/plain208449http://tede2.pucrs.br/tede2/bitstream/tede/8169/3/EVANDRO+BRASIL+FONSECA_TES.pdf.txt0da35164ce29c1637605f29c70d29c6bMD53ORIGINALEVANDRO BRASIL FONSECA_TES.pdfEVANDRO BRASIL FONSECA_TES.pdfapplication/pdf1972824http://tede2.pucrs.br/tede2/bitstream/tede/8169/2/EVANDRO+BRASIL+FONSECA_TES.pdf9fca0c499753cd9d2822c59040e826bfMD52LICENSElicense.txtlicense.txttext/plain; charset=utf-8610http://tede2.pucrs.br/tede2/bitstream/tede/8169/1/license.txt5a9d6006225b368ef605ba16b4f6d1beMD51tede/81692018-06-26 12:00:58.995oai:tede2.pucrs.br:tede/8169QXV0b3JpemHDp8OjbyBwYXJhIFB1YmxpY2HDp8OjbyBFbGV0csO0bmljYTogQ29tIGJhc2Ugbm8gZGlzcG9zdG8gbmEgTGVpIEZlZGVyYWwgbsK6OS42MTAsIGRlIDE5IGRlIGZldmVyZWlybyBkZSAxOTk4LCBvIGF1dG9yIEFVVE9SSVpBIGEgcHVibGljYcOnw6NvIGVsZXRyw7RuaWNhIGRhIHByZXNlbnRlIG9icmEgbm8gYWNlcnZvIGRhIEJpYmxpb3RlY2EgRGlnaXRhbCBkYSBQb250aWbDrWNpYSBVbml2ZXJzaWRhZGUgQ2F0w7NsaWNhIGRvIFJpbyBHcmFuZGUgZG8gU3VsLCBzZWRpYWRhIGEgQXYuIElwaXJhbmdhIDY2ODEsIFBvcnRvIEFsZWdyZSwgUmlvIEdyYW5kZSBkbyBTdWwsIGNvbSByZWdpc3RybyBkZSBDTlBKIDg4NjMwNDEzMDAwMi04MSBiZW0gY29tbyBlbSBvdXRyYXMgYmlibGlvdGVjYXMgZGlnaXRhaXMsIG5hY2lvbmFpcyBlIGludGVybmFjaW9uYWlzLCBjb25zw7NyY2lvcyBlIHJlZGVzIMOgcyBxdWFpcyBhIGJpYmxpb3RlY2EgZGEgUFVDUlMgcG9zc2EgYSB2aXIgcGFydGljaXBhciwgc2VtIMO0bnVzIGFsdXNpdm8gYW9zIGRpcmVpdG9zIGF1dG9yYWlzLCBhIHTDrXR1bG8gZGUgZGl2dWxnYcOnw6NvIGRhIHByb2R1w6fDo28gY2llbnTDrWZpY2EuCg==Biblioteca Digital de Teses e Dissertaçõeshttp://tede2.pucrs.br/tede2/PRIhttps://tede2.pucrs.br/oai/requestbiblioteca.central@pucrs.br\|\|opendoar:2018-06-26T15:00:58Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)false
dc.title.por.fl_str_mv	Resolução de correferência nominal usando semântica em língua portuguesa
title	Resolução de correferência nominal usando semântica em língua portuguesa
spellingShingle	Resolução de correferência nominal usando semântica em língua portuguesa Fonseca, Evandro Brasil Resolução de Correferência Extração de Informação Coreference Resolution Information Extraction CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
title_short	Resolução de correferência nominal usando semântica em língua portuguesa
title_full	Resolução de correferência nominal usando semântica em língua portuguesa
title_fullStr	Resolução de correferência nominal usando semântica em língua portuguesa
title_full_unstemmed	Resolução de correferência nominal usando semântica em língua portuguesa
title_sort	Resolução de correferência nominal usando semântica em língua portuguesa
author	Fonseca, Evandro Brasil
author_facet	Fonseca, Evandro Brasil
author_role	author
dc.contributor.advisor1.fl_str_mv	Vieira, Renata
dc.contributor.advisor-co1.fl_str_mv	Vanin, Aline Aver
dc.contributor.advisor-co1Lattes.fl_str_mv	http://lattes.cnpq.br/7639784707152839
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/3229974637891253
dc.contributor.author.fl_str_mv	Fonseca, Evandro Brasil
contributor_str_mv	Vieira, Renata Vanin, Aline Aver
dc.subject.por.fl_str_mv	Resolução de Correferência Extração de Informação
topic	Resolução de Correferência Extração de Informação Coreference Resolution Information Extraction CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Coreference Resolution Information Extraction
dc.subject.cnpq.fl_str_mv	CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
description	Coreference Resolution task is challenging for Natural Language Processing, considering the required linguistic knowledge and the sophistication of language processing techniques involved. Even though it is a demanding task, a motivating factor in the study of this phenomenon is its usefulness. Basically, several Natural Language Processing tasks may benefit from their results, such as named entities recognition, relation extraction between named entities, summarization, sentiment analysis, among others. Coreference Resolution is a process that consists on identifying certain terms and expressions that refer to the same entity. For example, in the sentence “ France is refusing. The country is one of the first in the ranking... ” we can say that [the country] is a coreference of [France]. By grouping these referential terms, we form coreference groups, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases for Portuguese, focusing on the use of semantic knowledge. Our proposed approach is based on syntactic-semantic linguistic rules. That is, we combine different levels of linguistic processing, using semantic relations as support, in order to infer referential relations between mentions. Models based on linguistic rules have been efficiently applied in other languages, such as: English, Spanish and Galician. In few words, these models are more efficient than machine learning approaches when we deal with less resourceful languages, since the lack of sample-rich corpora may produce a poor training. The proposed approach is the first model for Portuguese coreference resolution which uses semantic knowledge. Thus, we consider it as the main contribution of this thesis.
publishDate	2018
dc.date.accessioned.fl_str_mv	2018-06-26T14:48:46Z
dc.date.issued.fl_str_mv	2018-03-19
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/8169
url	http://tede2.pucrs.br/tede2/handle/tede/8169
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	1974996533081274470
dc.relation.confidence.fl_str_mv	500 500
dc.relation.cnpq.fl_str_mv	-862078257083325301
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Escola Politécnica
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/8169/4/EVANDRO+BRASIL+FONSECA_TES.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/8169/3/EVANDRO+BRASIL+FONSECA_TES.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/8169/2/EVANDRO+BRASIL+FONSECA_TES.pdf http://tede2.pucrs.br/tede2/bitstream/tede/8169/1/license.txt
bitstream.checksum.fl_str_mv	d7fa51000ab126c04f3d0dea38dd68f4 0da35164ce29c1637605f29c70d29c6b 9fca0c499753cd9d2822c59040e826bf 5a9d6006225b368ef605ba16b4f6d1be
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br\|\|
_version_	1799765334680403968

Resolução de correferência nominal usando semântica em língua portuguesa

Registros relacionados