Extração automática de relações semânticas a partir de textos escritos em português do Brasil

Taba, Leonardo Sameshima

Bibliographic details
Main author:	Taba, Leonardo Sameshima
Publication date:	2013
Document type:	Master's thesis
Language:	por
Source title:	Repositório Institucional da UFSCAR
Full text:	https://repositorio.ufscar.br/handle/ufscar/543
Abstract:	Information extraction (IE) is one of the many applications of Natural Language Processing (NLP); it focuses on processing texts in order to retrieve specific information about a certain entity or concept. One of its subtasks is the automatic extraction of semantic relations between terms, which is very useful in the construction and improvement of linguistic resources such as ontologies and lexical bases. Moreover, there is a rising demand for semantic knowledge, as many computational NLP systems need that information in their processing. Applications such as information retrieval from web documents and automatic translation to other languages could benefit from that kind of knowledge. However, there are not enough human resources to produce that knowledge at the rate at which it is demanded. Aiming to address this scarcity of semantic data, this work investigates how binary semantic relations can be automatically extracted from Brazilian Portuguese texts. These relations are based on Minsky's (1986) theory and are used to represent common sense knowledge in the Open Mind Common Sense no Brasil (OMCS-Br) project developed at LIA (Laboratório de Interação Avançada), a partner of LaLiC (Laboratório de Linguística Computacional), where this research was conducted, both at Universidade Federal de São Carlos (UFSCar). The first strategies for this task were based on searching for textual patterns in texts, where a certain textual expression indicates that there is a specific relation between two terms in a sentence. This approach has high precision but low recall, which led to research on methods that use machine learning as their main model, encompassing techniques such as probabilistic and statistical classifiers and also kernel methods, which currently figure among the state of the art.
Therefore, this work investigates, implements and evaluates some of these techniques in order to determine how and to what extent they can be applied to the automatic extraction of binary semantic relations in Portuguese texts. In that way, this work is an important step in advancing the state of the art in information extraction for the Portuguese language, which still lacks resources in the semantic area, and it also advances the Portuguese-language NLP scenario as a whole.

Item metadata

id	SCAR_0816d4ff7b7c07ba88c98b7b36dc303c
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/543
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str	4322
dc.title.por.fl_str_mv	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
title	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
spellingShingle	Extração automática de relações semânticas a partir de textos escritos em português do Brasil Taba, Leonardo Sameshima Inteligência artificial Processamento de linguagem natural (Computação) Extração de informação Extração de relações semânticas CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
title_full	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
title_fullStr	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
title_full_unstemmed	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
title_sort	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
author	Taba, Leonardo Sameshima
author_facet	Taba, Leonardo Sameshima
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/2945193976624030
dc.contributor.author.fl_str_mv	Taba, Leonardo Sameshima
dc.contributor.advisor1.fl_str_mv	Caseli, Helena de Medeiros
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/6608582057810385
dc.contributor.authorID.fl_str_mv	32ec034a-31c2-4dfa-82b3-6a2477c10b21
contributor_str_mv	Caseli, Helena de Medeiros
dc.subject.por.fl_str_mv	Inteligência artificial Processamento de linguagem natural (Computação) Extração de informação Extração de relações semânticas
topic	Inteligência artificial Processamento de linguagem natural (Computação) Extração de informação Extração de relações semânticas CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2013
dc.date.available.fl_str_mv	2013-09-27 2016-06-02T19:06:08Z
dc.date.issued.fl_str_mv	2013-07-11
dc.date.accessioned.fl_str_mv	2016-06-02T19:06:08Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	TABA, Leonardo Sameshima. Extração automática de relações semânticas a partir de textos escritos em português do Brasil. 2013. 98 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2013.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/543
identifier_str_mv	TABA, Leonardo Sameshima. Extração automática de relações semânticas a partir de textos escritos em português do Brasil. 2013. 98 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2013.
url	https://repositorio.ufscar.br/handle/ufscar/543
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	-1 -1
dc.relation.authority.fl_str_mv	e36d4e63-960d-4f5c-9c93-f8b7f5f93d65
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/543/1/5456.pdf https://repositorio.ufscar.br/bitstream/ufscar/543/2/5456.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/543/3/5456.pdf.jpg
bitstream.checksum.fl_str_mv	0a6d9c5bee84eaab067717a8c3e11b11 d41d8cd98f00b204e9800998ecf8427e eb1abda97547bcdeb32a1e1a17962fd9
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
_version_	1802136245565915136
