Extração de relação entre entidades nomeadas no contexto econômico-financeiro

Reyes, Daniel Alessandro Guimarães de los

Bibliographic details
Main author:	Reyes, Daniel Alessandro Guimarães de los
Publication date:	2021
Document type:	Master's thesis
Language:	por
Source title:	Biblioteca Digital de Teses e Dissertações da PUC_RS
Full text:	http://tede2.pucrs.br/tede2/handle/tede/9970
Abstract:	Competitive Intelligence (CI) is a relevant area of a corporation that can support strategic business planning, helping decision makers position their organization in the market. In the financial domain, identifying the organizations mentioned in a news story is often insufficient; it is also necessary to extract the relations between those entities, a task known as Relation Extraction (RE). Therefore, the main goal of this work is to propose an approach for extracting any semantic relation between Named Entities (NEs) in the Financial Market domain for the Portuguese language. To achieve this goal, a state-of-the-art review was first carried out, analyzing 76 articles to identify the techniques in use and the datasets used to assess them. This review shows that there are few approaches to the RE task for the Portuguese language. Following the Knowledge Discovery in Databases (KDD) methodology created by Fayyad, we propose a five-step approach that goes from collecting data to evaluating the results. The approach uses two models based on Bidirectional Encoder Representations from Transformers (BERT) to process a sentence and its named entities. We first classify whether or not a given pair of entities has a semantic relation, and then extract the parts of the sentence that represent or describe the semantic relation between these named entities. The approach was developed for the Portuguese language, considering the financial domain and exploring deep linguistic representations without using other lexical-semantic resources. The experiments show a score of 76.3% on the Jaccard metric, which measures the similarity between the relations extracted by the extractor model, and the complete approach achieves 87% recall, 84.5% precision, and an F-measure of 85.8%. Another important contribution is a manually built corpus of more than 9,114 tuples (sentence, entity, entity), annotated from tweets and news items provided by CI analysts to support decision making.
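
The Jaccard metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the thesis tokenizes relation text or what the extracted relation is compared against, so the lowercased whitespace tokens, the comparison against a reference annotation, the function name `jaccard`, and the example strings below are all assumptions made for illustration, not the author's implementation.

```python
def jaccard(predicted: str, reference: str) -> float:
    """Jaccard similarity between the token sets of two relation strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    a = set(predicted.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        # Assumption: two empty spans count as a perfect match.
        return 1.0
    # Size of the intersection divided by the size of the union.
    return len(a & b) / len(a | b)


# Hypothetical example: the extractor returns one extra token.
score = jaccard("anunciou a compra da", "anunciou a compra")
print(score)  # 3 shared tokens out of 4 distinct tokens -> 0.75
```

Under this reading, a corpus-level score would be the average of `jaccard` over all evaluated pairs; whether the thesis averages this way is likewise an assumption.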

Item metadata

id	P_RS_e48f0768ed29feabfd04a257fe7362f5
oai_identifier_str	oai:tede2.pucrs.br:tede/9970
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling	Submitted by PPG Ciência da Computação (ppgcc@pucrs.br) on 2021-11-23T17:54:32Z; approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2021-11-24T11:17:42Z (GMT); made available in DSpace on 2021-11-24T11:25:17Z (GMT). No. of bitstreams: 1 (DANIEL ALESSANDRO GUIMARÃES DE LOS REYES_DIS.pdf, 2384395 bytes, MD5 checksum 761da9e6e646f285a0d58da6103f97ca). Previous issue date: 2021-08-30. Sponsorship: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES. Rights: the work has no restrictions on publication.
dc.title.por.fl_str_mv	Extração de relação entre entidades nomeadas no contexto econômico-financeiro
title	Extração de relação entre entidades nomeadas no contexto econômico-financeiro
author	Reyes, Daniel Alessandro Guimarães de los
author_facet	Reyes, Daniel Alessandro Guimarães de los
author_role	author
dc.contributor.advisor1.fl_str_mv	Manssour, Isabel Harb
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/4904489502853690
dc.contributor.author.fl_str_mv	Reyes, Daniel Alessandro Guimarães de los
contributor_str_mv	Manssour, Isabel Harb
dc.subject.por.fl_str_mv	Extração de Relação; Extração de Relação Financeira de Entidade Nomeada; Extração de Relação Semântica; Processamento de Linguagem Natural
topic	Extração de Relação; Extração de Relação Financeira de Entidade Nomeada; Extração de Relação Semântica; Processamento de Linguagem Natural; Relation Extraction; Financial Named-Entity Relation Extraction; Semantic Relation Extraction; Natural Language Processing; CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Relation Extraction; Financial Named-Entity Relation Extraction; Semantic Relation Extraction; Natural Language Processing
dc.subject.cnpq.fl_str_mv	CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
publishDate	2021
dc.date.accessioned.fl_str_mv	2021-11-24T11:25:17Z
dc.date.issued.fl_str_mv	2021-08-30
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/9970
url	http://tede2.pucrs.br/tede2/handle/tede/9970
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	-4570527706994352458
dc.relation.confidence.fl_str_mv	500 500 600
dc.relation.cnpq.fl_str_mv	-862078257083325301
dc.relation.sponsorship.fl_str_mv	3590462550136975366
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Escola Politécnica
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/9970/4/DANIEL+ALESSANDRO+GUIMAR%C3%83ES+DE+LOS+REYES_DIS.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/9970/3/DANIEL+ALESSANDRO+GUIMAR%C3%83ES+DE+LOS+REYES_DIS.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/9970/2/DANIEL+ALESSANDRO+GUIMAR%C3%83ES+DE+LOS+REYES_DIS.pdf http://tede2.pucrs.br/tede2/bitstream/tede/9970/1/license.txt
bitstream.checksum.fl_str_mv	2143ef5e93f9f415cd619d5b488e27fe ede5390230a7a4ff074061dbf0ac990e 761da9e6e646f285a0d58da6103f97ca 220e11f2d3ba5354f917c7035aadef24
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br
_version_	1799765352796651520
