Augmenting product knowledge graphs with subjective information
Main author: | SILVA, Johny Moreira da |
---|---|
Publication date: | 2023 |
Document type: | Thesis |
Language: | eng |
Source title: | Repositório Institucional da UFPE |
Full text: | https://repositorio.ufpe.br/handle/123456789/49464 |
Abstract: | Product Graphs (PGs) are knowledge graphs of consumer product data. They have recently become popular because of their potential to enable AI-related tasks in e-commerce. PGs contain facts about products (e.g., mobile phones) and their characteristics (e.g., brand, dimensions, and processor), automatically gathered from several sources. Enriching these structures with dynamic and subjective information, such as users’ opinions, is essential for improving recommendation, search, comparison, and pricing. However, this is a novel task, and existing work relies on supervised approaches. In this thesis, we address the task in two complementary stages: (1) We build a weakly supervised pipeline called Product Graph enriched with Opinions (PGOpi), which augments PGs with users’ opinions extracted from product reviews. To that end, we explore a traditional opinion-mining method, Distant Supervision based on word embeddings to reduce the dependency on manual labeling for training, and Deep Learning approaches to map the extracted opinions to targets in the PG; (2) We devise SYNthetiC OPinionAted TriplEs (SYNCOPATE), a generator that autonomously builds opinionated triples and can replace traditional methods for extracting aspect-opinion pairs from opinionated reviews. We build it by exploring In-Context Learning on an adapted pretrained Language Model. Finally, we apply post-processing to clean up and label the autonomously generated text. We evaluate both frameworks experimentally. We evaluated PGOpi on five product categories from two representative real-world datasets. The proposed weakly supervised approach achieves a higher micro F1 score than more complex weakly supervised models, and its results are comparable to those of a fully supervised state-of-the-art (SOTA) model. We evaluated SYNCOPATE by augmenting existing benchmark datasets with the generated data and comparing the performance of four SOTA models on aspect-opinion pair extraction. The results show that models trained on the generated synthetic data outperform those trained on a small percentage of human-labeled data. Furthermore, manual inspection of these triples by three human raters attested to their quality. |
id |
UFPE_e6a1845d37928f9d4f1d9effa790e537 |
---|---|
oai_identifier_str |
oai:repositorio.ufpe.br:123456789/49464 |
network_acronym_str |
UFPE |
network_name_str |
Repositório Institucional da UFPE |
repository_id_str |
2221 |
dc.title.pt_BR.fl_str_mv |
Augmenting product knowledge graphs with subjective information |
title |
Augmenting product knowledge graphs with subjective information |
spellingShingle |
Augmenting product knowledge graphs with subjective information SILVA, Johny Moreira da Inteligência computacional Aprendizagem |
title_short |
Augmenting product knowledge graphs with subjective information |
title_full |
Augmenting product knowledge graphs with subjective information |
title_fullStr |
Augmenting product knowledge graphs with subjective information |
title_full_unstemmed |
Augmenting product knowledge graphs with subjective information |
title_sort |
Augmenting product knowledge graphs with subjective information |
author |
SILVA, Johny Moreira da |
author_facet |
SILVA, Johny Moreira da |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/0022427692093493 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/7113249247656195 |
dc.contributor.author.fl_str_mv |
SILVA, Johny Moreira da |
dc.contributor.advisor1.fl_str_mv |
BARBOSA, Luciano de Andrade |
contributor_str_mv |
BARBOSA, Luciano de Andrade |
dc.subject.por.fl_str_mv |
Inteligência computacional Aprendizagem |
topic |
Inteligência computacional Aprendizagem |
description |
Product Graphs (PGs) are knowledge graphs of consumer product data. They have recently become popular because of their potential to enable AI-related tasks in e-commerce. PGs contain facts about products (e.g., mobile phones) and their characteristics (e.g., brand, dimensions, and processor), automatically gathered from several sources. Enriching these structures with dynamic and subjective information, such as users’ opinions, is essential for improving recommendation, search, comparison, and pricing. However, this is a novel task, and existing work relies on supervised approaches. In this thesis, we address the task in two complementary stages: (1) We build a weakly supervised pipeline called Product Graph enriched with Opinions (PGOpi), which augments PGs with users’ opinions extracted from product reviews. To that end, we explore a traditional opinion-mining method, Distant Supervision based on word embeddings to reduce the dependency on manual labeling for training, and Deep Learning approaches to map the extracted opinions to targets in the PG; (2) We devise SYNthetiC OPinionAted TriplEs (SYNCOPATE), a generator that autonomously builds opinionated triples and can replace traditional methods for extracting aspect-opinion pairs from opinionated reviews. We build it by exploring In-Context Learning on an adapted pretrained Language Model. Finally, we apply post-processing to clean up and label the autonomously generated text. We evaluate both frameworks experimentally. We evaluated PGOpi on five product categories from two representative real-world datasets. The proposed weakly supervised approach achieves a higher micro F1 score than more complex weakly supervised models, and its results are comparable to those of a fully supervised state-of-the-art (SOTA) model. We evaluated SYNCOPATE by augmenting existing benchmark datasets with the generated data and comparing the performance of four SOTA models on aspect-opinion pair extraction. The results show that models trained on the generated synthetic data outperform those trained on a small percentage of human-labeled data. Furthermore, manual inspection of these triples by three human raters attested to their quality. |
publishDate |
2023 |
dc.date.accessioned.fl_str_mv |
2023-03-23T16:50:56Z |
dc.date.available.fl_str_mv |
2023-03-23T16:50:56Z |
dc.date.issued.fl_str_mv |
2023-03-02 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
SILVA, Johny Moreira da. Augmenting product knowledge graphs with subjective information. 2023. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/49464 |
identifier_str_mv |
SILVA, Johny Moreira da. Augmenting product knowledge graphs with subjective information. 2023. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023. |
url |
https://repositorio.ufpe.br/handle/123456789/49464 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFPE |
dc.publisher.country.fl_str_mv |
Brasil |
publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
instname_str |
Universidade Federal de Pernambuco (UFPE) |
instacron_str |
UFPE |
institution |
UFPE |
reponame_str |
Repositório Institucional da UFPE |
collection |
Repositório Institucional da UFPE |
bitstream.url.fl_str_mv |
https://repositorio.ufpe.br/bitstream/123456789/49464/1/TESE%20Johny%20Moreira%20da%20Silva.pdf https://repositorio.ufpe.br/bitstream/123456789/49464/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/49464/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/49464/4/TESE%20Johny%20Moreira%20da%20Silva.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/49464/5/TESE%20Johny%20Moreira%20da%20Silva.pdf.jpg |
bitstream.checksum.fl_str_mv |
a7b65e521dab8124c14ee1e19210fd2d e39d27027a6cc9cb039ad269a5db8e34 5e89a1613ddc8510c6576f4b23a78973 5d3c5a2c933e85ae77b06f4963f6df2f ba1dbb930031d658ca5b878e9e100cd5 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv |
attena@ufpe.br |
_version_ |
1802310760526774272 |