Triplet extraction leveraging sentence transformers and dependency parsing

Bibliographic details
Main author: Ottersen, Stuart Gallina
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/149856
Summary: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
id RCAP_3ce5c657d69a974d8679e9ac7c7d5176
oai_identifier_str oai:run.unl.pt:10362/149856
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Triplet extraction leveraging sentence transformers and dependency parsing
Triple extraction
Natural language processing
Knowledge Graph
SDG 8 - Decent work and economic growth
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the number of triples that contain no or useless information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows defining restrictive triple relation types to generate high-quality triples while removing the need for mapping extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, in addition to the implementation of the model in a computational intelligence context.
Bação, Fernando José Ferreira Lucas
Pinheiro, Flávio Luís Portas
RUN
Ottersen, Stuart Gallina
2024-01-26T01:31:45Z
2023-01-26
2023-01-26T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/149856
TID:203239911
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:31:44Z
oai:run.unl.pt:10362/149856
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:53:53.409665
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Triplet extraction leveraging sentence transformers and dependency parsing
title Triplet extraction leveraging sentence transformers and dependency parsing
spellingShingle Triplet extraction leveraging sentence transformers and dependency parsing
Ottersen, Stuart Gallina
Triple extraction
Natural language processing
Knowledge Graph
SDG 8 - Decent work and economic growth
author Ottersen, Stuart Gallina
author_facet Ottersen, Stuart Gallina
author_role author
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
Pinheiro, Flávio Luís Portas
RUN
dc.contributor.author.fl_str_mv Ottersen, Stuart Gallina
dc.subject.por.fl_str_mv Triple extraction
Natural language processing
Knowledge Graph
SDG 8 - Decent work and economic growth
topic Triple extraction
Natural language processing
Knowledge Graph
SDG 8 - Decent work and economic growth
description Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
publishDate 2023
dc.date.none.fl_str_mv 2023-01-26
2023-01-26T00:00:00Z
2024-01-26T01:31:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/149856
TID:203239911
url http://hdl.handle.net/10362/149856
identifier_str_mv TID:203239911
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138128971169792