BERT Mapper: An Entity Linking Method for Patent Text

Bibliographic details
Main author: Pais, Nuno David Ribeiro
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/148603
Abstract: Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
id RCAP_8ea44dd15bcbe2d1867fe20b3e609528
oai_identifier_str oai:run.unl.pt:10362/148603
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling BERT Mapper: An Entity Linking Method for Patent Text
Natural Language Processing
Entity Linking
Deep Learning
Milvus
BERT
Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
To assess the evolution of technological trends in past decades, the data science team at the European Patent Office set out to develop an interactive dashboard tracking the mentions of technologies in patent texts. To improve information quality and avoid cluttering the dashboard visualizations with “noisy” and synonymous keywords, an entity linking system was devised. The system described in this project therefore falls within the subfield of Entity Linking, under Natural Language Processing. Its goal was to extract the most important technology-related keywords from patent abstracts and titles and to assign each of them to an entity in the Wikipedia knowledge base. This way, only the matched entity, and not the extracted keyword, would be showcased in the final dashboard. This entity linking system distinguishes itself from other state-of-the-art methods by generating contextually meaningful entity vectors using BERT, extracting and averaging only the token vectors corresponding to the entity’s surface form, across the entire knowledge base. It is also the first time that such a system has been applied to patent information, whose linguistic characteristics differ from those of other fields. Its main objectives were noise reduction and mapping improvements, particularly in disambiguation, overcoming the weaknesses of the system in production. Given the specificity of the downstream task, the vectors computed with this methodology outperformed those calculated using SBERT.
This simple yet effective vector generation methodology is the backbone of the full entity linking system proposed in this work, which outperformed the baseline evaluation scenarios, such as the system currently in production and DBpedia Spotlight, more than doubling its mapping precision.
Bação, Fernando José Ferreira Lucas
Lassoued, Yassine
RUN
Pais, Nuno David Ribeiro
2023-02-03T10:27:44Z
2023-01-24
2023-01-24T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/148603
TID:203218868
eng
info:eu-repo/semantics/embargoedAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:30:16Z
oai:run.unl.pt:10362/148603
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:53:26.259721
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
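The abstract in this record describes entity vectors built by extracting and averaging only the token vectors that correspond to an entity's surface form, across the knowledge base. The following is a minimal sketch of that idea, not the thesis implementation. It assumes the contextual token vectors have already been produced by a BERT-style encoder (toy two-dimensional vectors stand in for them here), and it assumes a two-step average (per mention, then across mentions), which the record does not specify. The names `mean_vector`, `entity_vector`, and `mentions` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One mention: (token vectors of the sentence, span start, span end exclusive).
Mention = Tuple[Sequence[Vector], int, int]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def entity_vector(mentions: Sequence[Mention]) -> Vector:
    """Average surface-form token vectors within each mention, then across mentions.

    Assumption: the two-step averaging order is illustrative only.
    """
    per_mention = [
        mean_vector(tokens[start:end])  # keep only the surface-form tokens
        for tokens, start, end in mentions
    ]
    return mean_vector(per_mention)


# Toy stand-ins for contextual token vectors (a real system would take them
# from an encoder such as BERT; these numbers are made up for illustration).
mentions: List[Mention] = [
    ([[0.0, 0.0], [1.0, 3.0], [3.0, 5.0], [9.0, 9.0]], 1, 3),  # span = tokens 1..2
    ([[4.0, 6.0], [8.0, 8.0]], 0, 1),                          # span = token 0
]
vector = entity_vector(mentions)
```

Tokens outside the surface-form span (for example the first and last tokens of the first toy sentence) do not contribute to the result, which is the point of restricting the average to the span.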
dc.title.none.fl_str_mv BERT Mapper: An Entity Linking Method for Patent Text
title BERT Mapper: An Entity Linking Method for Patent Text
author Pais, Nuno David Ribeiro
author_facet Pais, Nuno David Ribeiro
author_role author
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
Lassoued, Yassine
RUN
dc.contributor.author.fl_str_mv Pais, Nuno David Ribeiro
dc.subject.por.fl_str_mv Natural Language Processing
Entity Linking
Deep Learning
Milvus
BERT
Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação
topic Natural Language Processing
Entity Linking
Deep Learning
Milvus
BERT
Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação
description Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
publishDate 2023
dc.date.none.fl_str_mv 2023-02-03T10:27:44Z
2023-01-24
2023-01-24T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/148603
TID:203218868
url http://hdl.handle.net/10362/148603
identifier_str_mv TID:203218868
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138125158547456