BERT Mapper: An Entity Linking Method for Patent Text
Main author: | Pais, Nuno David Ribeiro |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/148603 |
Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
id |
RCAP_8ea44dd15bcbe2d1867fe20b3e609528 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/148603 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
BERT Mapper: An Entity Linking Method for Patent Text
Natural Language Processing; Entity Linking; Deep Learning; Milvus; BERT; Domain/Scientific Area::Natural Sciences::Computer and Information Sciences
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
To assess the evolution of technological trends over past decades, the data science team at the European Patent Office set out to develop an interactive dashboard that tracks mentions of technologies in patent texts. To improve information quality and avoid cluttering the dashboard visualizations with "noisy" and synonymous keywords, an entity linking system was devised. The system described in this project therefore belongs to the subfield of Entity Linking within Natural Language Processing. Its goal was to extract the most important technology-related keywords from patent abstracts and titles and to assign each of them to an entity in the Wikipedia knowledge base, so that only the matched entity, and not the extracted keyword, would be shown in the final dashboard. The system differs from other state-of-the-art methods in how it generates contextually meaningful entity vectors with BERT: it extracts and averages only the token vectors corresponding to the entity's surface form, across the entire knowledge base. It is also the first time that such a system has been applied to patent information, whose linguistic characteristics differ from those of other fields. Its main objectives were noise reduction and better mappings, particularly in disambiguation, to overcome the weaknesses of the system in production. Given the specificity of the downstream task, the vectors computed with this methodology outperformed those calculated using SBERT.
This simple yet effective vector generation methodology is the backbone of the full entity linking system proposed in this work, which outperformed the baseline evaluation scenarios, such as the system currently in production and DBpedia Spotlight, more than doubling the mapping precision.
Bação, Fernando José Ferreira Lucas; Lassoued, Yassine; RUN; Pais, Nuno David Ribeiro
2023-02-03T10:27:44Z; 2023-01-24; 2023-01-24T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/148603; TID:203218868; eng; info:eu-repo/semantics/embargoedAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:30:16Z; oai:run.unl.pt:10362/148603; Aggregator Portal; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:53:26.259721; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
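The abstract describes entity vectors built by averaging only the token vectors that cover an entity's surface form, across the knowledge base. The Python sketch below illustrates that idea on toy data and is not the thesis implementation. It assumes the contextual token vectors (for example, BERT's last hidden states) have already been computed and that the surface-form span is known for each mention. The names `mention_vector` and `entity_vector` are hypothetical, and averaging per mention first and then across mentions is an assumption, since the abstract does not specify the pooling order.

```python
import numpy as np


def mention_vector(token_vectors, start, end):
    """Average the token vectors in the half-open span [start, end) of one mention."""
    span = np.asarray(token_vectors, dtype=float)[start:end]
    if span.shape[0] == 0:
        raise ValueError("the surface-form span is empty")
    return span.mean(axis=0)


def entity_vector(mentions):
    """Average the per-mention vectors of one entity.

    `mentions` is an iterable of (token_vectors, start, end) tuples, one per
    occurrence of the entity's surface form (hypothetical input layout).
    """
    vectors = [mention_vector(t, s, e) for t, s, e in mentions]
    if not vectors:
        raise ValueError("the entity has no mentions")
    return np.mean(vectors, axis=0)


# Toy 2-dimensional "contextual" vectors standing in for real BERT outputs.
sentence_a = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 0.0], [9.0, 9.0]])
sentence_b = np.array([[1.0, 2.0], [3.0, 4.0]])

# The surface form covers tokens 1..2 in sentence_a and tokens 0..1 in sentence_b.
vector = entity_vector([(sentence_a, 1, 3), (sentence_b, 0, 2)])
```

Tokens outside the span (such as the last row of `sentence_a`) do not contribute directly to the entity vector. With a real contextual model they would still influence it indirectly through the span tokens' representations.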
dc.title.none.fl_str_mv |
BERT Mapper: An Entity Linking Method for Patent Text |
author |
Pais, Nuno David Ribeiro |
author_facet |
Pais, Nuno David Ribeiro |
author_role |
author |
dc.contributor.none.fl_str_mv |
Bação, Fernando José Ferreira Lucas; Lassoued, Yassine; RUN |
dc.contributor.author.fl_str_mv |
Pais, Nuno David Ribeiro |
dc.subject.por.fl_str_mv |
Natural Language Processing; Entity Linking; Deep Learning; Milvus; BERT; Domain/Scientific Area::Natural Sciences::Computer and Information Sciences |
topic |
Natural Language Processing; Entity Linking; Deep Learning; Milvus; BERT; Domain/Scientific Area::Natural Sciences::Computer and Information Sciences |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-02-03T10:27:44Z 2023-01-24 2023-01-24T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/148603 TID:203218868 |
url |
http://hdl.handle.net/10362/148603 |
identifier_str_mv |
TID:203218868 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/embargoedAccess |
eu_rights_str_mv |
embargoedAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799138125158547456 |