Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches
Main author: | Ádria Lidiane de Oliveira Alves Ferreira |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/131653 |
Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
id |
RCAP_970705b702df77791633bf2b21ad64b2 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/131653 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches. Keywords: Natural Language Processing (NLP); Text Mining; Patent classification; Transfer Learning; Bi-directional Encoder Representations for Transformers (BERT). Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
Patent classification is one of the areas of Intellectual Property Analytics (IPA), and a growing use case, since the number of patent applications has been increasing worldwide over the years. More than ever, patents are used as financial protection by companies, which also use patent databases to drive research and product innovation. Instituto Nacional de Propriedade Industrial (INPI) is the government agency responsible for protecting industrial property rights in Portugal. INPI promoted a competition to explore technologies that address challenges related to industrial property, including the classification of patents, one of the critical phases of the patent granting process. In this project, we used the dataset made available by INPI to explore traditional machine learning algorithms for classifying Portuguese patents and to evaluate the performance of transfer learning methods on this task. BERTimbau, a BERT-architecture model pre-trained on a large Portuguese corpus, achieved the best results on the task, although its performance was only 4% better than that of a LinearSVC model using TF-IDF features. In general, the model performs well, despite low scores for classes with few training samples. However, the analysis of misclassified samples showed that the specificity of the context has more influence on learning than the number of samples itself.
Patent classification is a challenging task not only because of 1) the hierarchical structure of the classification, but also because of 2) the way a patent is described, 3) the overlap of contexts, and 4) the underrepresentation of classes. Nevertheless, it is an area of growing interest that can benefit from the new research that is revolutionizing machine learning applications, especially text mining.
Henriques, Roberto André Pereira; RUN; Ádria Lidiane de Oliveira Alves Ferreira; 2022-01-27T16:40:29Z; 2021-12-03; 2021-12-03T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/131653; TID:202834140; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:10:16Z; oai:run.unl.pt:10362/131653; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:47:09.474789; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches |
title |
Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches |
spellingShingle |
Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches; Ádria Lidiane de Oliveira Alves Ferreira; Natural Language Processing (NLP); Text Mining; Patent classification; Transfer Learning; Bi-directional Encoder Representations for Transformers (BERT) |
author |
Ádria Lidiane de Oliveira Alves Ferreira |
author_facet |
Ádria Lidiane de Oliveira Alves Ferreira |
author_role |
author |
dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira; RUN |
dc.contributor.author.fl_str_mv |
Ádria Lidiane de Oliveira Alves Ferreira |
dc.subject.por.fl_str_mv |
Natural Language Processing (NLP); Text Mining; Patent classification; Transfer Learning; Bi-directional Encoder Representations for Transformers (BERT) |
topic |
Natural Language Processing (NLP); Text Mining; Patent classification; Transfer Learning; Bi-directional Encoder Representations for Transformers (BERT) |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-12-03 2021-12-03T00:00:00Z 2022-01-27T16:40:29Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/131653 TID:202834140 |
url |
http://hdl.handle.net/10362/131653 |
identifier_str_mv |
TID:202834140 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138075038711808 |