Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches

Bibliographic details
Main author: Ádria Lidiane de Oliveira Alves Ferreira
Publication date: 2021
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/131653
Abstract: Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
id RCAP_970705b702df77791633bf2b21ad64b2
oai_identifier_str oai:run.unl.pt:10362/131653
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches
Natural Language Processing (NLP)
Text Mining
Patent classification
Transfer Learning
Bi-directional Encoder Representations for Transformers (BERT)
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Patent classification is one of the areas of Intellectual Property Analytics (IPA), and a growing use case, since the number of patent applications has been increasing worldwide over the years. More than ever, patents are being used as financial protection by companies, which also use patent databases to drive research and leverage product innovation. Instituto Nacional de Propriedade Industrial (INPI) is the government agency responsible for protecting Industrial Property rights in Portugal. INPI promoted a competition to explore technologies for solving challenges related to Industrial Property, including the classification of patents, one of the critical phases of the patent grant process. In this work project, we used the dataset made available by INPI to explore traditional machine learning algorithms for classifying Portuguese patents and to evaluate the performance of transfer learning methodologies on this task. BERTimbau, a BERT architecture model pre-trained on a large Portuguese corpus, presented the best results on the task, although its performance was only 4% higher than that of a LinearSVC model using TF-IDF feature engineering. In general, the model performs well, despite low scores for classes with few training samples. However, the analysis of misclassified samples showed that the specificity of the context has more influence on learning than the number of samples itself.
Patent classification is a challenging task not only because of 1) the hierarchical structure of the classification but also because of 2) the way a patent is described, 3) the overlap of contexts, and 4) the underrepresentation of classes. Nevertheless, it is an area of growing interest, one that can be leveraged by the new research that is revolutionizing machine learning applications, especially text mining.
Henriques, Roberto André Pereira
RUN
Ádria Lidiane de Oliveira Alves Ferreira
2022-01-27T16:40:29Z
2021-12-03
2021-12-03T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/131653
TID:202834140
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:10:16Z
oai:run.unl.pt:10362/131653
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:47:09.474789
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Portuguese patent classification: A use case of text classification using machine learning and transfer learning approaches
author Ádria Lidiane de Oliveira Alves Ferreira
author_facet Ádria Lidiane de Oliveira Alves Ferreira
author_role author
dc.contributor.none.fl_str_mv Henriques, Roberto André Pereira
RUN
dc.contributor.author.fl_str_mv Ádria Lidiane de Oliveira Alves Ferreira
dc.subject.por.fl_str_mv Natural Language Processing (NLP)
Text Mining
Patent classification
Transfer Learning
Bi-directional Encoder Representations for Transformers (BERT)
description Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
publishDate 2021
dc.date.none.fl_str_mv 2021-12-03
2021-12-03T00:00:00Z
2022-01-27T16:40:29Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/131653
TID:202834140
url http://hdl.handle.net/10362/131653
identifier_str_mv TID:202834140
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138075038711808