A Use Case of Patent Classification Using Deep Learning with Transfer Learning

Bibliographic details
Main author: Henriques, Roberto
Publication date: 2022
Other authors: Ferreira, Adria; Castelli, Mauro
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/143348
Citation: Henriques, R., Ferreira, A., & Castelli, M. (2022). A Use Case of Patent Classification Using Deep Learning with Transfer Learning. Journal of Data and Information Science, 7(3), 49-70. https://doi.org/10.2478/jdis-2022-0015
Funding information: This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
id RCAP_438fc63cb5f5ed839c2a3f7cee411ade
oai_identifier_str oai:run.unl.pt:10362/143348
network_acronym_str RCAP
repository_id_str 7160
Abstract:
Purpose: Patent classification is one of the areas of Intellectual Property Analytics (IPA), and a growing use case because the number of patent applications has been increasing worldwide. We propose using machine learning algorithms to classify Portuguese patents and evaluate the performance of transfer learning methodologies on this task.
Design/methodology/approach: We applied three different approaches in this paper. First, we used a dataset made available by INPI to explore traditional machine learning algorithms and ensemble methods. After preprocessing the data with TF-IDF, FastText, and Doc2Vec, the models were evaluated by 5-fold cross-validation. In the second approach, we used two different neural network architectures, a Convolutional Neural Network (CNN) and a bi-directional Long Short-Term Memory (BiLSTM). Finally, in the third approach, we used pre-trained BERT, DistilBERT, and ULMFiT models.
Findings: BERTTimbau, a BERT architecture model pre-trained on a large Portuguese corpus, presented the best results for the task, although its performance was only 4% higher than that of a LinearSVC model using TF-IDF feature engineering.
Research limitations: The dataset was highly imbalanced, as is usual in patent applications, so the classes with the fewest samples were expected to present the worst performance. That happened in some cases, especially in classes with fewer than 60 training samples.
Practical implications: Patent classification is challenging because of the hierarchical classification system, the context overlap, and the underrepresentation of the classes. However, the final model presented an acceptable performance given the size of the dataset and the task complexity. This model can support decision-making and save time by proposing a category at the second level of ICP, which is one of the critical phases of the patent grant process.
Originality/value: To our knowledge, the proposed models had never been implemented for Portuguese patent classification.
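The abstract reports 5-fold cross-validation on a highly imbalanced dataset. The following is a minimal sketch of how such folds could be built while keeping class proportions similar in each fold. Stratification is an assumption here, since the record does not say how the folds were drawn, and the function names and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical toy labels: one frequent class, one medium class, one rare class.
labels = ["A"] * 50 + ["B"] * 12 + ["C"] * 3

for train_idx, test_idx in train_test_splits(labels):
    rare_in_test = sum(labels[i] == "C" for i in test_idx)
    print(len(train_idx), len(test_idx), rare_in_test)
```

With only three samples of class "C", two of the five test folds contain no example of it at all, which illustrates why the least represented classes are the hardest to evaluate and to learn.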
dc.contributor.none.fl_str_mv Information Management Research Center (MagIC) - NOVA Information Management School
NOVA Information Management School (NOVA IMS)
RUN
dc.subject.por.fl_str_mv Bi-directional Encoder Representations for Transformers (BERT)
Natural Language Processing (NLP)
Patent classification
Transfer Learning
Public Administration
Library and Information Sciences
Information Systems and Management
dc.date.none.fl_str_mv 2022-08-29T22:32:38Z
2022-08-01
2022-08-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 2096-157X
PURE: 46198486
https://doi.org/10.2478/jdis-2022-0015
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 22
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP