Development of an information retrieval tool for biomedical patents

Bibliographic details
Main author: Alves, T.
Publication date: 2018
Other authors: Rodrigues, Rúben; Hugo Costa; Rocha, Miguel
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/53766
Abstract: Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.cmpb.2018.03.012.
id RCAP_f55dacee3268ff8024f1b8a6b992c7b3
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/53766
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Development of an information retrieval tool for biomedical patents
Biomedical Text Mining
Information Retrieval
Information Extraction
Patents
PDF to text conversion
Science & Technology
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.cmpb.2018.03.012.
Background and objective. The volume of biomedical literature has been increasing in recent years. Patent documents have followed this trend and are important sources of biomedical knowledge, technical details and curated data, which are put together during the granting process. The field of Biomedical Text Mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes searching for information a challenging task. Several BioTM techniques can be applied to patents. Among these, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks.
Methods. The pipeline was developed within @Note2, an open-source computational framework for BioTM, by adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conversion and optical character recognition. User interfaces were also developed for the main operations and made available in a new @Note2 plug-in.
Results. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including Information Extraction tasks such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from the BioCreative challenges. We also show the use of the plug-in with a user query related to the production of vanillin.
Conclusions. This work makes all the relevant content from patents available to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools.
This work is co-funded by the Programa Operacional Regional do Norte, under the “Portugal2020”, through the European Regional Development Fund (ERDF), within project SISBI - Refª NORTE-01-0247-FEDER-003381. This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.
info:eu-repo/semantics/publishedVersion
Elsevier
Universidade do Minho
Alves, T.
Rodrigues, Rúben
Hugo Costa
Rocha, Miguel
2018-06
2018-06-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
application/pdf
http://hdl.handle.net/1822/53766
eng
Alves, T.; Rodrigues, Rúben; Hugo Costa; Rocha, Miguel, Development of an information retrieval tool for biomedical patents. Computer Methods and Programs in Biomedicine, 159, 125-134, 2018
0169-2607
10.1016/j.cmpb.2018.03.012
29650307
http://www.cmpbjournal.com/
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2023-07-21T12:45:13Z
oai:repositorium.sdum.uminho.pt:1822/53766
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T19:43:02.384238
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Development of an information retrieval tool for biomedical patents
title Development of an information retrieval tool for biomedical patents
spellingShingle Development of an information retrieval tool for biomedical patents
Alves, T.
Biomedical Text Mining
Information Retrieval
Information Extraction
Patents
PDF to text conversion
Science & Technology
title_short Development of an information retrieval tool for biomedical patents
title_full Development of an information retrieval tool for biomedical patents
title_fullStr Development of an information retrieval tool for biomedical patents
title_full_unstemmed Development of an information retrieval tool for biomedical patents
title_sort Development of an information retrieval tool for biomedical patents
author Alves, T.
author_facet Alves, T.
Rodrigues, Rúben
Hugo Costa
Rocha, Miguel
author_role author
author2 Rodrigues, Rúben
Hugo Costa
Rocha, Miguel
author2_role author
author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Alves, T.
Rodrigues, Rúben
Hugo Costa
Rocha, Miguel
dc.subject.por.fl_str_mv Biomedical Text Mining
Information Retrieval
Information Extraction
Patents
PDF to text conversion
Science & Technology
topic Biomedical Text Mining
Information Retrieval
Information Extraction
Patents
PDF to text conversion
Science & Technology
description Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.cmpb.2018.03.012.
publishDate 2018
dc.date.none.fl_str_mv 2018-06
2018-06-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/53766
url http://hdl.handle.net/1822/53766
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Alves, T.; Rodrigues, Rúben; Hugo Costa; Rocha, Miguel, Development of an information retrieval tool for biomedical patents. Computer Methods and Programs in Biomedicine, 159, 125-134, 2018
0169-2607
10.1016/j.cmpb.2018.03.012
29650307
http://www.cmpbjournal.com/
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799132985355665408