Collaborative annotation and mapping tool for clinical concepts

Bibliographic details
Main author: Neves, André Sousa
Publication date: 2021
Document type: Dissertation
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/33928
Abstract: Every day, new biomedical information is published in research articles, books and reports, but because it is unstructured, it is of little use for knowledge acquisition beyond keyword search. Over the years, significant interest has grown in text mining and in producing structured data through information retrieval and information extraction techniques, namely named entity recognition. Several natural language processing tools have been developed to support expert curators in this labor-intensive manual task: they implement automatic pre-processing pipelines that annotate biomedical entities and their relationships in the literature, together with interactive interfaces for reviewing and validating those annotations. Moreover, to enable collaboration among researchers, the data must be harmonized into a common standard that everyone can understand, regardless of the language, format or encoding in which it was originally recorded. Some tools provide efficient indexing and search capabilities to map concepts from various domains to standard vocabulary concepts; in other words, they standardize data into a common format, which in turn allows collaborative studies to be conducted. Nevertheless, few tools support both annotation and mapping. This dissertation presents a web-based tool intended to fill this gap: experts can still perform each task individually, but they can also chain the two into a pipeline that uses the output annotations as input to the mapping process. The resulting tool provides an interactive interface in which users can upload text documents, annotate the biomedical entities they contain, either manually (by selecting portions of text or double-clicking words) or automatically with Neji’s web services, and manage the generated annotations.
For mapping, users can upload CSV documents containing terms to be mapped to standard vocabulary concepts using Usagi’s open-source code, and then review and validate the suggested mappings based on their match scores.
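The abstract describes annotations feeding a mapping step whose suggestions carry a match score for curators to review. The Python sketch below illustrates that flow under explicit assumptions that do not come from the dissertation: annotations are plain character spans, `VOCABULARY` is a hypothetical three-concept excerpt, and `difflib.SequenceMatcher.ratio()` stands in for the match score. It does not reproduce Neji’s web services, Usagi’s scoring, or the tool’s actual implementation.

```python
"""Hypothetical annotate-then-map sketch (not the dissertation's code).

Assumptions: annotations are (start, end, label) character spans, the
vocabulary is a tiny in-memory dict, and difflib's SequenceMatcher ratio
is a stand-in for Usagi's real match score.
"""
import csv
import io
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Annotation:
    start: int  # offset of the entity's first character
    end: int    # offset just past the entity's last character
    label: str  # entity type, e.g. "Disorder"


# Hypothetical standard-vocabulary excerpt: concept_id -> concept name.
VOCABULARY = {
    101: "Type 2 diabetes mellitus",
    102: "Hypertensive disorder",
    103: "Metformin",
}


def annotations_to_csv(text: str, annotations: list) -> str:
    """Export distinct annotated terms (case-insensitive) as mapping input."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source_term", "entity_type"])
    seen = set()
    for ann in annotations:
        term = text[ann.start:ann.end]
        if term.lower() not in seen:
            seen.add(term.lower())
            writer.writerow([term, ann.label])
    return buf.getvalue()


def match_score(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in match score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mappings(csv_text: str, threshold: float = 0.5) -> list:
    """Suggest the best-scoring concept for each term in the CSV.

    Suggestions scoring at or above `threshold` are queued for curator
    review; the rest are left unmapped. Nothing is accepted automatically.
    """
    suggestions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["source_term"]
        concept_id, name = max(
            VOCABULARY.items(), key=lambda item: match_score(term, item[1])
        )
        score = match_score(term, name)
        suggestions.append({
            "source_term": term,
            "concept_id": concept_id,
            "concept_name": name,
            "score": round(score, 2),
            "status": "needs_review" if score >= threshold else "unmapped",
        })
    return suggestions


if __name__ == "__main__":
    note = "Patient with type 2 diabetes started on metformin."
    anns = [Annotation(13, 28, "Disorder"), Annotation(40, 49, "Chemical")]
    for suggestion in suggest_mappings(annotations_to_csv(note, anns)):
        print(suggestion)
```

With this sample note, "metformin" matches the hypothetical concept 103 exactly, while "type 2 diabetes" is suggested for concept 101 with a lower score, so a curator would still need to confirm it. Keeping the CSV as the hand-off format mirrors the abstract’s description of CSV upload as the mapping input.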
id RCAP_9e724f64573dda700e85feaa7f9ef134
oai_identifier_str oai:ria.ua.pt:10773/33928
network_acronym_str RCAP
repository_id_str 7160
dc.subject.por.fl_str_mv Biomedical text mining
Named entity recognition
Natural language processing
Information retrieval
Information extraction
Mapping
Standard vocabulary concepts
dc.date.none.fl_str_mv 2021-12-07T00:00:00Z
2021-12-07
2022-05-20T12:20:55Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP