Collaborative annotation and mapping tool for clinical concepts

Neves, André Sousa

Collaborative annotation and mapping tool for clinical concepts

Detalhes bibliográficos
Autor(a) principal:	Neves, André Sousa
Data de Publicação:	2021
Tipo de documento:	Dissertação
Idioma:	eng
Título da fonte:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo:	http://hdl.handle.net/10773/33928
Resumo:	Every day new biomedical information is published in the form of research articles, books and reports, but given its unstructured form it is not useful for knowledge acquisition apart from keyword search. Over the years significant interest has been generated towards text mining and the production of structured data using information retrieval and information extraction techniques, namely named entity recognition. Several natural language processing tools were developed with the main purpose of aiding the manual labor-intensive task conducted by expert curators by implementing automatic pre-processing pipelines that annotate biomedical entities and their relationships in literature, along with interactive interfaces to review and validate them. Moreover, it is essential that the data is harmonized into a common standard that everyone can understand no matter what language, format or encoding it was originally recorded in, in order to provide a collaborative effort among researchers. Some tools provide efficient indexing and searching capabilities to map concepts from various domains into standard vocabulary concepts, or in other words are capable of standardize data into a common format which in turn allow collaborative studies to be conducted. Nevertheless, there is a lack of tools that allow to perform both annotation and mapping. This dissertation presents a web-based tool with the intent to fill this gap by allowing experts to still perform each task individually, but also to form a pipeline and use the output annotations as input for the mapping process. As a result, the tool provides an interactive interface that allows the users to upload text documents and annotate biomedical entities present in them, either manually by selecting portions of text or double clicking words, or automatically with Neji’s web services and manage those generated annotations. For mapping, the users can upload CSV documents containing terms to be mapped to standard vocabulary concepts, using Usagi’s open-source code. Moreover, the users can review and validate suggested mappings based on match score.

Metadados do item

id	RCAP_9e724f64573dda700e85feaa7f9ef134
oai_identifier_str	oai:ria.ua.pt:10773/33928
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Collaborative annotation and mapping tool for clinical conceptsBiomedical text miningNamed entity recognitionNatural language processingInformation retrievalInformation extractionMappingStandard vocabulary conceptsEvery day new biomedical information is published in the form of research articles, books and reports, but given its unstructured form it is not useful for knowledge acquisition apart from keyword search. Over the years significant interest has been generated towards text mining and the production of structured data using information retrieval and information extraction techniques, namely named entity recognition. Several natural language processing tools were developed with the main purpose of aiding the manual labor-intensive task conducted by expert curators by implementing automatic pre-processing pipelines that annotate biomedical entities and their relationships in literature, along with interactive interfaces to review and validate them. Moreover, it is essential that the data is harmonized into a common standard that everyone can understand no matter what language, format or encoding it was originally recorded in, in order to provide a collaborative effort among researchers. Some tools provide efficient indexing and searching capabilities to map concepts from various domains into standard vocabulary concepts, or in other words are capable of standardize data into a common format which in turn allow collaborative studies to be conducted. Nevertheless, there is a lack of tools that allow to perform both annotation and mapping. This dissertation presents a web-based tool with the intent to fill this gap by allowing experts to still perform each task individually, but also to form a pipeline and use the output annotations as input for the mapping process. As a result, the tool provides an interactive interface that allows the users to upload text documents and annotate biomedical entities present in them, either manually by selecting portions of text or double clicking words, or automatically with Neji’s web services and manage those generated annotations. For mapping, the users can upload CSV documents containing terms to be mapped to standard vocabulary concepts, using Usagi’s open-source code. Moreover, the users can review and validate suggested mappings based on match score.Todos os dias são publicadas novas informações biomédicas sob a forma de artigos de investigação, livros e relatórios, mas dada a sua forma não-estruturada não é útil para a aquisição de conhecimento para além da pesquisa por palavraschave. Ao longo dos anos tem surgido um interesse significativo na mineração de texto e a produção de dados estruturados, utilizando técnicas de recuperação de informação e extração de informação, nomeadamente o reconhecimento de entidades mencionadas. Foram desenvolvidas várias ferramentas de processamento de linguagem natural com o objetivo principal de auxiliar a tarefa manual intensiva realizada por curadores especialistas, implementando pipelines automáticos de pré-processamento que anotam entidades biomédicas e as relações entre si na literatura, juntamente com interfaces interativas para as rever e validar. Além disso, é essencial que os dados sejam harmonizados num padrão comum que todos possam compreender, independentemente da língua, formato ou codificação em que foram originalmente registados, a fim de proporcionar um esforço colaborativo entre os investigadores. Algumas ferramentas proporcionam capacidades eficientes de indexação e pesquisa para mapear conceitos de vários domínios em conceitos de vocabulários padrão, ou por outras palavras, são capazes de padronizar os dados num formato comum que, por sua vez, permite a realização de estudos colaborativos. No entanto, ferramentas que permitem realizar tanto a anotação como o mapeamento são escassas. Esta dissertação apresenta uma ferramenta web-based com a intenção de preencher esta lacuna, permitindo aos especialistas realizar cada tarefa individualmente, mas também formar um pipeline e utilizar as anotações resultantes como input para o processo de mapeamento. Como resultado, a ferramenta fornece uma interface interativa que permite aos utilizadores carregar documentos de texto e anotar entidades biomédicas presentes nos mesmos, quer manualmente selecionando porções de texto ou palavras com duplo clique, quer automaticamente com os serviços web do Neji e gerir as anotações geradas. Para mapeamento, os utilizadores podem carregar documentos CSV contendo termos para serem mapeados para conceitos de vocabulário padrão, utilizando o código open-source do Usagi. Além disso, os utilizadores podem rever e validar os mapeamentos sugeridos com base na pontuação dos mesmos.2022-05-20T12:20:55Z2021-12-07T00:00:00Z2021-12-07info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10773/33928engNeves, André Sousainfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-02-22T12:05:17Zoai:ria.ua.pt:10773/33928Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:05:16.775731Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv	Collaborative annotation and mapping tool for clinical concepts
title	Collaborative annotation and mapping tool for clinical concepts
spellingShingle	Collaborative annotation and mapping tool for clinical concepts Neves, André Sousa Biomedical text mining Named entity recognition Natural language processing Information retrieval Information extraction Mapping Standard vocabulary concepts
title_short	Collaborative annotation and mapping tool for clinical concepts
title_full	Collaborative annotation and mapping tool for clinical concepts
title_fullStr	Collaborative annotation and mapping tool for clinical concepts
title_full_unstemmed	Collaborative annotation and mapping tool for clinical concepts
title_sort	Collaborative annotation and mapping tool for clinical concepts
author	Neves, André Sousa
author_facet	Neves, André Sousa
author_role	author
dc.contributor.author.fl_str_mv	Neves, André Sousa
dc.subject.por.fl_str_mv	Biomedical text mining Named entity recognition Natural language processing Information retrieval Information extraction Mapping Standard vocabulary concepts
topic	Biomedical text mining Named entity recognition Natural language processing Information retrieval Information extraction Mapping Standard vocabulary concepts
description	Every day new biomedical information is published in the form of research articles, books and reports, but given its unstructured form it is not useful for knowledge acquisition apart from keyword search. Over the years significant interest has been generated towards text mining and the production of structured data using information retrieval and information extraction techniques, namely named entity recognition. Several natural language processing tools were developed with the main purpose of aiding the manual labor-intensive task conducted by expert curators by implementing automatic pre-processing pipelines that annotate biomedical entities and their relationships in literature, along with interactive interfaces to review and validate them. Moreover, it is essential that the data is harmonized into a common standard that everyone can understand no matter what language, format or encoding it was originally recorded in, in order to provide a collaborative effort among researchers. Some tools provide efficient indexing and searching capabilities to map concepts from various domains into standard vocabulary concepts, or in other words are capable of standardize data into a common format which in turn allow collaborative studies to be conducted. Nevertheless, there is a lack of tools that allow to perform both annotation and mapping. This dissertation presents a web-based tool with the intent to fill this gap by allowing experts to still perform each task individually, but also to form a pipeline and use the output annotations as input for the mapping process. As a result, the tool provides an interactive interface that allows the users to upload text documents and annotate biomedical entities present in them, either manually by selecting portions of text or double clicking words, or automatically with Neji’s web services and manage those generated annotations. For mapping, the users can upload CSV documents containing terms to be mapped to standard vocabulary concepts, using Usagi’s open-source code. Moreover, the users can review and validate suggested mappings based on match score.
publishDate	2021
dc.date.none.fl_str_mv	2021-12-07T00:00:00Z 2021-12-07 2022-05-20T12:20:55Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/33928
url	http://hdl.handle.net/10773/33928
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137707807473664

Collaborative annotation and mapping tool for clinical concepts

Registros relacionados