Web-based tool for searching tables’ contents

Bibliographic details
Main author: Oliveira, Alexandre Daniel Moreira
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/25876
Abstract: The number of biomedical articles is growing constantly, and researchers find it increasingly difficult to locate relevant information, compare results, and identify new hypotheses efficiently. Text mining techniques have been explored to build systems that provide easy and fast access to the scientific literature. However, most of these tools ignore tables completely and process only the textual parts. This dissertation focuses on the analysis and indexing of tables extracted from scientific articles, since tables often contain information that is useful to researchers and is not available elsewhere in the publication. The main objective of the work is therefore to create a flexible indexing structure that handles different table formats and recognizes biomedical concepts mentioned in the tables themselves, in their captions, and in the text that references them. A web-based tool was developed that lets users search and visualize annotated tables extracted from scientific articles. The solution uses open-source frameworks, namely Neji for concept recognition and Elasticsearch for text indexing.
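Illustrative note: the abstract describes an indexing structure that covers a table's cells, its caption, the text that references it, and the recognized concepts. The sketch below shows one way such a per-table document could be shaped. It is not the dissertation's schema: the class names, field names, the `EX:0001` identifier, and the naive `search` helper are all assumptions made for illustration, and the code uses only the Python standard library rather than Neji or Elasticsearch.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Concept:
    """A recognized concept (hypothetical shape, not Neji's output format)."""
    text: str        # surface form found in the table or surrounding text
    identifier: str  # illustrative concept ID, not a real ontology entry


@dataclass
class TableDocument:
    """One indexable unit per table (assumed layout, for illustration only)."""
    article_id: str
    caption: str
    rows: List[List[str]]  # ragged rows are allowed to tolerate varied layouts
    mentions: List[str] = field(default_factory=list)  # sentences citing the table
    concepts: List[Concept] = field(default_factory=list)

    def to_index_body(self) -> Dict:
        """Flatten the table into a JSON-serializable dict for a text index."""
        body = asdict(self)
        # Flatten cells so that any table shape maps onto one searchable field.
        body["cells"] = [cell for row in self.rows for cell in row if cell]
        body["concept_ids"] = sorted({c.identifier for c in self.concepts})
        return body


def search(docs: List[TableDocument], term: str) -> List[str]:
    """Naive case-insensitive substring match over cells, caption, and mentions."""
    needle = term.lower()
    hits = []
    for doc in docs:
        body = doc.to_index_body()
        haystack = body["cells"] + [body["caption"]] + body["mentions"]
        if any(needle in text.lower() for text in haystack):
            hits.append(doc.article_id)
    return hits
```

In a real deployment the dict returned by `to_index_body` would be handed to the text index, and the substring search would be replaced by the index's own query facilities.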
id RCAP_c6ddbde4459e96e6f3b52c530e01daf1
oai_identifier_str oai:ria.ua.pt:10773/25876
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Web-based tool for searching tables’ contents
title Web-based tool for searching tables’ contents
spellingShingle Web-based tool for searching tables’ contents
Oliveira, Alexandre Daniel Moreira
Text Mining
Table Mining
Concept Recognition
Information Retrieval
Bioinformatics
title_short Web-based tool for searching tables’ contents
title_full Web-based tool for searching tables’ contents
title_fullStr Web-based tool for searching tables’ contents
title_full_unstemmed Web-based tool for searching tables’ contents
title_sort Web-based tool for searching tables’ contents
author Oliveira, Alexandre Daniel Moreira
author_facet Oliveira, Alexandre Daniel Moreira
author_role author
dc.contributor.author.fl_str_mv Oliveira, Alexandre Daniel Moreira
dc.subject.por.fl_str_mv Text Mining
Table Mining
Concept Recognition
Information Retrieval
Bioinformatics
topic Text Mining
Table Mining
Concept Recognition
Information Retrieval
Bioinformatics
publishDate 2018
dc.date.none.fl_str_mv 2018-01-01T00:00:00Z
2018
2019-05-02T08:43:04Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/25876
TID:202234207
url http://hdl.handle.net/10773/25876
identifier_str_mv TID:202234207
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137644081315840