Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein

Bibliographic details
Main author: Ferreira, Tânia Raquel Moreira
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10348/11160
Abstract: The volume of scientific literature continues to grow significantly, making it very difficult to find relevant information in a given domain. To address this problem, several text mining techniques for automatic text processing have been developed to help researchers find information of interest and infer new theories. Developing robust text mining and machine learning models for this automatic processing requires a large set of manually annotated documents. For this work, a total of 5,300 documents related to the gluten protein were retrieved from PubMed. These documents were manually annotated and classified, and relationships between the various biological entities were extracted from them, in order to build classification and relationship extraction models. Gluten is a protein found in various foods and is associated with the onset of several diseases, the most common being celiac disease. Following a gluten-free diet is currently the only known treatment for diseases related to this protein. However, more and more people now follow a gluten-free diet on their own initiative, without any related health issues. People seek answers on the internet and social networks, where information is abundant but often not very credible. With this work, we therefore intend to create reliable sources of knowledge to be used both by researchers, for formulating new theories and discovering new therapies related to gluten, and by patients who are simply looking for answers to their health questions. With the methodologies applied, we were able to create models that reduce the time and cost required to curate documents in the gluten bibliome.
id RCAP_18a35b26b577319b277968285be69427
oai_identifier_str oai:repositorio.utad.pt:10348/11160
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein
author Ferreira, Tânia Raquel Moreira
author_facet Ferreira, Tânia Raquel Moreira
author_role author
dc.contributor.author.fl_str_mv Ferreira, Tânia Raquel Moreira
dc.subject.por.fl_str_mv Gluten
biocuration
topic Gluten
biocuration
publishDate 2022
dc.date.none.fl_str_mv 2022-04-20T15:33:35Z
2022-01-04T00:00:00Z
2022-01-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10348/11160
TID:202981134
url http://hdl.handle.net/10348/11160
identifier_str_mv TID:202981134
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv metadata only access
info:eu-repo/semantics/openAccess
rights_invalid_str_mv metadata only access
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137124602085376