Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein
Main author: | Ferreira, Tânia Raquel Moreira |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10348/11160 |
Abstract: | The number of scientific research documents continues to grow significantly, making it very difficult to access relevant information in a given domain. To address this problem, several text mining techniques for automatic text processing have been developed to help researchers find information of interest and infer new theories. Such automatic document processing requires a large set of manually annotated documents from which robust text mining and machine learning models can be developed. For this work, a total of 5,300 documents related to the gluten protein were retrieved from PubMed. They were manually annotated and classified, and relationships between the various biological entities were extracted from them, in order to create classification and relationship extraction models. Gluten is a protein that can be found in various foods and is related to the onset of several diseases, the most common being celiac disease. Following a gluten-free diet is currently the only known treatment for diseases related to this protein. However, more and more people now follow a gluten-free diet on their own initiative, without any related health issues. People seek answers on the internet and social networks, where information is abundant but often not very credible. Thus, this work aims to create reliable sources of knowledge to be used both by researchers, for the formulation of new theories and the discovery of new therapies related to gluten, and by patients who are simply looking for answers to their health issues. With the methodologies applied, we were able to create models that reduce the time and cost required to curate documents related to the gluten bibliome. |
id | RCAP_18a35b26b577319b277968285be69427 |
---|---|
oai_identifier_str | oai:repositorio.utad.pt:10348/11160 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
title | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
spellingShingle | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein Ferreira, Tânia Raquel Moreira Gluten biocuration |
title_short | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
title_full | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
title_fullStr | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
title_full_unstemmed | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
title_sort | Data gathering and bibliome curation to support biomedical intelligent algorithms for the reconstruction of knowledge related to the gluten protein |
author | Ferreira, Tânia Raquel Moreira |
author_facet | Ferreira, Tânia Raquel Moreira |
author_role | author |
dc.contributor.author.fl_str_mv | Ferreira, Tânia Raquel Moreira |
dc.subject.por.fl_str_mv | Gluten biocuration |
topic | Gluten biocuration |
description | The number of scientific research documents continues to grow significantly, making it very difficult to access relevant information in a given domain. To address this problem, several text mining techniques for automatic text processing have been developed to help researchers find information of interest and infer new theories. Such automatic document processing requires a large set of manually annotated documents from which robust text mining and machine learning models can be developed. For this work, a total of 5,300 documents related to the gluten protein were retrieved from PubMed. They were manually annotated and classified, and relationships between the various biological entities were extracted from them, in order to create classification and relationship extraction models. Gluten is a protein that can be found in various foods and is related to the onset of several diseases, the most common being celiac disease. Following a gluten-free diet is currently the only known treatment for diseases related to this protein. However, more and more people now follow a gluten-free diet on their own initiative, without any related health issues. People seek answers on the internet and social networks, where information is abundant but often not very credible. Thus, this work aims to create reliable sources of knowledge to be used both by researchers, for the formulation of new theories and the discovery of new therapies related to gluten, and by patients who are simply looking for answers to their health issues. With the methodologies applied, we were able to create models that reduce the time and cost required to curate documents related to the gluten bibliome. |
publishDate | 2022 |
dc.date.none.fl_str_mv | 2022-04-20T15:33:35Z 2022-01-04T00:00:00Z 2022-01-04 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10348/11160 TID:202981134 |
url | http://hdl.handle.net/10348/11160 |
identifier_str_mv | TID:202981134 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | metadata only access info:eu-repo/semantics/openAccess |
rights_invalid_str_mv | metadata only access |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799137124602085376 |