Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains
Main author: | Luís Ricardo Marques Oliveira |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/10216/137172 |
Abstract: | Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem in question is paramount. Equally important is knowing how the data items relate to each other and to the problem itself. At the same time, large amounts of data and information are available on the Web. Researchers already use Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on induction models. Despite the great successes already achieved with Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data to be used in inductive logic programming (ILP) systems. Propositional machine learning algorithms depend heavily on data attributes. It is still hard to identify which attributes are most suitable for a particular research task, and it is also hard to extract relevant information from the enormous quantity of data available. We will consolidate the available data and derive features that ILP algorithms can use to induce descriptions that solve these problems. We are creating a web platform to gather data for relevant Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches the data from public repositories of genomic, protein, and chemical data. After the data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies. To assess the impact of data enrichment with ILP, we compare its results with those obtained by solving the same cases with propositional algorithms. |
id | RCAP_e448083eea02daa399d58b07f211f2c0 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/137172 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains |
title | Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains |
author | Luís Ricardo Marques Oliveira |
author_facet | Luís Ricardo Marques Oliveira |
author_role | author |
dc.contributor.author.fl_str_mv | Luís Ricardo Marques Oliveira |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
topic | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
publishDate | 2021 |
dc.date.none.fl_str_mv | 2021-10-14; 2021-10-14T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10216/137172; TID:202827607 |
url | https://hdl.handle.net/10216/137172 |
identifier_str_mv | TID:202827607 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799135938633269248 |