A genetic programming model for association studies to detect epistasis in low heritability data.

Bibliographic details
Main author: RIBEIRO, I. M.
Publication date: 2018
Other authors: BORGES, C. C. H., SILVA, B. Z., ARBEX, W. A.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102526
Abstract: Genome-wide association studies (GWAS) aim to identify the markers that most influence phenotype values. One substantial challenge is finding a non-linear mapping between genotype and phenotype, also known as epistasis, which usually makes the search for and identification of functional SNPs more complex. Some diseases, such as cervical cancer, leukemia and type 2 diabetes, have low heritability. The heritability of a sample is directly related to the proportion of the phenotype explained by the genotype, so the lower the heritability, the greater the influence of environmental factors and the smaller the genotypic explanation. In this work, an algorithm capable of identifying epistatic associations at different levels of heritability is proposed. The developed model is an application of genetic programming with a specialized initialization of the initial population based on a random forest strategy. The initialization process ranks the most important SNPs, increasing the probability of their insertion in the initial population of the genetic programming model. The presented model is expected to be robust with respect to the heritability level when recovering the causal markers. The simulated experiments are of the case-control type, with heritability levels of 0.4, 0.3, 0.2 and 0.1, considering scenarios with 100 and 1000 markers. Our approach was compared with the GPAS software and with a genetic programming algorithm without the initialization step. The results show that an efficient population initialization method based on a ranking strategy is very promising compared to the other models.
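The initialization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-SNP importance scores are already available (for example, from a random forest), that individuals are small Boolean expression trees, and that SNP terminals are drawn with probability proportional to importance. The function names (`random_tree`, `terminals`, `init_population`), the `AND`/`OR` operator set, the depth limit and the 0.3 leaf probability are all hypothetical choices made for illustration.

```python
import random

def random_tree(weights, rng, depth=2):
    """Build a random Boolean expression tree whose leaves are SNP indices.

    Leaves are sampled in proportion to `weights`, so SNPs with higher
    importance are more likely to enter the tree (assumed scheme).
    """
    if depth == 0 or rng.random() < 0.3:  # 0.3 leaf probability is arbitrary
        snp = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        return ("SNP", snp)
    op = rng.choice(["AND", "OR"])  # hypothetical operator set
    left = random_tree(weights, rng, depth - 1)
    right = random_tree(weights, rng, depth - 1)
    return (op, left, right)

def terminals(tree):
    """Return the list of SNP indices used as leaves of a tree."""
    if tree[0] == "SNP":
        return [tree[1]]
    return terminals(tree[1]) + terminals(tree[2])

def init_population(importances, size, seed=0):
    """Create `size` trees biased towards SNPs with high importance scores."""
    rng = random.Random(seed)
    return [random_tree(importances, rng) for _ in range(size)]

# Made-up importance scores for 100 markers; SNP 5 is ranked highest.
importances = [0.01] * 100
importances[5] = 0.30
population = init_population(importances, size=200)
```

With these made-up scores, SNP 5 should appear in the initial trees far more often than any single low-importance marker, which is the effect the ranking step is meant to produce before the evolutionary search begins.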
id EMBR_596ea4d76e435fb45f956da36e3adcd0
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1102526
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.title.none.fl_str_mv A genetic programming model for association studies to detect epistasis in low heritability data.
title A genetic programming model for association studies to detect epistasis in low heritability data.
spellingShingle A genetic programming model for association studies to detect epistasis in low heritability data.
RIBEIRO, I. M.
GWAS
SNP
Genetic Programming
Random Forest
Computational Modeling
Mathematical Modeling
Bioinformatics
author RIBEIRO, I. M.
author_facet RIBEIRO, I. M.
BORGES, C. C. H.
SILVA, B. Z.
ARBEX, W. A.
author_role author
author2 BORGES, C. C. H.
SILVA, B. Z.
ARBEX, W. A.
author2_role author
author
author
dc.contributor.none.fl_str_mv WAGNER ANTONIO ARBEX, CNPGL.
dc.contributor.author.fl_str_mv RIBEIRO, I. M.
BORGES, C. C. H.
SILVA, B. Z.
ARBEX, W. A.
dc.subject.por.fl_str_mv GWAS
SNP
Genetic Programming
Random Forest
Computational Modeling
Mathematical Modeling
Bioinformatics
topic GWAS
SNP
Genetic Programming
Random Forest
Computational Modeling
Mathematical Modeling
Bioinformatics
publishDate 2018
dc.date.none.fl_str_mv 2018-12-26T23:42:00Z
2018-12-26T23:42:00Z
2018-12-26
2018
2018-12-26T23:42:00Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Revista de Informática Teórica e Aplicada, v. 25, n. 2, p. 85-92, 2018.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102526
identifier_str_mv Revista de Informática Teórica e Aplicada, v. 25, n. 2, p. 85-92, 2018.
url http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102526
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
instname_str Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron_str EMBRAPA
institution EMBRAPA
reponame_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
collection Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1794503467924979712