Entropy, mutual information, and population structure in genome-wide selection

Simiqueli, Guilherme Ferreira

Bibliographic details
Main author:	Simiqueli, Guilherme Ferreira
Publication date:	2020
Document type:	Thesis
Language:	eng
Source title:	LOCUS Repositório Institucional da UFV
Full text:	https://locus.ufv.br//handle/123456789/28716
Abstract:	Combining different populations in the training set is a common way to try to improve the predictive ability of genomic prediction models. However, this practice has not always increased predictive ability, and some studies have proposed accounting for the effect of population structure to improve prediction. Strategies such as principal components as covariates, single- and multi-population models, alternative genomic relationship matrices, admixture proportions as covariates, or combinations of these have been applied to genomic prediction. The first chapter therefore evaluates some combinations of these strategies to support the decision of whether to account for population structure in genomic prediction. Simulated polygenic traits with heritabilities of 0.1 and 0.5 and real data were used to evaluate the strategies. Bias was lower when a multi-population model was used for the low-heritability simulated trait. For the high-heritability trait, accuracy was lower for strategies that used alternative genomic relationship matrices accounting for differences in allele frequency, but only in admixed populations. Further, for real data, two commonly used genomic relationship matrices showed lower predictive ability for all traits, which are likely controlled by few quantitative trait loci. Therefore, whether accounting for population structure yields lower bias without reducing accuracy, and consequently successful genomic prediction, depends on trait heritability, trait architecture, and the admixture level of the population.
The second chapter addresses the fact that random k-fold cross-validation in genome-wide selection can give high estimates of predictive ability because of the high degree of kinship between the training and validation sets. However, many tree breeding populations are less genetically related to the training sets and have different levels of phenotypic diversity. Therefore, this chapter proposed novel methods of splitting cross-validation sets that account for genetic similarity and phenotypic diversity, estimated via mutual information and entropy, respectively. These methods were also used to examine how the distribution of phenotypic and genotypic information affects genome-wide selection of trees. The methods fitted more reliable models, consistent with the entropy of the tree breeding populations and their genetic relatedness to the training sets. Validation sets with more phenotypic diversity showed higher predictive ability and lower bias. Therefore, phenotypic diversity should be added to tree breeding populations to obtain higher genetic gain, better estimates of genomic breeding values, and consistent long-term tree breeding success. Keywords: Population structure. Accuracy. Bias. Mutual information. Entropy. K-fold cross-validation.
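The abstract names entropy as the measure of phenotypic diversity and mutual information as the measure of genetic similarity, but it does not state which estimators the thesis uses. As an illustration only, the sketch below assumes a histogram-based Shannon entropy for a continuous phenotype and a plug-in mutual information between two genotype vectors coded 0/1/2. The function names, the bin count, and the genotype coding are hypothetical choices, not taken from the thesis.

```python
import numpy as np


def shannon_entropy(values, bins=10, value_range=None):
    """Shannon entropy (in bits) of a continuous phenotype after binning.

    Assumption: diversity is summarised by a simple histogram estimate.
    Pass the same `bins` and `value_range` for every set being compared,
    otherwise the entropies are not on a common scale.
    """
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # drop empty bins, normalise
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two genotype vectors.

    Assumption: genotypes are integer-coded 0/1/2 (copies of one allele)
    and both vectors list the same markers in the same order.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    joint = np.zeros((3, 3))
    np.add.at(joint, (x, y), 1)            # 3x3 table of genotype pairs
    joint /= joint.sum()                   # joint probabilities
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (3, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, 3)
    nz = joint > 0                         # skip zero cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())


# Hypothetical usage: compare the phenotypic diversity of two candidate
# validation sets on a shared scale, and the similarity of two individuals.
set_a = np.array([10.0, 10.5, 11.0, 10.2])
set_b = np.array([5.0, 10.0, 15.0, 20.0])
shared = (5.0, 20.0)
h_a = shannon_entropy(set_a, bins=4, value_range=shared)
h_b = shannon_entropy(set_b, bins=4, value_range=shared)
mi = mutual_information([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
```

Under these assumptions, a validation set whose phenotypes spread over more bins gets a higher entropy, and two individuals with identical genotype vectors reach the largest mutual information their genotype distribution allows.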

Item metadata

id	UFV_7516e8cb6ab5798e909753c274f61ead
oai_identifier_str	oai:locus.ufv.br:123456789/28716
network_acronym_str	UFV
network_name_str	LOCUS Repositório Institucional da UFV
repository_id_str	2145
dc.title.none.fl_str_mv	Entropy, mutual information, and population structure in genome-wide selection Entropia, informação mútua e estrutura de populações na seleção genômica ampla
title	Entropy, mutual information, and population structure in genome-wide selection
author	Simiqueli, Guilherme Ferreira
author_facet	Simiqueli, Guilherme Ferreira
author_role	author
dc.contributor.none.fl_str_mv	Resende, Marcos Deon Vilela de http://lattes.cnpq.br/3748640680505163
dc.contributor.author.fl_str_mv	Simiqueli, Guilherme Ferreira
dc.subject.por.fl_str_mv	Genômica Melhoramento genético Estrutura populacional Entropia Predição Genética Quantitativa
topic	Genomics; Genetic improvement; Population structure; Entropy; Prediction; Quantitative genetics
publishDate	2020
dc.date.none.fl_str_mv	2020-07-23 2022-03-04T13:28:01Z 2022-03-04T13:28:01Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	SIMIQUELI, Guilherme Ferreira. Entropy, mutual information, and population structure in genome-wide selection. 2020. 122 f. Tese (Doutorado em Genética e Melhoramento) - Universidade Federal de Viçosa, Viçosa. 2020. https://locus.ufv.br//handle/123456789/28716
identifier_str_mv	SIMIQUELI, Guilherme Ferreira. Entropy, mutual information, and population structure in genome-wide selection. 2020. 122 f. Tese (Doutorado em Genética e Melhoramento) - Universidade Federal de Viçosa, Viçosa. 2020.
url	https://locus.ufv.br//handle/123456789/28716
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Viçosa
publisher.none.fl_str_mv	Universidade Federal de Viçosa
dc.source.none.fl_str_mv	reponame:LOCUS Repositório Institucional da UFV instname:Universidade Federal de Viçosa (UFV) instacron:UFV
instname_str	Universidade Federal de Viçosa (UFV)
instacron_str	UFV
institution	UFV
reponame_str	LOCUS Repositório Institucional da UFV
collection	LOCUS Repositório Institucional da UFV
repository.name.fl_str_mv	LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV)
repository.mail.fl_str_mv	fabiojreis@ufv.br
_version_	1817560037004935168
