Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente

Bibliographic details
Main author: Nuvunga, Joel Jorge
Publication date: 2017
Document type: Thesis
Language: Portuguese (por)
Source title: Repositório Institucional da UFLA
Full text: http://repositorio.ufla.br/jspui/handle/1/12273
Abstract: One of the main challenges in plant breeding programs is the efficient study of the genotype-by-environment interaction (GEI). A significant GEI hinders the breeder's work of selecting and recommending superior genotypes. Among the statistical procedures developed for this purpose, those based on mixed models with factor analysis, commonly referred to as factor-analytic (FA) models, deserve special emphasis. The FA model is a parsimonious approach with appealing advantages over classical methodologies, such as great flexibility in handling unbalanced data and heterogeneous variances. However, it has some problems: the computational cost of analyses with a large number of environments, and Heywood cases, which make the model unidentifiable. Moreover, the conventional biplot representation of the model carries no measure of uncertainty for the plotted scores that describe the GEI effect or the genotype (G) + GEI effects. This work seeks to describe general ways of modeling the heterogeneity of genetic and residual covariances from the perspective of factor analysis in mixed models, using a spectral decomposition of the genetic effects within a Bayesian approach, unlike other procedures in the literature, in which the factor loadings are sampled directly. A further objective was to develop a procedure that incorporates inference into the biplot through the construction of credible regions for the genotypic and environmental scores. Spherical distributions were assumed as priors for the eigenvectors, truncated normal distributions for the singular values, scaled inverse chi-squared distributions for the residual variances, and a non-informative prior for the genotype effects. This approach differs from the Bayesian methods presented so far that assume the same constraints as the mixed-effects model. To illustrate the proposed method, we used simulated data and real data whose response variable is spike yield in t ha⁻¹. Samples for inference were obtained directly with the Gibbs sampler. The data were randomly unbalanced at levels of 10%, 33% and 50% of genotypes lost within environments. According to the results, the FA analysis with two factor loadings showed higher predictive capacity than the competing models. With 10% and 33% unbalance, the mean correlations were above 0.40, and with 50% the mean correlation was 0.46. Model performance was best with 50% unbalance, followed by 33% and then 10%. The analysis with the Bayesian FA model also proved robust under high levels of data unbalance. A relevant detail of this study concerns model selection, which proved not to be a trivial task for the real data and required additional criteria. In addition, the proposed model showed greater predictive capacity than the equivalent frequentist model, and its parameters were adequately estimated and identifiable without rotation of the factor loadings or the imposition of restrictions, which is a major advantage of the proposed method.
id UFLA_f8a484d24448966e1338a63cef50c9cf
oai_identifier_str oai:localhost:1/12273
network_acronym_str UFLA
network_name_str Repositório Institucional da UFLA
repository_id_str
spelling Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente; Bayesian factor analytic model applied to multi-environment trials; Plantas – Melhoramento genético – Métodos estatísticos; Interação genótipo-ambiente; Modelo fatorial analítico; Teoria bayesiana de decisão estatística; Plant breeding – Statistical methods; Genotype-environment interaction; Factor-analytic model; Bayesian statistical decision theory; Estatística; Genética Vegetal; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Universidade Federal de Lavras; Programa de Pós-Graduação em Estatística e Experimentação Agropecuária; UFLA; brasil; Departamento de Ciências Exatas; Balestre, Márcio; Lima, Renato Ribeiro de; Bueno Filho, Júlio Sílvio de Souza; Safadi, Thelma; Toledo, Fernando H. R. Barrozo; Silva, Alessandra Querino da; Nuvunga, Joel Jorge; 2017-02-16T11:34:45Z; 2017-02-15; 2017-01-19; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; application/pdf; NUVUNGA, J. J. Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente. 2017. 129 p. Tese (Doutorado em Estatística e Experimentação Agropecuária)-Universidade Federal de Lavras, Lavras, 2017.; http://repositorio.ufla.br/jspui/handle/1/12273; por; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFLA; instname:Universidade Federal de Lavras (UFLA); instacron:UFLA; 2023-05-11T15:47:42Z; oai:localhost:1/12273; Repositório Institucional; PUB; http://repositorio.ufla.br/oai/request; nivaldo@ufla.br || repositorio.biblioteca@ufla.br; opendoar:2023-05-11T15:47:42; Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA); false
dc.title.none.fl_str_mv Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente
Bayesian factor analytic model applied to multi-environment trials
title Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente
author Nuvunga, Joel Jorge
author_facet Nuvunga, Joel Jorge
author_role author
dc.contributor.none.fl_str_mv Balestre, Márcio
Lima, Renato Ribeiro de
Bueno Filho, Júlio Sílvio de Souza
Safadi, Thelma
Toledo, Fernando H. R. Barrozo
Silva, Alessandra Querino da
dc.contributor.author.fl_str_mv Nuvunga, Joel Jorge
dc.subject.por.fl_str_mv Plantas – Melhoramento genético – Métodos estatísticos
Interação genótipo-ambiente
Modelo fatorial analítico
Teoria bayesiana de decisão estatística
Plant breeding – Statistical methods
Genotype-environment interaction
Factor-analytic model
Bayesian statistical decision theory
Estatística
Genética Vegetal
publishDate 2017
dc.date.none.fl_str_mv 2017-02-16T11:34:45Z
2017-02-16T11:34:45Z
2017-02-15
2017-01-19
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv NUVUNGA, J. J. Modelo fatorial analítico bayesiano aplicado à experimentos multi-ambiente. 2017. 129 p. Tese (Doutorado em Estatística e Experimentação Agropecuária)-Universidade Federal de Lavras, Lavras, 2017.
http://repositorio.ufla.br/jspui/handle/1/12273
url http://repositorio.ufla.br/jspui/handle/1/12273
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Lavras
Programa de Pós-Graduação em Estatística e Experimentação Agropecuária
UFLA
brasil
Departamento de Ciências Exatas
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFLA
instname:Universidade Federal de Lavras (UFLA)
instacron:UFLA
instname_str Universidade Federal de Lavras (UFLA)
instacron_str UFLA
institution UFLA
reponame_str Repositório Institucional da UFLA
collection Repositório Institucional da UFLA
repository.name.fl_str_mv Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_ 1807835197962452992