Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas

Bibliographic details
Main author: Olivoto, Tiago
Publication date: 2017
Document type: Master's thesis
Language: Portuguese (por)
Source title: Manancial - Repositório Digital da UFSM
dARK ID: ark:/26339/001300000f265
Full text: http://repositorio.ufsm.br/handle/1/18161
Abstract: Some data arrangement methods in current use may overestimate the Pearson correlation coefficient (r) among explanatory traits, increasing multicollinearity in analyses based on multiple regression. This research aimed to reveal the impact of different data arrangement scenarios on the multicollinearity of correlation matrices, on the efficiency of the methods used to adjust for it, and on the coefficient estimates and accuracy of path analysis. It also used simulations to reveal the statistical behavior of r and the optimal sample size for estimating r between maize traits. Data came from an experiment conducted in a randomized complete block design in a 15 × 3 factorial scheme (15 single-cross maize hybrids × three growing sites) with four replicates. The traits measured on five plants per plot were: plant height, ear insertion height, ear diameter and length, number of rows per ear, number of kernels per row, cob diameter and length, cob diameter/ear diameter ratio, number of kernels per ear, kernel mass per ear, and thousand-kernel weight. First, three path analysis methods (traditional, with inclusion of k, and with exclusion of traits), with kernel mass per ear as the dependent trait, were tested in two scenarios: 1) with the linear correlation matrix (X’X) between traits estimated from all sampled observations, n = 900, and 2) with the X’X matrix estimated from the average of the five plants sampled in each plot, n = 180. Then, to evaluate the statistical behavior of r, a third scenario was added to these two: the average of each treatment at each site, n = 45. In each scenario, 60 sample sizes were simulated by bootstrap resampling with replacement. Confidence intervals for trait pairs with correlations of different magnitudes were estimated for each scenario and sample size. One hundred and eighty correlation matrices (three scenarios × 60 sample sizes) were estimated and their multicollinearity evaluated. The number of kernels per ear and the thousand-kernel weight had the largest direct effects on kernel mass per ear (r = 0.892 and r = 0.733, respectively). Using average values reduces the individual variance of a set of n traits, overestimates the magnitude of r between trait pairs, increases the multicollinearity of the matrix, and reduces both the effectiveness of the methods used to adjust for it and the accuracy of the path coefficient estimates. The number of plants required to estimate correlation coefficients with a 95% bootstrap confidence interval is greater when all sampled observations are used, and it increases as the magnitude of the correlation between trait pairs decreases. Using all sampled observations, 210 plants are sufficient to estimate r between traits of single-cross maize hybrids with a 95% bootstrap confidence interval width below 0.30. A simple method that reduces the multicollinearity of matrices and improves the accuracy of path analysis is proposed.
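The bootstrap step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic (two traits with a population correlation of about 0.5, 900 observations), the percentile method and the 1,000 resamples are assumptions, and the function name `bootstrap_r_ci` is hypothetical. It only shows how the width of a 95% bootstrap confidence interval for r can be tracked across sample sizes.

```python
import numpy as np


def bootstrap_r_ci(x, y, n, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for Pearson r from resamples of size n.

    Assumption: percentile intervals; the source does not state which
    bootstrap interval type was used.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One row of indices per resample, drawn with replacement.
    idx = rng.integers(0, len(x), size=(n_boot, n))
    xs, ys = x[idx], y[idx]
    # Pearson r for every resample, computed row-wise.
    xc = xs - xs.mean(axis=1, keepdims=True)
    yc = ys - ys.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt(
        (xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1)
    )
    lo, hi = np.quantile(r, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic stand-in for two plant traits (not data from the thesis).
rng = np.random.default_rng(42)
n_obs = 900
x = rng.normal(size=n_obs)
y = 0.5 * x + np.sqrt(1 - 0.5 ** 2) * rng.normal(size=n_obs)

# CI width at a few illustrative sample sizes.
widths = {}
for n in (30, 210, 900):
    lo, hi = bootstrap_r_ci(x, y, n, rng=rng)
    widths[n] = hi - lo
    print(f"n = {n:4d}  95% CI = [{lo:.3f}, {hi:.3f}]  width = {hi - lo:.3f}")
```

Repeating the loop over a finer grid of sample sizes and over trait pairs with weaker or stronger correlations would give, under these assumptions, the kind of width-versus-sample-size curve from which a threshold such as a width below 0.30 can be read.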
id UFSM_07e1c02c6471e557b7961497a912b23f
oai_identifier_str oai:repositorio.ufsm.br:1/18161
network_acronym_str UFSM
network_name_str Manancial - Repositório Digital da UFSM
repository_id_str
spelling Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Biblioteca Digital de Teses e Dissertações
https://repositorio.ufsm.br/
https://repositorio.ufsm.br/oai/request
opendoar:2019-09-07T06:00:29
dc.title.none.fl_str_mv Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas
Bias associated with data arrangement and sample size and its implications on the accuracy of indirect selection in plant breeding
title Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas
author Olivoto, Tiago
author_facet Olivoto, Tiago
author_role author
dc.contributor.none.fl_str_mv Souza, Velci Queiróz de
http://lattes.cnpq.br/6515305945460230
Marchioro, Volmir Sergio
http://lattes.cnpq.br/3744130894870798
Pinheiro, Marcos Vinícius Marques
http://lattes.cnpq.br/2241316326554301
dc.contributor.author.fl_str_mv Olivoto, Tiago
dc.subject.por.fl_str_mv Zea mays L.
Coeficiente de correlação
Multicolinearidade
Simulações
Correlation coefficient
Multicollinearity
Simulations
CNPQ::CIENCIAS AGRARIAS::AGRONOMIA
publishDate 2017
dc.date.none.fl_str_mv 2017-02-20
2019-09-06T17:47:29Z
2019-09-06T17:47:29Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://repositorio.ufsm.br/handle/1/18161
dc.identifier.dark.fl_str_mv ark:/26339/001300000f265
url http://repositorio.ufsm.br/handle/1/18161
identifier_str_mv ark:/26339/001300000f265
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Santa Maria
Brasil
Agronomia
UFSM
Programa de Pós-Graduação em Agronomia - Agricultura e Ambiente
UFSM Frederico Westphalen
dc.source.none.fl_str_mv reponame:Manancial - Repositório Digital da UFSM
instname:Universidade Federal de Santa Maria (UFSM)
instacron:UFSM
instname_str Universidade Federal de Santa Maria (UFSM)
instacron_str UFSM
institution UFSM
reponame_str Manancial - Repositório Digital da UFSM
collection Manancial - Repositório Digital da UFSM
repository.name.fl_str_mv Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv atendimento.sib@ufsm.br||tedebc@gmail.com
_version_ 1815172330819682304