Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas
Main author: | Olivoto, Tiago |
---|---|
Publication date: | 2017 |
Document type: | Master's thesis |
Language: | por |
Source title: | Manancial - Repositório Digital da UFSM |
dARK ID: | ark:/26339/001300000f265 |
Full text: | http://repositorio.ufsm.br/handle/1/18161 |
Abstract: | Some data arrangement methods in current use may overestimate the Pearson correlation coefficient (r) among explanatory traits, increasing multicollinearity in analyses based on multiple regression. This research therefore aimed to determine how different data arrangement scenarios affect the multicollinearity of correlation matrices, the efficiency of the methods used to adjust for it, and the coefficient estimates and accuracy of path analysis, and to use simulations to characterize the statistical behavior of r and the optimal sample size for estimating r between maize traits. Data came from an experiment conducted in a randomized complete block design in a 15 × 3 factorial scheme (15 single-cross maize hybrids × three growing sites) with four replicates. The traits measured on five plants per plot were: plant height, ear insertion height, ear diameter and length, number of rows per ear, number of kernels per row, cob diameter and length, cob diameter/ear diameter ratio, number of kernels per ear, kernel mass per ear and thousand-kernel weight. First, three path analysis methods (traditional, with inclusion of k, and with exclusion of traits), with kernel mass per ear as the dependent trait, were tested in two scenarios: 1) with the linear correlation matrix (X’X) between traits estimated from all sampled observations, n = 900, and 2) with the X’X matrix estimated from the mean of the five plants sampled in each plot, n = 180. Subsequently, to evaluate the statistical behavior of r, a third scenario using the treatment means at each site, n = 45, was also considered. In each scenario, 60 sample sizes were simulated by bootstrap resampling with replacement. Confidence intervals for trait pairs with correlations of different magnitudes were estimated for each scenario and sample size. One hundred and eighty correlation matrices (three scenarios × 60 sample sizes) were estimated and their multicollinearity evaluated. Number of kernels per ear and thousand-kernel weight had the largest direct effects on kernel mass per ear (r = 0.892 and r = 0.733, respectively). Using mean values reduces the individual variance of a set of n traits, overestimates the magnitude of r between trait pairs, increases the multicollinearity of the matrix, and reduces both the effectiveness of the methods used to adjust for it and the accuracy of the path coefficient estimates. The number of plants required to estimate correlation coefficients with a 95% bootstrap confidence interval is greater when all sampled observations are used, and it increases as the magnitude of the correlation between the trait pair decreases. Using all sampled observations, 210 plants are sufficient to estimate r between traits of single-cross maize hybrids with a 95% bootstrap confidence interval < 0.30. A simple method that reduces the multicollinearity of matrices and improves the accuracy of path analysis is proposed. |
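The two effects described in the abstract (averaging within plots inflates r, and the bootstrap confidence interval for r narrows as sample size grows) can be illustrated with a minimal Python sketch. It runs on synthetic data, not the thesis data. The function name `bootstrap_ci_width`, the simulated variance structure, the use of a percentile interval, and the reading of "< 0.30" as an interval width are all assumptions made for illustration; the record does not specify how the intervals were computed.

```python
import numpy as np

def bootstrap_ci_width(x, y, n, n_boot=1000, level=0.95, rng=None):
    """Width of a percentile bootstrap CI for Pearson's r at sample size n.

    Hypothetical helper: the thesis's exact procedure is not given here.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Draw n (x, y) pairs with replacement, n_boot times.
    idx = rng.integers(0, len(x), size=(n_boot, n))
    xs, ys = x[idx], y[idx]
    # Pearson correlation of each bootstrap sample (one per row).
    xc = xs - xs.mean(axis=1, keepdims=True)
    yc = ys - ys.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    alpha = 1.0 - level
    lo, hi = np.quantile(r, [alpha / 2, 1 - alpha / 2])
    return hi - lo

# Synthetic stand-in for the layout in the abstract: 180 plots x 5 plants.
# Assumption: two traits share a plot-level effect and have independent
# plant-level noise, so the plant-level correlation is moderate.
rng = np.random.default_rng(42)
n_plots, n_plants = 180, 5
plot_effect = rng.normal(size=(n_plots, 1))
x = plot_effect + rng.normal(size=(n_plots, n_plants))
y = plot_effect + rng.normal(size=(n_plots, n_plants))

# Scenario 1: all sampled observations (n = 900).
r_all = np.corrcoef(x.ravel(), y.ravel())[0, 1]
# Scenario 2: plot means (n = 180); averaging removes plant-level noise.
r_mean = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]
print(f"r, all observations: {r_all:.3f}; r, plot means: {r_mean:.3f}")

# CI width for r at a few sample sizes, resampling individual plants.
widths = {n: bootstrap_ci_width(x.ravel(), y.ravel(), n, rng=rng)
          for n in (30, 210, 900)}
for n, w in widths.items():
    print(f"n = {n:4d}: 95% bootstrap CI width = {w:.3f}")
```

In this setup the plot-mean correlation exceeds the plant-level one because averaging shrinks the independent within-plot variance while leaving the shared plot effect intact. The resampling here treats plants as independent draws, which ignores the plot structure; a design-aware resampling scheme would be a reasonable refinement.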
id |
UFSM_07e1c02c6471e557b7961497a912b23f |
---|---|
oai_identifier_str |
oai:repositorio.ufsm.br:1/18161 |
network_acronym_str |
UFSM |
network_name_str |
Manancial - Repositório Digital da UFSM |
repository_id_str |
|
spelling |
Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas; Bias associated with data arrangement and sample size and its implications on the accuracy of indirect selection in plant breeding; Zea mays L.; Coeficiente de correlação; Multicolinearidade; Simulações; Correlation coefficient; Multicollinearity; Simulations; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES; Universidade Federal de Santa Maria; Brasil; Agronomia; UFSM; Programa de Pós-Graduação em Agronomia - Agricultura e Ambiente; UFSM Frederico Westphalen; Souza, Velci Queiróz de (http://lattes.cnpq.br/6515305945460230); Marchioro, Volmir Sergio (http://lattes.cnpq.br/3744130894870798); Pinheiro, Marcos Vinícius Marques (http://lattes.cnpq.br/2241316326554301); Olivoto, Tiago; 2019-09-06T17:47:29Z; 2019-09-06T17:47:29Z; 2017-02-20; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://repositorio.ufsm.br/handle/1/18161; ark:/26339/001300000f265; por; Attribution-NonCommercial-NoDerivatives 4.0 International; http://creativecommons.org/licenses/by-nc-nd/4.0/; info:eu-repo/semantics/openAccess; reponame:Manancial - Repositório Digital da UFSM; instname:Universidade Federal de Santa Maria (UFSM); instacron:UFSM; 2019-09-07T06:00:29Z; oai:repositorio.ufsm.br:1/18161; Biblioteca Digital de Teses e Dissertações; https://repositorio.ufsm.br/; ONG; https://repositorio.ufsm.br/oai/request; atendimento.sib@ufsm.br||tedebc@gmail.com; opendoar:2019-09-07T06:00:29; Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM); false |
dc.title.none.fl_str_mv |
Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas; Bias associated with data arrangement and sample size and its implications on the accuracy of indirect selection in plant breeding |
title |
Viés associado ao arranjo de dados e tamanho amostral e suas implicações na acurácia da seleção indireta no melhoramento de plantas |
author |
Olivoto, Tiago |
author_facet |
Olivoto, Tiago |
author_role |
author |
dc.contributor.none.fl_str_mv |
Souza, Velci Queiróz de (http://lattes.cnpq.br/6515305945460230); Marchioro, Volmir Sergio (http://lattes.cnpq.br/3744130894870798); Pinheiro, Marcos Vinícius Marques (http://lattes.cnpq.br/2241316326554301) |
dc.contributor.author.fl_str_mv |
Olivoto, Tiago |
dc.subject.por.fl_str_mv |
Zea mays L.; Coeficiente de correlação; Multicolinearidade; Simulações; Correlation coefficient; Multicollinearity; Simulations; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-02-20 2019-09-06T17:47:29Z 2019-09-06T17:47:29Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.ufsm.br/handle/1/18161 |
dc.identifier.dark.fl_str_mv |
ark:/26339/001300000f265 |
url |
http://repositorio.ufsm.br/handle/1/18161 |
identifier_str_mv |
ark:/26339/001300000f265 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de Santa Maria; Brasil; Agronomia; UFSM; Programa de Pós-Graduação em Agronomia - Agricultura e Ambiente; UFSM Frederico Westphalen |
dc.source.none.fl_str_mv |
reponame:Manancial - Repositório Digital da UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM |
instname_str |
Universidade Federal de Santa Maria (UFSM) |
instacron_str |
UFSM |
institution |
UFSM |
reponame_str |
Manancial - Repositório Digital da UFSM |
collection |
Manancial - Repositório Digital da UFSM |
repository.name.fl_str_mv |
Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM) |
repository.mail.fl_str_mv |
atendimento.sib@ufsm.br||tedebc@gmail.com |
_version_ |
1815172330819682304 |