Dimensionamento amostral em análises de componentes principais, variáveis canônicas e agrupamento em cultivares de soja

Souza, Rafael Rodrigues de

Bibliographic details
Main author:	Souza, Rafael Rodrigues de
Publication date:	2024
Document type:	Thesis
Language:	por
Source title:	Manancial - Repositório Digital da UFSM
dARK ID:	ark:/26339/001300000n39x
Full text:	http://repositorio.ufsm.br/handle/1/32041
Abstract:	Research on genetic divergence in soybean relies heavily on indirect methods based on phenotypic traits, chiefly principal components, canonical variables, and hierarchical clustering. Although these tools are widely applicable, they are often used without a representative sampling basis: sample size is rarely defined beforehand, so it is frequently chosen empirically. This study therefore aims to analyze how genetic divergence techniques respond to variation in the number of sampled plants; to define a reference sample size for principal components, canonical variables, and clustering techniques in soybean; and to propose new, robust approaches for defining sample size. Field trials were conducted during the 2017/2018 growing season at two locations in Rio Grande do Sul and on three sowing dates, totaling six experiments. Each experimental unit consisted of five rows, three meters long, spaced 0.45 meters apart. A randomized complete block design was used to evaluate 20 soybean cultivars, with three replications in each experiment. At grain maturation, ten morphological traits were assessed in 20 plants per experimental unit, totaling 7,200 individually measured plants. Bootstrap resampling (sampling with replacement) was then performed for sampling scenarios ranging from 1 to 100 plants per experimental unit to evaluate the eigenvalues of the principal components, the canonical components of the canonical variables, and the cophenetic correlation coefficient obtained from combinations of nine dissimilarity measures and seven clustering methods. The bootstrap simulations were carried out separately for each of the six experiments, followed by a joint analysis of the experiments.
For principal components, sample size was determined using the method of error expressed as a percentage of the mean. In the second study, on canonical variables, sample size was estimated by combining nonlinear models with the maximum curvature point. In the third study, a sample size methodology was developed based on unsupervised machine learning and Bayesian optimization, together with a modification of the maximum curvature point method using perpendicular distances. The estimates of the eigenvalues of the canonical variables and of the cophenetic correlation coefficient improved gradually as the number of sampled plants increased. Eighteen plants per experimental unit were sufficient to estimate the first two principal components, whereas 36 plants were needed to estimate the canonical variables. In the hierarchical analyses, the representative sample size varied with the dissimilarity measure and the clustering method used; nevertheless, 27 plants per experimental unit are suggested as sufficient for representative sampling. These results make it possible to optimize the use of principal components, canonical variables, and hierarchical analyses, ensuring the reliability of their results and avoiding empirical decisions about sample size in soybean.
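The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the data are simulated, the function names are hypothetical, the precision measure (width of a 95% percentile interval as a percentage of the mean) is an assumed stand-in for the error-as-percentage-of-the-mean method, and the knee rule is the generic perpendicular-distance form of the maximum curvature point. The nonlinear models, machine learning, and Bayesian optimization steps mentioned in the abstract are not reproduced.

```python
import numpy as np


def bootstrap_first_eigenvalue(traits, n_plants, n_boot, rng):
    """Bootstrap the largest eigenvalue of the trait covariance matrix.

    traits: (plants x traits) array; a hypothetical layout with >= 2 traits.
    n_plants: simulated sample size (>= 2); it may exceed the number of
    measured plants because rows are drawn with replacement.
    """
    eigenvalues = np.empty(n_boot)
    for b in range(n_boot):
        # Draw row indices with replacement (the bootstrap step).
        rows = rng.integers(0, traits.shape[0], size=n_plants)
        cov = np.cov(traits[rows], rowvar=False)
        # eigvalsh returns eigenvalues in ascending order; take the largest.
        eigenvalues[b] = np.linalg.eigvalsh(cov)[-1]
    return eigenvalues


def interval_width_percent(values):
    """Width of the 95% percentile interval as a percentage of the mean."""
    low, high = np.percentile(values, [2.5, 97.5])
    return 100.0 * (high - low) / values.mean()


def knee_by_perpendicular_distance(x, y):
    """Return the x whose point lies farthest from the first-to-last chord.

    Both axes are rescaled to [0, 1] so the result does not depend on units.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    # Perpendicular distance of each point to the chord (2-D cross product).
    dist = np.abs(dx * (yn - yn[0]) - dy * (xn - xn[0])) / np.hypot(dx, dy)
    return x[np.argmax(dist)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in for one experimental unit: 20 plants, 10 traits.
    traits = rng.normal(size=(20, 10))
    sizes = np.arange(2, 101)
    widths = [
        interval_width_percent(bootstrap_first_eigenvalue(traits, n, 200, rng))
        for n in sizes
    ]
    print("suggested sample size:", knee_by_perpendicular_distance(sizes, widths))
```

The sample size read off this way depends on the simulated data, the number of bootstrap draws, and the chosen precision measure, so it should not be compared with the 18, 36, and 27 plants reported in the abstract.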

Item metadata

id	UFSM_d1f8c4bf93d1d15785c98f4aaecd3885
oai_identifier_str	oai:repositorio.ufsm.br:1/32041
network_acronym_str	UFSM
network_name_str	Manancial - Repositório Digital da UFSM
repository_id_str
spelling	Dimensionamento amostral em análises de componentes principais, variáveis canônicas e agrupamento em cultivares de soja; Sample dimensioning in principal component analyses, canonical variables, and grouping in soybean cultivars; Bootstrap; Extreme Gradient Boosting; Modeling; Experimental planning; Modelagem; Planejamento experimental; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES; Universidade Federal de Santa Maria; Brasil; Agronomia; UFSM; Programa de Pós-Graduação em Agronomia; Centro de Ciências Rurais; Cargnelutti Filho, Alberto (http://lattes.cnpq.br/0233728865094243); Haesbaert, Fernando Machado; Carvalho, Ivan Ricardo; Toebe, Marcos; Maziero, Sandra Maria; Souza, Rafael Rodrigues de
dc.title.none.fl_str_mv	Dimensionamento amostral em análises de componentes principais, variáveis canônicas e agrupamento em cultivares de soja; Sample dimensioning in principal component analyses, canonical variables, and grouping in soybean cultivars
title	Dimensionamento amostral em análises de componentes principais, variáveis canônicas e agrupamento em cultivares de soja
spellingShingle	Dimensionamento amostral em análises de componentes principais, variáveis canônicas e agrupamento em cultivares de soja; Souza, Rafael Rodrigues de; Bootstrap; Extreme Gradient Boosting; Modeling; Experimental planning; Modelagem; Planejamento experimental; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA
author	Souza, Rafael Rodrigues de
author_facet	Souza, Rafael Rodrigues de
author_role	author
dc.contributor.none.fl_str_mv	Cargnelutti Filho, Alberto (http://lattes.cnpq.br/0233728865094243); Haesbaert, Fernando Machado; Carvalho, Ivan Ricardo; Toebe, Marcos; Maziero, Sandra Maria
dc.contributor.author.fl_str_mv	Souza, Rafael Rodrigues de
dc.subject.por.fl_str_mv	Bootstrap; Extreme Gradient Boosting; Modeling; Experimental planning; Modelagem; Planejamento experimental; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA
topic	Bootstrap; Extreme Gradient Boosting; Modeling; Experimental planning; Modelagem; Planejamento experimental; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA
publishDate	2024
dc.date.none.fl_str_mv	2024-06-17T11:14:56Z; 2024-06-17T11:14:56Z; 2024-05-17
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.ufsm.br/handle/1/32041
dc.identifier.dark.fl_str_mv	ark:/26339/001300000n39x
url	http://repositorio.ufsm.br/handle/1/32041
identifier_str_mv	ark:/26339/001300000n39x
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International; http://creativecommons.org/licenses/by-nc-nd/4.0/; info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International; http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Santa Maria; Brasil; Agronomia; UFSM; Programa de Pós-Graduação em Agronomia; Centro de Ciências Rurais
publisher.none.fl_str_mv	Universidade Federal de Santa Maria; Brasil; Agronomia; UFSM; Programa de Pós-Graduação em Agronomia; Centro de Ciências Rurais
dc.source.none.fl_str_mv	reponame:Manancial - Repositório Digital da UFSM; instname:Universidade Federal de Santa Maria (UFSM); instacron:UFSM
instname_str	Universidade Federal de Santa Maria (UFSM)
instacron_str	UFSM
institution	UFSM
reponame_str	Manancial - Repositório Digital da UFSM
collection	Manancial - Repositório Digital da UFSM
repository.name.fl_str_mv	Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv	atendimento.sib@ufsm.br||tedebc@gmail.com
_version_	1815172365354532864
