A comparison of regression methods based on dimensional reduction for genomic prediction.

Bibliographic details
Main author: COSTA, J. A. da
Publication date: 2021
Other authors: AZEVEDO, C. F., NASCIMENTO, M., SILVA, F. F. e, RESENDE, M. D. V. de, NASCIMENTO, A. C. C.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1139234
https://doi.org/10.4238/gmr18877
Abstract: multicollinearity and high dimensionality problems, making it impossible to obtain stable estimates through the traditional method of estimation based on ordinary least squares. To overcome such challenges, dimensionality reduction methods have been proposed because of their simple theory and easy application. We compared three dimensionality reduction methods: Principal Components Regression (PCR), Partial Least Squares (PLS), and Independent Components Regression (ICR). An important step for dimensionality reduction and prediction is selecting the number of components, as it affects the linear combinations of the explanatory variables. The linear combinations are inserted into the model to predict the response based on a reduced number of parameters. We examined the criteria for the selection of the number of components. The dimensionality reduction methods were applied to genomic and phenotype data. We evaluated 370 accessions of Asian rice, Oryza sativa, which were genotyped for 36,901 SNP markers used to predict the genomic values for the trait number of panicles per plant. This data set presented multicollinearity and high dimensionality. The computational time for each method was also recorded. Among the methods, PCR and ICR gave the highest accuracy values, with ICR standing out for providing the least biased estimates of genomic values. However, ICR required more computational time than the other methodologies.
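The abstract describes regressing the response on a reduced set of linear combinations of the markers and choosing how many components to keep. A minimal sketch of that idea for PCR is given below, using NumPy only. It is not the authors' code: the simulated genotypes, the function names pcr_fit_predict and choose_n_components, and the use of k-fold cross-validated squared error as the selection criterion are all assumptions made for illustration, and the record does not say which criteria the paper actually examined.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_components):
    """Principal components regression via SVD of the centred marker matrix."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Rows of Vt are the principal directions (loadings) of the markers.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # p x k loading matrix
    scores = Xc @ V                         # n x k component scores
    # Least squares on k orthogonal scores instead of p collinear markers.
    gamma, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return y_mean + (X_test - x_mean) @ V @ gamma

def choose_n_components(X, y, candidates, n_folds=5, seed=0):
    """Pick the candidate k with the lowest cross-validated squared error
    (an assumed criterion, used here only as an example)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_error = {}
    for k in candidates:
        sq_err = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            pred = pcr_fit_predict(X[train_idx], y[train_idx], X[test_idx], k)
            sq_err += np.sum((y[test_idx] - pred) ** 2)
        cv_error[k] = sq_err / len(y)
    best = min(cv_error, key=cv_error.get)
    return best, cv_error

# Simulated stand-in for genotype data (hypothetical, far smaller than the
# 370 x 36,901 data set described in the abstract): markers coded 0/1/2.
rng = np.random.default_rng(1)
n, p = 60, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:10] = rng.normal(size=10)             # ten markers with real effects
y = X @ beta + rng.normal(scale=0.5, size=n)

best_k, cv_error = choose_n_components(X, y, candidates=[1, 5, 10, 20])
print("selected number of components:", best_k)
```

With more markers than individuals, ordinary least squares on the raw markers has no unique solution, whereas the regression on a handful of component scores does. PLS and ICR would follow the same pattern with a different way of building the linear combinations.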
id EMBR_bc922f9d0ea63104134572be3f4945dd
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1139234
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.contributor.none.fl_str_mv JAQUICELE APARECIDA DA COSTA, UFV; CAMILA FERREIRA AZEVEDO, UFV; MOYSÉS NASCIMENTO, UFV; FABYANO FONSECA E SILVA, UFV; MARCOS DEON VILELA DE RESENDE, CNPCa; ANA CAROLINA CAMPANA NASCIMENTO, UFV.
dc.contributor.author.fl_str_mv COSTA, J. A. da
AZEVEDO, C. F.
NASCIMENTO, M.
SILVA, F. F. e
RESENDE, M. D. V. de
NASCIMENTO, A. C. C.
dc.subject.por.fl_str_mv Regression analysis
Genomics
publishDate 2021
dc.date.none.fl_str_mv 2021
2022-01-21T14:30:04Z
2022-01-21T14:30:04Z
2022-01-21
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Genetics and Molecular Research, v. 20, n. 2, p. 1-15, 2021.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1139234
https://doi.org/10.4238/gmr18877
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1794503516828467200