A comparison of statistical methods for genomic selection in a mice population

Bibliographic details
Main author: Neves, Haroldo H. R. [UNESP]
Publication date: 2012
Other authors: Carvalheiro, Roberto, Queiroz, Sandra A. [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1186/1471-2156-13-100
http://hdl.handle.net/11449/4673
Abstract: Background: The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection, in which genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, marker effect estimation is not a trivial task. The objective of this research was to compare the predictive performance of ten statistical methods employed in genomic selection by analyzing data from a heterogeneous stock mouse population.
Results: For the five traits analyzed (W6W: weight at six weeks, WGS: growth slope, BL: body length, %CD8+: percentage of CD8+ cells, CD4+/CD8+: ratio between CD4+ and CD8+ cells), within-family predictions were more accurate than across-family predictions, although the size of this advantage varied markedly across traits. For within-family prediction, two kernel methods, Reproducing Kernel Hilbert Spaces Regression (RKHS) and Support Vector Regression (SVR), were the most accurate for W6W, while a polygenic model had comparable performance. A form of ridge regression assuming that all markers contribute to the additive variance (RR_GBLUP) was among the most accurate for WGS and BL, while two variable selection methods (LASSO and Random Forest, RF) had the greatest predictive abilities for %CD8+ and CD4+/CD8+. RF, RKHS, SVR and RR_GBLUP outperformed the remaining methods in terms of bias and inflation of predictions.
Conclusions: Methods with large conceptual differences reached very similar predictive abilities, and a clear re-ranking of methods was observed depending on the trait analyzed. Variable selection methods were more accurate than the others for %CD8+ and CD4+/CD8+, and these traits are likely to be influenced by a smaller number of QTL than the other traits. Judged by their overall performance across traits and computational requirements, RR_GBLUP, RKHS and SVR are particularly appealing for application in genomic selection.
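The abstract notes that marker effects are hard to estimate when markers far outnumber phenotypes. The following is a minimal sketch of one way ridge regression copes with that setting. It is not the authors' RR_GBLUP implementation: the genotypes and phenotypes are simulated, the function names (solve, ridge_fit, predict, correlation, simulate) are hypothetical, and the penalty lam=50.0 is an arbitrary assumption rather than a value taken from the study. It uses the dual form of ridge regression, so only a system of size n (records), not p (markers), has to be solved.

```python
import random
from statistics import mean

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge marker effects via the dual form: b = X'(XX' + lam*I)^-1 (y - mean(y))."""
    n, p = len(X), len(X[0])
    mu = mean(y)
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [yi - mu for yi in y])
    effects = [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]
    return mu, effects

def predict(X, mu, effects):
    """Predicted phenotype = intercept + sum of genotype code times marker effect."""
    return [mu + sum(x * b for x, b in zip(row, effects)) for row in X]

def correlation(a, b):
    """Pearson correlation, used here as a simple measure of predictive ability."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate(n=60, p=200, n_qtl=10, seed=1):
    """Simulated data (an assumption): genotypes coded -1/0/1, a few causal markers."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) - 1 for _ in range(p)] for _ in range(n)]
    true = [0.0] * p
    for k in rng.sample(range(p), n_qtl):
        true[k] = rng.gauss(0, 1)
    y = [sum(x * b for x, b in zip(row, true)) + rng.gauss(0, 1) for row in X]
    return X, y

X, y = simulate()
train, test = slice(0, 45), slice(45, 60)   # 45 training records, 200 markers
lam = 50.0                                  # arbitrary penalty, not tuned
mu, effects = ridge_fit(X[train], y[train], lam)
y_hat = predict(X[test], mu, effects)
r = correlation(y[test], y_hat)
print("markers:", len(effects), "training records:", len(X[train]))
print("predictive ability (correlation):", round(r, 3))
```

In practice the penalty would be chosen by cross-validation or derived from variance components, and a within-family versus across-family comparison would require splitting records by family rather than by position as done here.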
id UNSP_61fe191cbea6d3bb4a8695d107ffa090
oai_identifier_str oai:repositorio.unesp.br:11449/4673
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Affiliations: UNESP, FCAV, Dept Zootecnia, BR-14884900 Jaboticabal, SP, Brazil; GenSys Consultores Assoc SS Ltda, Porto Alegre, RS, Brazil
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (Unesp)
GenSys Consultores Assoc SS Ltda
dc.contributor.author.fl_str_mv Neves, Haroldo H. R. [UNESP]
Carvalheiro, Roberto
Queiroz, Sandra A. [UNESP]
dc.subject.por.fl_str_mv Kernel regression
LASSO
Random forest
ridge regression
SNP
Subset selection
publishDate 2012
dc.date.none.fl_str_mv 2012-11-08
2014-05-20T13:18:40Z
2014-05-20T13:18:40Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1186/1471-2156-13-100
BMC Genetics. London: BioMed Central Ltd., v. 13, p. 17, 2012.
1471-2156
http://hdl.handle.net/11449/4673
10.1186/1471-2156-13-100
WOS:000314596300001
WOS000314596300001.pdf
9096087557977610
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv BMC Genetics
2.469
1,160
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 17
application/pdf
dc.publisher.none.fl_str_mv BioMed Central Ltd.
dc.source.none.fl_str_mv Web of Science
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1808128754732498944