Enhancing genomic prediction with stacking ensemble learning in arabica coffee.

Bibliographic details
Main author: NASCIMENTO, M.
Publication date: 2024
Other authors: NASCIMENTO, A. C. C., AZEVEDO, C. F., OLIVEIRA, A. C. B. de, CAIXETA, E. T., JARQUIN, D.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920
Abstract: Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), Multivariate Adaptive Regression Splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, several options were explored, including Ridge Regression, RF, GBLUP, and Single Average. The SEL method was able to predict important traits in Coffea arabica, and its predictive ability (PA) was higher than that of every base learner. Relative to GBLUP, the gains in PA (computed from the ratio between the PA of the best stacking model and that of GBLUP) were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL is a promising approach for GS: by combining predictions from multiple models, it can potentially enhance the PA of GS for complex traits.
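The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the data are simulated, the base learners are two ridge regressions and a k-nearest-neighbour regressor standing in for GBLUP, MARS, QRF, and RF, the meta-learner is a ridge regression, PA is taken to be the Pearson correlation between observed and cross-validated predicted values, and the relative gain is read as (PA_new / PA_ref - 1) x 100. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression in dual form (cheap when markers outnumber individuals)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - y_mean)
    beta = Xc.T @ alpha
    return beta, y_mean - x_mean @ beta

def ridge_predict(model, X):
    beta, intercept = model
    return X @ beta + intercept

def knn_fit(X, y, k):
    return X, y, k  # k-NN just memorizes the training set

def knn_predict(model, X_new):
    X, y, k = model
    dist = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y[nearest].mean(axis=1)

# Hypothetical base learners as (fit, predict) pairs; stand-ins only.
BASE_LEARNERS = [
    (lambda X, y: ridge_fit(X, y, 100.0), ridge_predict),
    (lambda X, y: ridge_fit(X, y, 1.0), ridge_predict),
    (lambda X, y: knn_fit(X, y, 10), knn_predict),
]

def kfold(n, k, rng):
    """Split a random permutation of 0..n-1 into k test folds."""
    return np.array_split(rng.permutation(n), k)

def out_of_fold(X, y, learners, folds):
    """Each row of Z holds base-learner predictions made without seeing that row."""
    Z = np.zeros((len(y), len(learners)))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for j, (fit, predict) in enumerate(learners):
            Z[test, j] = predict(fit(X[train], y[train]), X[test])
    return Z

def stack_predict(X_tr, y_tr, X_te, learners, rng, meta_lam=1.0):
    """Fit the meta-learner on out-of-fold predictions, then predict new rows."""
    Z_tr = out_of_fold(X_tr, y_tr, learners, kfold(len(y_tr), 5, rng))
    meta = ridge_fit(Z_tr, y_tr, meta_lam)  # assumed ridge meta-learner
    Z_te = np.column_stack([p(f(X_tr, y_tr), X_te) for f, p in learners])
    return ridge_predict(meta, Z_te), Z_te

def predictive_ability(y_obs, y_pred):
    """Assumed definition: Pearson correlation of observed vs. predicted."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])

def relative_gain(pa_new, pa_ref):
    """Assumed reading of the reported gains, in percent."""
    return (pa_new / pa_ref - 1.0) * 100.0

def run_demo(seed=0, n=120, p=300):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)     # simulated 0/1/2 genotypes
    effects = rng.normal(size=p) * (rng.random(p) < 0.1)  # about 10% causal markers
    y = X @ effects + rng.normal(scale=2.0, size=n)       # simulated phenotype
    stacked = np.zeros(n)
    base = np.zeros((n, len(BASE_LEARNERS)))
    for test in kfold(n, 5, rng):                         # outer CV for evaluation
        train = np.setdiff1d(np.arange(n), test)
        stacked[test], base[test] = stack_predict(
            X[train], y[train], X[test], BASE_LEARNERS, rng)
    pa = {f"base_{j}": predictive_ability(y, base[:, j]) for j in range(base.shape[1])}
    pa["stack"] = predictive_ability(y, stacked)
    return pa

if __name__ == "__main__":
    print(run_demo())
```

The meta-learner is trained only on out-of-fold base predictions, so it never sees a base prediction for an individual that the base model was fitted on. The outer cross-validation loop keeps the PA estimate for the stacked model separate from that inner step.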
id EMBR_4e50574adc7c249a355fbb9352453379
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1166920
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.contributor.none.fl_str_mv MOYSES NASCIMENTO, UNIVERSIDADE FEDERAL DE VIÇOSA; ANA CAROLINA CAMPANA NASCIMENTO, UNIVERSIDADE FEDERAL DE VIÇOSA; CAMILA FERREIRA AZEVEDO, UNIVERSIDADE FEDERAL DE VIÇOSA; ANTONIO CARLOS BAIAO DE OLIVEIRA, CNPCA; EVELINE TEIXEIRA CAIXETA MOURA, CNPCA; DIEGO JARQUIN, UNIVERSITY OF FLORIDA.
dc.contributor.author.fl_str_mv NASCIMENTO, M.
NASCIMENTO, A. C. C.
AZEVEDO, C. F.
OLIVEIRA, A. C. B. de
CAIXETA, E. T.
JARQUIN, D.
dc.subject.por.fl_str_mv Coffea Arábica
Plant breeding
Genomics
Genetic traits
publishDate 2024
dc.date.none.fl_str_mv 2024-08-29T19:54:02Z
2024-08-29T19:54:02Z
2024-08-29
2024
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Frontiers in Plant Science, v. 15, 2024.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 14 p.
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1817695720057077760