Enhancing genomic prediction with stacking ensemble learning in arabica coffee.
Main author: | NASCIMENTO, M. |
---|---|
Publication date: | 2024 |
Other authors: | NASCIMENTO, A. C. C.; AZEVEDO, C. F.; OLIVEIRA, A. C. B. de; CAIXETA, E. T.; JARQUIN, D. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
Full text: | http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920 |
Abstract: | Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking ensemble learning (SEL) combines multiple models and may yield even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), Multivariate Adaptive Regression Splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, several options were explored, including Ridge Regression, RF, GBLUP, and Single Average. SEL was able to predict important traits in Coffea arabica and presented higher predictive ability (PA) than all base learners. The gains in PA relative to GBLUP (based on the ratio between the PA of the best stacking model and that of GBLUP) were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL is a promising approach for GS: by combining predictions from multiple models, it can potentially enhance the PA of GS for complex traits. |
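
The meta-learner step and the PA comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that PA is the Pearson correlation between observed phenotypes and cross-validated predictions, and that the reported gain is computed as 100 × (PA_SEL / PA_GBLUP − 1); neither definition is spelled out in this record. The function names and the toy numbers are hypothetical, and only the simple-average style of meta-learner is shown, taking the base learners' out-of-fold predictions as given.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simple_average(base_predictions):
    """Meta-learner: average the base learners' predictions per individual."""
    return [fmean(values) for values in zip(*base_predictions.values())]


def relative_gain(pa_stacked, pa_reference):
    """Percentage gain in PA over a reference model (assumed formula)."""
    return 100.0 * (pa_stacked / pa_reference - 1.0)


# Hypothetical toy data: observed phenotypes for six individuals and the
# out-of-fold predictions that two base learners produced for them.
observed = [2.0, 3.5, 1.0, 4.2, 2.8, 3.9]
base_predictions = {
    "GBLUP": [2.4, 3.0, 1.9, 3.6, 2.2, 3.1],
    "RF": [1.8, 3.9, 1.2, 3.3, 3.4, 4.4],
}

stacked = simple_average(base_predictions)
pa_gblup = pearson(observed, base_predictions["GBLUP"])  # PA assumed = Pearson r
pa_sel = pearson(observed, stacked)
gain = relative_gain(pa_sel, pa_gblup)

print(f"PA (GBLUP): {pa_gblup:.3f}")
print(f"PA (stacked average): {pa_sel:.3f}")
print(f"Gain over GBLUP: {gain:.2f}%")
```

A trained meta-learner such as ridge regression would instead be fitted on the out-of-fold predictions of the training folds, so that no individual's phenotype informs its own stacked prediction; the averaging variant needs no such fitting step.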
id |
EMBR_4e50574adc7c249a355fbb9352453379 |
---|---|
oai_identifier_str |
oai:www.alice.cnptia.embrapa.br:doc/1166920 |
network_acronym_str |
EMBR |
network_name_str |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
repository_id_str |
2154 |
spelling |
Enhancing genomic prediction with stacking ensemble learning in arabica coffee.; Coffea Arábica; Plant breeding; Genomics; Genetic traits; NASCIMENTO, M.; NASCIMENTO, A. C. C.; AZEVEDO, C. F.; OLIVEIRA, A. C. B. de; CAIXETA, E. T.; JARQUIN, D.; 2024-08-29T19:54:02Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 14 p.; Frontiers in Plant Science, v. 15, 2024.; http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920; eng; info:eu-repo/semantics/openAccess; oai:www.alice.cnptia.embrapa.br:doc/1166920; https://www.alice.cnptia.embrapa.br/oai/request; cg-riaa@embrapa.br; opendoar:2154 |
dc.title.none.fl_str_mv |
Enhancing genomic prediction with stacking ensemble learning in arabica coffee. |
title |
Enhancing genomic prediction with stacking ensemble learning in arabica coffee. |
author |
NASCIMENTO, M. |
author_facet |
NASCIMENTO, M.; NASCIMENTO, A. C. C.; AZEVEDO, C. F.; OLIVEIRA, A. C. B. de; CAIXETA, E. T.; JARQUIN, D. |
author_role |
author |
author2 |
NASCIMENTO, A. C. C.; AZEVEDO, C. F.; OLIVEIRA, A. C. B. de; CAIXETA, E. T.; JARQUIN, D. |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
MOYSES NASCIMENTO, UNIVERSIDADE FEDERAL DE VIÇOSA; ANA CAROLINA CAMPANA NASCIMENTO, UNIVERSIDADE FEDERAL DE VIÇOSA; CAMILA FERREIRA AZEVEDO, UNIVERSIDADE FEDERAL DE VIÇOSA; ANTONIO CARLOS BAIAO DE OLIVEIRA, CNPCA; EVELINE TEIXEIRA CAIXETA MOURA, CNPCA; DIEGO JARQUIN, UNIVERSITY OF FLORIDA. |
dc.contributor.author.fl_str_mv |
NASCIMENTO, M.; NASCIMENTO, A. C. C.; AZEVEDO, C. F.; OLIVEIRA, A. C. B. de; CAIXETA, E. T.; JARQUIN, D. |
dc.subject.por.fl_str_mv |
Coffea Arábica; Plant breeding; Genomics; Genetic traits |
topic |
Coffea Arábica; Plant breeding; Genomics; Genetic traits |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-08-29T19:54:02Z 2024-08-29T19:54:02Z 2024-08-29 2024 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
Frontiers in Plant Science, v. 15, 2024. http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920 |
identifier_str_mv |
Frontiers in Plant Science, v. 15, 2024. |
url |
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1166920 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
14 p. |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice); instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa); instacron:EMBRAPA |
instname_str |
Empresa Brasileira de Pesquisa Agropecuária (Embrapa) |
instacron_str |
EMBRAPA |
institution |
EMBRAPA |
reponame_str |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
collection |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
repository.name.fl_str_mv |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa) |
repository.mail.fl_str_mv |
cg-riaa@embrapa.br |
_version_ |
1817695720057077760 |