Genomic prediction of leaf rust resistance to Arabica coffee using machine learning algorithms

Bibliographic details
Main author: Sousa, Ithalo Coelho de
Publication date: 2021
Other authors: Nascimento, Moysés; Silva, Gabi Nunes; Nascimento, Ana Carolina Campana; Cruz, Cosme Damião; Silva, Fabyano Fonseca e; Almeida, Dênia Pires de; Pestana, Kátia Nogueira; Azevedo, Camila Ferreira; Zambolim, Laércio; Caixeta, Eveline Teixeira
Document type: Article
Language: English
Source title: Scientia Agrícola (Online)
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0103-90162021000401102
Abstract: Genomic selection (GS) emphasizes the simultaneous prediction of the genetic effects of thousands of markers scattered across the genome. Several statistical methodologies have been used in GS to predict genetic merit. In general, such methodologies require certain assumptions about the data, such as normality of the distribution of phenotypic values. To circumvent the non-normality of phenotypic values, the literature suggests the use of Bayesian Generalized Linear Regression (GBLASSO). Another alternative is models based on machine learning, represented by methodologies such as Artificial Neural Networks (ANN), Decision Trees (DT), and related refinements such as Bagging, Random Forest and Boosting. This study aimed to use DT and its refinements to predict resistance to orange rust in Arabica coffee. Additionally, DT and its refinements were used to identify the importance of markers related to the trait of interest. The results were compared with those from GBLASSO and ANN. Data on rust resistance of 245 Arabica coffee plants genotyped for 137 markers were used. The DT refinements presented Apparent Error Rate values equal to or lower than those obtained by DT, GBLASSO, and ANN. Moreover, the DT refinements were able to identify important markers for the trait of interest. Of the 14 most important markers analyzed in each methodology, on average 9.3 were located in quantitative trait loci (QTL) regions associated with disease resistance reported in the literature.
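The methods above are compared by their Apparent Error Rate (APER). In its usual definition, APER is the proportion of individuals misclassified when a classifier is evaluated on the same data used to fit it. The following is a minimal Python sketch under that definition, not the authors' code; the function name and the "R"/"S" (resistant/susceptible) class labels are hypothetical and not taken from the study.

```python
from typing import Sequence


def apparent_error_rate(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of individuals whose predicted class differs from the observed class.

    Hypothetical helper: "apparent" means the predictions come from the same
    plants used to fit the model, so no held-out data are involved.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one individual is required")
    # Count plants whose predicted class does not match the observed class.
    misclassified = sum(1 for y, y_hat in zip(observed, predicted) if y != y_hat)
    return misclassified / len(observed)


# Hypothetical toy data: five plants, one of them misclassified.
observed = ["R", "R", "S", "S", "R"]
predicted = ["R", "S", "S", "S", "R"]
print(apparent_error_rate(observed, predicted))  # → 0.2
```

Because APER reuses the training data, it tends to be optimistic; a cross-validated error rate gives a less biased estimate of performance on new plants.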
ISSN: 0103-9016 (print); 1678-992X (online)
Keywords: Hemileia vastatrix; statistical learning; plant breeding; artificial intelligence
DOI: 10.1590/1678-992x-2020-0021
Access rights: Open access
Publisher: Escola Superior de Agricultura "Luiz de Queiroz"
Source: Scientia Agricola, v. 78, n. 4, 2021
Institution: Universidade de São Paulo (USP)
Contact: scientia@usp.br, alleoni@usp.br