Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum

Bibliographic details
Main author: Santos, Jhonathan Pedroso Rigal dos
Publication date: 2019
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/
Abstract: Sorghum (Sorghum bicolor L. Moench spp.) is a bioenergy crop with several appealing biological features to be explored in plant breeding for increasing efficiency in bioenergy production. The possibility of connecting the influence of quantitative trait loci over time and between traits highlights Bayesian networks as a powerful probabilistic framework for designing novel genomic prediction models. In this study, we phenotyped a diverse panel of 869 sorghum lines in four different environments (2 locations in 2 years), with biweekly measurements of plant height from 30 days after planting (DAP) to 120 DAP and a measurement of dry biomass at the end of the season. Genotyping-by-sequencing was performed, resulting in the scoring of 100,435 biallelic SNP markers. We developed and evaluated several genomic prediction models: Bayesian Network (BN), Pleiotropic Bayesian Network (PBN), and Dynamic Bayesian Network (DBN). The assumptions for BN, PBN, and DBN were independence, dependence between traits, and dependence between time points, respectively. For benchmarking, we used multivariate GBLUP models that considered only time points for plant height (MTi-GBLUP), and both time points for plant height and dry biomass (MTr-GBLUP), modeling unstructured variance-covariance matrices for genetic effects and residuals. Coincidence indices (CI) were computed to assess the success of selecting for dry biomass using plant height measurements, and a coincidence index based on lines (CIL) was computed from the posterior draws of the Bayesian networks to understand genetic plasticity over time. In the 5-fold cross-validation scheme, prediction accuracies ranged from 0.48 (PBN) to 0.51 (MTr-GBLUP) for dry biomass and from 0.47 (DBN-DAP120) to 0.74 (MTi-GBLUP-DAP60) for plant height. The forward-chaining cross-validation showed a substantial increase in prediction accuracy when using the DBN model, with r = 0.6 (train on slice 30:45 to predict 120 DAP) to 0.94 (train on slice 30:90 to predict 105 DAP), compared to the BN and PBN, and accuracies similar to those of the multivariate GBLUP models. Both the CI and CIL indices showed that the ranking of promising inbred lines changed minimally after 45 DAP for plant height. These results suggest that 45 DAP is an optimal developmental stage for imposing the two-level indirect selection framework, in which indirect selection for plant height at the end of the season (first-level target trait), as well as for dry biomass (second-level target trait), can be based on the plant height ranking at 45 DAP (secondary trait). With the advance of robotic technologies for field-based phenotyping, the development of novel approaches such as the two-level indirect selection framework will be imperative to boost genetic gain per unit of time.
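The forward-chaining cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It only enumerates the (training slice, target time point) pairs; it does not fit any of the thesis models. The 15-day spacing of the time points is an assumption inferred from the DAP values quoted in the abstract (30, 45, 60, 90, 105, 120), and the function name `forward_chaining_splits` and the `min_train` parameter are hypothetical.

```python
from typing import List, Tuple

# Assumed schedule: one measurement every 15 days from 30 to 120 DAP.
# This spacing is inferred from the abstract, not stated in it.
TIME_POINTS: List[int] = list(range(30, 121, 15))


def forward_chaining_splits(
    time_points: List[int], min_train: int = 2
) -> List[Tuple[List[int], int]]:
    """Enumerate forward-chaining splits.

    Each split trains on an initial slice of time points and predicts
    one strictly later time point, so no future data leaks into training.
    """
    splits = []
    for end in range(min_train, len(time_points)):
        train_slice = time_points[:end]
        # Every time point after the training slice is a valid target.
        for target in time_points[end:]:
            splits.append((train_slice, target))
    return splits


if __name__ == "__main__":
    for train_slice, target in forward_chaining_splits(TIME_POINTS):
        print(f"train on {train_slice[0]}:{train_slice[-1]} DAP -> predict {target} DAP")
```

Under these assumptions the enumeration includes the two cases quoted in the abstract: training on slice 30:45 to predict 120 DAP, and training on slice 30:90 to predict 105 DAP.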
id USP_e73d7d3fd6bb741d5afe3314b02b4746
oai_identifier_str oai:teses.usp.br:tde-12092019-153123
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum
Novas redes Bayesianas para predição genômica de caracteres de desenvolvimento em sorgo biomassa
author Santos, Jhonathan Pedroso Rigal dos
author_facet Santos, Jhonathan Pedroso Rigal dos
author_role author
dc.contributor.none.fl_str_mv Garcia, Antonio Augusto Franco
dc.contributor.author.fl_str_mv Santos, Jhonathan Pedroso Rigal dos
dc.subject.por.fl_str_mv Bayesian networks
Bioenergia
Bioenergy
Genomic prediction
Predição genômica
Redes Bayesianas
Sorghum
Sorgo
publishDate 2019
dc.date.none.fl_str_mv 2019-08-02
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/
url http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815256993053540352