Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum
Main author: | Santos, Jhonathan Pedroso Rigal dos |
---|---|
Publication date: | 2019 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/ |
Abstract: | Sorghum (Sorghum bicolor L. Moench spp.) is a bioenergy crop with several appealing biological features to be explored in plant breeding for increasing efficiency in bioenergy production. The possibility of connecting the influence of quantitative trait loci over time and between traits highlights Bayesian networks as a powerful probabilistic framework for designing novel genomic prediction models. In this study, we phenotyped a diverse panel of 869 sorghum lines in four different environments (2 locations in 2 years), with biweekly measurements of plant height from 30 days after planting (DAP) to 120 DAP and dry biomass measured at the end of the season. Genotyping-by-sequencing was performed, resulting in the scoring of 100,435 biallelic SNP markers. We developed and evaluated several genomic prediction models: Bayesian Network (BN), Pleiotropic Bayesian Network (PBN), and Dynamic Bayesian Network (DBN). The assumptions for BN, PBN, and DBN were independence, dependence between traits, and dependence between time points, respectively. For benchmarking, we used multivariate GBLUP models that considered only the time points for plant height (MTi-GBLUP), or both the time points for plant height and dry biomass (MTr-GBLUP), modeling unstructured variance-covariance matrices for genetic effects and residuals. Coincidence indices (CI) were computed to assess the success of selecting for dry biomass using plant height measurements, along with a coincidence index based on lines (CIL), computed from the posterior draws of the Bayesian networks, to understand genetic plasticity over time. In the 5-fold cross-validation scheme, prediction accuracies ranged from 0.48 (PBN) to 0.51 (MTr-GBLUP) for dry biomass and from 0.47 (DBN-DAP120) to 0.74 (MTi-GBLUP-DAP60) for plant height. 
The forward-chaining cross-validation showed a substantial increase in prediction accuracy when using the DBN model, with r ranging from 0.6 (train on slice 30:45 to predict 120 DAP) to 0.94 (train on slice 30:90 to predict 105 DAP), compared to the BN and PBN, and accuracies similar to those of the multivariate GBLUP models. Both the CI and CIL indices showed that the ranking of promising inbred lines changed minimally after 45 DAP for plant height. These results suggest that 45 DAP is an optimal developmental stage for imposing the two-level indirect selection framework, in which indirect selection for plant height at the end of the season (first-level target trait) can be based on the ranking at 45 DAP (secondary trait), as can selection for dry biomass (second-level target trait). With the advance of robotic technologies for field-based phenotyping, the development of novel approaches such as the two-level indirect selection framework will be imperative to boost genetic gain per unit of time. |
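
The forward-chaining cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact slices used, so the schedule below (measurements every 15 days from 30 to 120 DAP, inferred from the quoted slices 30:45 and 30:90 and targets 105 and 120 DAP) is an assumption, and the names `TIME_POINTS` and `forward_chaining_splits` are hypothetical. The sketch only enumerates which time points would be used for training and which later time point would be predicted; it does not fit any model.

```python
from typing import Iterator, List, Tuple

# Assumed measurement schedule (not stated explicitly in the abstract):
# every 15 days from 30 to 120 days after planting (DAP).
TIME_POINTS: List[int] = list(range(30, 121, 15))


def forward_chaining_splits(
    time_points: List[int], min_train: int = 2
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_slice, target_time_point) pairs.

    Each training slice is a prefix of the ordered time points and each
    target is a strictly later time point, so no future measurement is
    ever used to predict the past.
    """
    ordered = sorted(time_points)
    for end in range(min_train, len(ordered)):
        train = ordered[:end]
        for target in ordered[end:]:
            yield train, target


splits = list(forward_chaining_splits(TIME_POINTS))
for train, target in splits:
    print(f"train on slice {train[0]}:{train[-1]} -> predict {target} DAP")
```

Under these assumptions, the enumeration includes the two cases quoted in the abstract: training on slice 30:45 to predict 120 DAP, and training on slice 30:90 to predict 105 DAP.
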
id |
USP_e73d7d3fd6bb741d5afe3314b02b4746 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-12092019-153123 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum Novas redes Bayesianas para predição genômica de caracteres de desenvolvimento em sorgo biomassa |
title |
Novel Bayesian networks for genomic prediction of developmental traits in biomass sorghum |
author |
Santos, Jhonathan Pedroso Rigal dos |
author_role |
author |
dc.contributor.none.fl_str_mv |
Garcia, Antonio Augusto Franco |
dc.contributor.author.fl_str_mv |
Santos, Jhonathan Pedroso Rigal dos |
dc.subject.por.fl_str_mv |
Bayesian networks; Bioenergia; Bioenergy; Genomic prediction; Predição genômica; Redes Bayesianas; Sorghum; Sorgo |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-08-02 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/ |
url |
http://www.teses.usp.br/teses/disponiveis/11/11137/tde-12092019-153123/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
|
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.coverage.none.fl_str_mv |
|
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
_version_ |
1815256993053540352 |