Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize

Bibliographic details
Main author: Costa Neto, Germano Martins Ferreira
Publication date: 2021
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/11/11137/tde-11102021-134352/
Abstract: Large-scale envirotyping (environmental + typing), or simply enviromics, is an emerging field of data science applied in both agronomic research and plant breeding. This "omics" consists of gathering and processing reliable environmental information, respecting crop-specific ecophysiology, for further integration into quantitative genetics and prediction-based breeding. However, most current prediction-based platforms rely on genotype-phenotype relationships (i.e., the phenotype-genotype association enabled by whole-genome markers); the state of the art of this approach in predictive breeding is known as genomic selection or genomic prediction (GP). Despite its success in preliminary breeding stages, which are mostly conducted under restricted environmental variation (e.g., few environments or a single environment), low accuracy is still common under multiple environmental conditions, where the so-called "genotype by environment interaction" (G×E) is present. Knowledge of crop ecophysiology can be an alternative to boost the accuracy of GP under G×E. Environmental variation shapes genotype-specific phenotypic responses to a given gradient of soil, climate and management factors, i.e., the reaction norm. In this thesis, we conducted three studies to investigate the use of enviromics in GP under G×E scenarios, using grain yield from two datasets of tropical maize hybrids. The first study involves the development of the first open-source software dedicated to envirotyping in genomic prediction. In this study, we elucidate the use of remote sensing to popularize envirotyping, as well as aspects of ecophysiology useful to understand and define the concepts of 'environment', 'enviromics' and 'envirotyping'.
In the second chapter, we verify the accuracy gains obtained by adopting non-linear kernels (Gaussian Kernel, GK; Deep Kernel, DK) for modeling non-additive effects (e.g., dominance and envirotyping-enabled reaction norms), using the traditional GBLUP (genomic best linear unbiased predictor) as a reference method. Our results suggest that non-linear kernels (GK and DK) are the best alternative to model non-additive and reaction-norm effects. The adoption of GK or DK reduced the computational time needed to run the models and increased the accuracy in predicting complex G×E interactions (variations in the rank of genotypes across environments). We also observe that the use of GK or DK for modeling non-additive effects is critical to expand the resolution of GP in predicting the interaction of a particular maize hybrid across multiple environments. Finally, in the third chapter we propose the concept of 'envirotype marker', developed by reconciling classical concepts of ecophysiology (Shelford's Law) with the characterization of environmental typology (i.e., the frequency of occurrence of qualitative classes of environmental factors over time and space). The approach was exemplified with two case studies covering the hypothetical use of GP in evaluation trials of maize hybrids in different environments. The combined use of enviromics and genomics made it possible to design a prediction platform (called E-GP) that reconciles selective phenotyping (reduction of training populations for GP) and prediction of future scenarios (i.e., unknown G×E). We observed that an increase in phenotypic information across environments does not always correspond to an increase in the accuracy of GP. Therefore, the representativeness of each hybrid under evaluation in the experimental network (the most representative genotypes, evaluated in "key" environments) is more important than the number of genotypes and environments considered for training GP.
Using E-GP together with genetic algorithms, we were able to select the most representative G×E combinations, which led directly to a drastic reduction in the size of the experimental network while increasing accuracy. Finally, we found that GBLUP without any envirotyping information is inefficient in predicting the phenotypic plasticity of maize hybrids under multiple environments and unknown G×E. With E-GP it was possible to screen the best hybrids, in terms of phenotypic plasticity, using reduced phenotypic information supplemented by the wide use of genomics and enviromics. Such results allow us to envision climate-smart approaches involving a drastic reduction of field-testing efforts as the conscious use of enviromics (and envirotyping), combined with genomics, increases.
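Illustrative note: the kernel comparison described in the abstract (a linear GBLUP-style kernel versus a Gaussian kernel) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the toy marker matrix, the function names, the bandwidth h, and the scaling of squared distances by their off-diagonal mean are all hypothetical choices made for illustration, and the Deep Kernel is not covered.

```python
import math

def center_columns(X):
    """Subtract each column's mean so every marker is centered at zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def linear_kernel(X):
    """GBLUP-style linear kernel: K = Z Z' / p, with Z the column-centered matrix."""
    Z = center_columns(X)
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel: K_ij = exp(-h * d_ij^2 / q), d_ij^2 = squared Euclidean distance.

    q is the mean of the off-diagonal squared distances. This scaling is an
    assumption for illustration only; it requires at least two distinct rows.
    """
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]
    off_diagonal = [d2[i][j] for i in range(n) for j in range(n) if i != j]
    q = sum(off_diagonal) / len(off_diagonal)
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]

# Hypothetical toy data: 4 hybrids x 5 markers coded 0/1/2.
markers = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 1, 1],
    [2, 0, 0, 2, 1],
    [1, 1, 1, 1, 0],
]

K_lin = linear_kernel(markers)           # similarity linear in marker scores
K_gk = gaussian_kernel(markers, h=1.0)   # similarity decays non-linearly with distance

if __name__ == "__main__":
    for name, K in (("linear", K_lin), ("gaussian", K_gk)):
        print(name)
        for row in K:
            print("  " + " ".join(f"{v:7.3f}" for v in row))
```

Either matrix could then serve as the covariance structure of a genotypic effect in a mixed model. The same construction applied to a matrix of environmental covariables, rather than markers, would give an environmental relatedness kernel, again as an assumption about how such a sketch might be extended.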
id USP_177236c2efaf286d807a519af0eaa516
oai_identifier_str oai:teses.usp.br:tde-11102021-134352
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
Envirômica, kernels não-lineares e otimização de populações de treinamento na predição genômica inteligente para o clima com foco na plasticidade fenotípica em milho
title Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
spellingShingle Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
Costa Neto, Germano Martins Ferreira
Adaptabilidade
Adaptability
Ciência de dados
Data science
Envirotyping
Genomic selection
Seleção genômica
Tipagem de ambientes
title_short Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
title_full Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
title_fullStr Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
title_full_unstemmed Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
title_sort Enviromics, nonlinear kernels and optimized training sets for a climate-smart genomic prediction of yield plasticity in maize
author Costa Neto, Germano Martins Ferreira
author_facet Costa Neto, Germano Martins Ferreira
author_role author
dc.contributor.none.fl_str_mv Fritsche Neto, Roberto
dc.contributor.author.fl_str_mv Costa Neto, Germano Martins Ferreira
dc.subject.por.fl_str_mv Adaptabilidade
Adaptability
Ciência de dados
Data science
Envirotyping
Genomic selection
Seleção genômica
Tipagem de ambientes
topic Adaptabilidade
Adaptability
Ciência de dados
Data science
Envirotyping
Genomic selection
Seleção genômica
Tipagem de ambientes
publishDate 2021
dc.date.none.fl_str_mv 2021-07-22
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/11/11137/tde-11102021-134352/
url https://www.teses.usp.br/teses/disponiveis/11/11137/tde-11102021-134352/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1809090629269979136