Accuracy of genotype imputation to whole genome sequencing level using different populations of Nile tilapia

Bibliographic details
Main author: Garcia, Baltasar F. [UNESP]
Publication date: 2022
Other authors: Yoshida, Grazyella M., Carvalheiro, Roberto [UNESP], Yáñez, José M.
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1016/j.aquaculture.2022.737947
http://hdl.handle.net/11449/223301
Abstract: A cost-effective strategy to obtain ultra-dense genomic information is to sequence part of the population and impute the remaining animals from lower-density genotypes to sequence level. The aims of this study were to evaluate the feasibility of genotype imputation from medium density to sequence level in Nile tilapia and to investigate the impact of the size and origin of the reference population on imputation accuracy. Genomic DNA was extracted from fin-clip samples of 326 animals from three different populations (PA, PB and PC). After sequencing, alignment, variant calling and quality control of genotypes, approximately 4.6 million single-nucleotide polymorphisms (SNPs) common to all populations were retained and used for the imputation analyses. Four scenarios were evaluated to assess imputation accuracy in each population, including two reference sizes (10 or 90% of animals of each reference population) and two reference origins (two different populations only or all three populations used as reference). The animals in the validation set had part of their genotypes masked, keeping only 49,216 SNPs available, and imputation accuracy was assessed using the correlation between the imputed and observed genotypes (R2). Imputation was carried out using the FImpute3 software. At the individual level, R2 showed intermediate values ranging from 0.37 ± 0.04 to 0.56 ± 0.07 for PA, 0.43 ± 0.05 to 0.58 ± 0.08 for PB and 0.43 ± 0.05 to 0.58 ± 0.07 for PC. R2 increased when 90% of animals from the same population were used as reference instead of only 10% (0.37 ± 0.04 to 0.54 ± 0.07 for PA, 0.43 ± 0.05 to 0.57 ± 0.07 for PB and 0.43 ± 0.05 to 0.58 ± 0.07 for PC). At the SNP level, using all three populations as reference yielded the best results in terms of the number of SNPs imputed with accuracy greater than 0.8.
On average, 676,233 ± 142,291, 666,559 ± 52,648 and 592,187 ± 89,663 SNPs were imputed with accuracy >0.8 for PA, PB and PC, respectively. Considering only these highly accurate imputed SNPs, the average imputation accuracy of samples was 0.95 ± 0.06 for PA and 0.92 ± 0.07 for PB and PC, for scenarios that included more animals as reference (90% of the same population as reference, two and three populations). There were no significant differences in R2 between the scenario that used 90% of animals from the same population and the scenario that used animals from all three populations as reference, showing that using information from other populations to enlarge the reference population had a minor effect on imputation accuracy. In conclusion, it was feasible to impute from 50 K to approximately 700 K SNPs with high accuracy using tilapia sequence data. We also expect that using more animals from these populations, or animals from ascending lines, as reference could help the imputation process to obtain millions of imputed SNPs with high accuracy.
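The accuracy measure described in the abstract (correlation between imputed and observed genotypes, summarised per animal and per SNP, with a count of SNPs above 0.8) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: it assumes genotypes are coded as 0/1/2 allele counts in an animals × SNPs array, it uses the plain Pearson correlation (whether the reported R2 is the correlation or its square is not stated in the record), and the function names and the simulated data are hypothetical.

```python
import numpy as np


def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of a and b.

    Rows with zero variance in either matrix give NaN.
    """
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


def imputation_accuracy(observed, imputed, threshold=0.8):
    """Return per-animal accuracy, per-SNP accuracy and the number of
    SNPs whose accuracy exceeds `threshold`.

    Both inputs are (animals x SNPs) arrays of 0/1/2 genotype codes
    (an assumed coding, not taken from the source).
    """
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    per_animal = rowwise_pearson(observed, imputed)
    per_snp = rowwise_pearson(observed.T, imputed.T)
    # Treat undefined (NaN) correlations as failing the threshold.
    n_accurate = int(np.sum(np.nan_to_num(per_snp, nan=-1.0) > threshold))
    return per_animal, per_snp, n_accurate


if __name__ == "__main__":
    # Simulated data only: 20 animals, 500 SNPs, about 10% of genotypes redrawn.
    rng = np.random.default_rng(0)
    observed = rng.integers(0, 3, size=(20, 500))
    imputed = observed.copy()
    errors = rng.random(observed.shape) < 0.1
    imputed[errors] = rng.integers(0, 3, size=int(errors.sum()))

    per_animal, per_snp, n_accurate = imputation_accuracy(observed, imputed)
    print("mean per-animal accuracy:", np.nanmean(per_animal))
    print("SNPs with accuracy > 0.8:", n_accurate)
```

In a masking experiment like the one described, `observed` would hold the sequence-level genotypes of the validation animals before masking and `imputed` the genotypes returned by the imputation software for the same animals and SNPs.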
id UNSP_5548f9782ab64be1d64565c45cb2705b
oai_identifier_str oai:repositorio.unesp.br:11449/223301
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
School of Agricultural and Veterinary Sciences UNESP - São Paulo State University
Facultad de Ciencias Veterinarias y Pecuarias Universidad de Chile
dc.title.none.fl_str_mv Accuracy of genotype imputation to whole genome sequencing level using different populations of Nile tilapia
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
Universidad de Chile
National Council for Scientific and Technological Development (CNPq)
dc.contributor.author.fl_str_mv Garcia, Baltasar F. [UNESP]
Yoshida, Grazyella M.
Carvalheiro, Roberto [UNESP]
Yáñez, José M.
dc.subject.por.fl_str_mv Genotype imputation
Nile tilapia
Whole-genome sequencing
publishDate 2022
dc.date.none.fl_str_mv 2022-04-28T19:49:55Z
2022-04-28T19:49:55Z
2022-03-30
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1016/j.aquaculture.2022.737947
Aquaculture, v. 551.
0044-8486
http://hdl.handle.net/11449/223301
10.1016/j.aquaculture.2022.737947
2-s2.0-85123244954
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Aquaculture
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)