Accuracy of genotype imputation to whole genome sequencing level using different populations of Nile tilapia
Main author: | Garcia, Baltasar F. [UNESP] |
---|---|
Publication date: | 2022 |
Other authors: | Yoshida, Grazyella M.; Carvalheiro, Roberto [UNESP]; Yáñez, José M. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1016/j.aquaculture.2022.737947 http://hdl.handle.net/11449/223301 |
Abstract: | A cost-effective strategy for obtaining ultra-dense genomic information is to sequence part of the population and impute the remaining animals from lower-density genotypes to sequence level. The aims of this study were to evaluate the feasibility of genotype imputation from medium density to sequence level in Nile tilapia and to investigate how the size and origin of the reference population affect imputation accuracy. Genomic DNA was extracted from fin-clip samples of 326 animals from three populations (PA, PB and PC). After sequencing, alignment, variant calling and genotype quality control, approximately 4.6 million single-nucleotide polymorphisms (SNPs) common to all populations were retained for the imputation analyses. Four scenarios were evaluated to assess imputation accuracy in each population, covering two reference sizes (10% or 90% of the animals of each reference population) and two reference origins (two populations only, or all three populations used as reference). Animals in the validation set had part of their genotypes masked so that only 49,216 SNPs remained available, and imputation accuracy was assessed as the correlation between imputed and observed genotypes (R²). Imputation was carried out with the FImpute3 software. At the individual level, R² was intermediate, ranging from 0.37 ± 0.04 to 0.56 ± 0.07 for PA, 0.43 ± 0.05 to 0.58 ± 0.08 for PB and 0.43 ± 0.05 to 0.58 ± 0.07 for PC. R² increased when 90% rather than only 10% of the animals from the same population were used as reference (from 0.37 ± 0.04 to 0.54 ± 0.07 for PA, 0.43 ± 0.05 to 0.57 ± 0.07 for PB and 0.43 ± 0.05 to 0.58 ± 0.07 for PC). At the SNP level, using all three populations as reference yielded the best results in terms of the number of SNPs imputed with accuracy greater than 0.8. On average, 676,233 ± 142,291, 666,559 ± 52,648 and 592,187 ± 89,663 SNPs were imputed with accuracy >0.8 for PA, PB and PC, respectively. Considering only these highly accurate imputed SNPs, the average imputation accuracy of samples was 0.95 ± 0.06 for PA and 0.92 ± 0.07 for PB and PC in the scenarios that included more animals as reference (90% of the same population as reference, two populations and three populations). R² did not differ significantly between the scenarios that used 90% of the animals from the same population and those that used animals from all three populations as reference, showing that adding information from other populations to enlarge the reference population had only a minor effect on imputation accuracy. In conclusion, it was feasible to impute from 50 K to approximately 700 K SNPs with high accuracy using tilapia sequence data. We also expect that using more animals from these populations, or animals from ascending lines, as reference could help the imputation process yield millions of imputed SNPs with high accuracy. |
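
The accuracy measure described in the abstract (masking genotypes in validation animals, imputing them, then correlating imputed with observed genotypes per animal and per SNP) can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call FImpute3. It assumes that R² is the squared Pearson correlation of genotypes coded as 0/1/2 allele counts, and the function names and the toy matrices are hypothetical.

```python
import numpy as np


def squared_correlation(observed, imputed):
    """Squared Pearson correlation; NaN if either vector has no variation."""
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if observed.std() == 0 or imputed.std() == 0:
        return float("nan")
    r = np.corrcoef(observed, imputed)[0, 1]
    return float(r ** 2)


def imputation_accuracy(observed, imputed, masked):
    """Return (per-animal R2, per-SNP R2) computed on the masked SNPs only.

    observed, imputed: (animals x SNPs) arrays of 0/1/2 genotype codes.
    masked: boolean vector, True for SNPs hidden before imputation.
    """
    obs = np.asarray(observed)[:, masked]
    imp = np.asarray(imputed)[:, masked]
    per_animal = np.array([squared_correlation(o, i) for o, i in zip(obs, imp)])
    per_snp = np.array([squared_correlation(o, i) for o, i in zip(obs.T, imp.T)])
    return per_animal, per_snp


# Toy data (assumption): 4 validation animals, 5 SNPs.
observed = np.array([
    [0, 1, 2, 1, 0],
    [2, 1, 0, 1, 2],
    [1, 2, 1, 0, 1],
    [0, 0, 2, 2, 1],
])
imputed = observed.copy()
imputed[0, 1] = 0  # one wrong imputed call for the first animal

# SNPs 0 and 4 play the role of the retained medium-density panel.
masked = np.array([False, True, True, True, False])

per_animal, per_snp = imputation_accuracy(observed, imputed, masked)
n_accurate = int(np.sum(per_snp > 0.8))  # SNPs above the 0.8 threshold

print("per-animal R2:", np.round(per_animal, 3))
print("per-SNP R2:", np.round(per_snp, 3))
print("SNPs with R2 > 0.8:", n_accurate)
```

In this toy case a single wrong call lowers the R² of the affected animal and of the affected SNP, while the other masked SNPs stay above the 0.8 threshold that the abstract uses to count highly accurate SNPs.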
id |
UNSP_5548f9782ab64be1d64565c45cb2705b |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/223301 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Accuracy of genotype imputation to whole genome sequencing level using different populations of Nile tilapia; Genotype imputation; Nile tilapia; Whole-genome sequencing; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); School of Agricultural and Veterinary Sciences UNESP - São Paulo State University; Facultad de Ciencias Veterinarias y Pecuarias Universidad de Chile; National Council for Scientific and Technological Development (CNPq); School of Agricultural and Veterinary Sciences UNESP - São Paulo State University; Universidade Estadual Paulista (UNESP); Universidad de Chile; National Council for Scientific and Technological Development (CNPq); Garcia, Baltasar F. [UNESP]; Yoshida, Grazyella M.; Carvalheiro, Roberto [UNESP]; Yáñez, José M.; 2022-04-28T19:49:55Z; 2022-04-28T19:49:55Z; 2022-03-30; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; http://dx.doi.org/10.1016/j.aquaculture.2022.737947; Aquaculture, v. 551.; 0044-8486; http://hdl.handle.net/11449/223301; 10.1016/j.aquaculture.2022.737947; 2-s2.0-85123244954; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Aquaculture; info:eu-repo/semantics/openAccess; 2022-04-28T19:49:56Z; oai:repositorio.unesp.br:11449/223301; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T22:14:41.318625; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Accuracy of genotype imputation to whole genome sequencing level using different populations of Nile tilapia |
author |
Garcia, Baltasar F. [UNESP] |
author_facet |
Garcia, Baltasar F. [UNESP] Yoshida, Grazyella M. Carvalheiro, Roberto [UNESP] Yáñez, José M. |
author_role |
author |
author2 |
Yoshida, Grazyella M. Carvalheiro, Roberto [UNESP] Yáñez, José M. |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP) Universidad de Chile National Council for Scientific and Technological Development (CNPq) |
dc.contributor.author.fl_str_mv |
Garcia, Baltasar F. [UNESP] Yoshida, Grazyella M. Carvalheiro, Roberto [UNESP] Yáñez, José M. |
dc.subject.por.fl_str_mv |
Genotype imputation Nile tilapia Whole-genome sequencing |
topic |
Genotype imputation Nile tilapia Whole-genome sequencing |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-04-28T19:49:55Z 2022-04-28T19:49:55Z 2022-03-30 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1016/j.aquaculture.2022.737947 Aquaculture, v. 551. 0044-8486 http://hdl.handle.net/11449/223301 10.1016/j.aquaculture.2022.737947 2-s2.0-85123244954 |
url |
http://dx.doi.org/10.1016/j.aquaculture.2022.737947 http://hdl.handle.net/11449/223301 |
identifier_str_mv |
Aquaculture, v. 551. 0044-8486 10.1016/j.aquaculture.2022.737947 2-s2.0-85123244954 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Aquaculture |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808129408136904704 |