Effect of minor allele frequency and density of single nucleotide polymorphism marker arrays on imputation performance and prediction ability using the single-step genomic Best Linear Unbiased Prediction in a simulated beef cattle population
Main author: | Rodriguez, Juan Diego [UNESP] |
---|---|
Publication date: | 2023 |
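Other authors: | Peripolli, Elisa [UNESP]; Londono-Gil, Marisol [UNESP]; Espigolan, Rafael; Lobo, Raysildo Barbosa; Lopez-Correa, Rodrigo; Aguilar, Ignacio; Baldi, Fernando [UNESP] |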
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1071/AN21581 http://hdl.handle.net/11449/245610 |
Abstract: | Context. In beef cattle populations, there is little evidence regarding the minimum number of genetic markers needed to obtain reliable genomic predictions and imputed genotypes. Aims. This study aimed to evaluate the impact of single nucleotide polymorphism (SNP) marker density and minor allele frequency (MAF) on genomic predictions and imputation performance for high- and low-heritability traits using the single-step genomic Best Linear Unbiased Prediction methodology (ssGBLUP) in a simulated beef cattle population. Methods. The simulated genomic and phenotypic data were obtained with the QMsim software. A total of 735 293 SNP markers and 7000 quantitative trait loci (QTL) were randomly simulated. A mutation rate of 10⁻⁵ and a gamma distribution of QTL effects (shape parameter = 0.4) were assumed, and markers with MAF ≥ 0.02 were retained in quality control. After quality control, 335k SNPs (high density, HD) and 1000 QTL were considered. Customised arrays with densities of 33 500 (35k), 16 750 (16k), 4186 (4k) and 2093 (2k) SNPs were built by selecting markers within windows of 10, 20, 80 and 160 SNPs per chromosome, respectively. Three marker selection criteria were applied within windows: (1) informative markers with MAF values close to 0.5 (HI); (2) less informative markers with the lowest MAF values (LI); (3) evenly distributed markers (ED). We evaluated the genomic predictions obtained with the high-density array and with 12 customised SNP array scenarios (four densities × three criteria), as well as the imputation performance of the customised arrays. Genomic predictions were obtained with the Blupf90 software and imputed genotypes with FImpute, and statistical parameters were used to evaluate the accuracy of the imputed genotypes. The Pearson correlation, the regression coefficient and the difference between genomic predictions and true breeding values were used to evaluate prediction ability (PA), inflation (b) and bias (d), respectively. Key results.
Densities above 16k SNPs selected with the HI and ED criteria displayed lower b, higher PA and higher imputation accuracy. Consequently, similar values of PA, b and d were observed when imputed genotypes were used. With the LI criterion, densities above 35k SNPs showed higher PA and similar predictions using imputed genotypes; however, lower b and lower imputation quality were observed. Conclusion. The results showed that at least 5% of the SNPs available in the HD array, selected with the HI or ED criterion, are necessary to obtain reliable genomic predictions and imputed genotypes. Implications. Developing low-density customised arrays based on MAF and even SNP distribution criteria might be a cost-effective and feasible approach to implementing genomic selection in beef cattle. |
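The window-based marker selection (HI, LI, ED) and the evaluation statistics (PA, b, d) summarised in the abstract can be illustrated with a short Python sketch. It is not the authors' pipeline, which used QMsim, Blupf90 and FImpute. It assumes that one SNP is kept per non-overlapping window of consecutive SNPs within each chromosome (consistent with 335k / 10 ≈ 33 500), that ED keeps the middle SNP of each window, that incomplete windows at chromosome ends are skipped, that b is the slope of the regression of true breeding values (TBV) on genomic predictions (GEBV), and that d is the mean of GEBV minus TBV. The names `Snp`, `select_markers` and `prediction_stats` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Snp:
    name: str
    chrom: int
    pos: int
    maf: float  # minor allele frequency, between 0 and 0.5


def select_markers(snps: Sequence[Snp], window: int, criterion: str) -> List[Snp]:
    """Keep one SNP per non-overlapping window of `window` consecutive SNPs per chromosome.

    HI: MAF closest to 0.5; LI: lowest MAF; ED: middle SNP of the window (assumed).
    Incomplete windows at chromosome ends are skipped (assumed).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    by_chrom: Dict[int, List[Snp]] = {}
    for snp in snps:
        by_chrom.setdefault(snp.chrom, []).append(snp)
    chosen: List[Snp] = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda s: s.pos)
        # Step through complete windows only; a trailing partial window is dropped.
        for start in range(0, len(ordered) - window + 1, window):
            win = ordered[start:start + window]
            if criterion == "HI":
                chosen.append(min(win, key=lambda s: abs(0.5 - s.maf)))
            elif criterion == "LI":
                chosen.append(min(win, key=lambda s: s.maf))
            elif criterion == "ED":
                chosen.append(win[window // 2])
            else:
                raise ValueError(f"unknown criterion: {criterion}")
    return chosen


def prediction_stats(gebv: Sequence[float], tbv: Sequence[float]) -> Tuple[float, float, float]:
    """Return (PA, b, d) for genomic predictions (GEBV) against true breeding values (TBV).

    PA: Pearson correlation between GEBV and TBV.
    b:  slope of the regression of TBV on GEBV (assumed direction; conventionally 1 = no inflation).
    d:  mean of GEBV - TBV (assumed definition of bias).
    """
    n = len(gebv)
    if n != len(tbv) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_g = sum(gebv) / n
    mean_t = sum(tbv) / n
    # Sums of squares and cross-products; the 1/n factors cancel in PA and b.
    s_gt = sum((g - mean_g) * (t - mean_t) for g, t in zip(gebv, tbv))
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    pa = s_gt / sqrt(s_gg * s_tt)
    b = s_gt / s_gg
    d = mean_g - mean_t
    return pa, b, d


if __name__ == "__main__":
    mafs = [0.10, 0.45, 0.30, 0.02, 0.20, 0.49]
    demo = [Snp(f"s{i}", 1, i, maf) for i, maf in enumerate(mafs)]
    for crit in ("HI", "LI", "ED"):
        print(crit, [s.name for s in select_markers(demo, 3, crit)])
    # HI ['s1', 's5'] / LI ['s0', 's3'] / ED ['s1', 's4']
    print(prediction_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
    # (1.0, 2.0, -2.5)
```

In this sketch, HI and LI differ only in the key used to rank SNPs inside a window, so other MAF-based rules can be swapped in by changing that key; how the original study placed ED markers and handled chromosome ends is not stated in this record.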
id | UNSP_e7f0021a9bc19fff2209d0ffadef6017 |
---|---|
oai_identifier_str | oai:repositorio.unesp.br:11449/245610 |
network_acronym_str | UNSP |
network_name_str | Repositório Institucional da UNESP |
repository_id_str | 2946 |
spelling | Keywords: bias; bovine; customised SNP arrays; genomic selection; imputation accuracy; inflation; MAF; simulation. Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), CAPES: 32/2017; Programa Escala de Estudiantes de Pos-Graduacao of Asociacion de Universidades GRUPO MONTEVIDEO (PEEPg/AUGM-2019). Affiliations: Univ Estadual Paulista Unesp, Fac Ciencias Agr & Vet, Dept Zootecnia, BR-14884900 Jaboticabal, Brazil; Univ Sao Paulo, Fac Zootecnia & Engn Alimentos, Dept Med Vet, BR-13535900 Pirassununga, Brazil; Assoc Nacl Criadores & Pesquisadores, Ribeirao Preto, Brazil; Univ Republica, Fac Vet, Dept Genet & Mejoramiento Anim, Montevideo, Uruguay; Inst Nacl Invest Agr, Montevideo, Uruguay. Repository: Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-06-07T18:44:00Z; 2024-08-05T21:35:59.099532. |
author | Rodriguez, Juan Diego [UNESP] |
author_facet | Rodriguez, Juan Diego [UNESP]; Peripolli, Elisa [UNESP]; Londono-Gil, Marisol [UNESP]; Espigolan, Rafael; Lobo, Raysildo Barbosa; Lopez-Correa, Rodrigo; Aguilar, Ignacio; Baldi, Fernando [UNESP] |
author_role | author |
author2 | Peripolli, Elisa [UNESP]; Londono-Gil, Marisol [UNESP]; Espigolan, Rafael; Lobo, Raysildo Barbosa; Lopez-Correa, Rodrigo; Aguilar, Ignacio; Baldi, Fernando [UNESP] |
author2_role | author; author; author; author; author; author; author |
dc.contributor.none.fl_str_mv | Universidade Estadual Paulista (UNESP); Universidade de São Paulo (USP); Assoc Nacl Criadores & Pesquisadores; Univ Republica; Inst Nacl Invest Agr |
dc.subject.por.fl_str_mv | bias; bovine; customised SNP arrays; genomic selection; imputation accuracy; inflation; MAF; simulation |
publishDate | 2023 |
dc.date.none.fl_str_mv | 2023-07-29T12:00:00Z; 2023-07-29T12:00:00Z; 2023-04-03 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://dx.doi.org/10.1071/AN21581; Animal Production Science. Clayton: Csiro Publishing, 9 p., 2023; 1836-0939; http://hdl.handle.net/11449/245610; 10.1071/AN21581; WOS:000962378300001 |
url | http://dx.doi.org/10.1071/AN21581; http://hdl.handle.net/11449/245610 |
identifier_str_mv | Animal Production Science. Clayton: Csiro Publishing, 9 p., 2023; 1836-0939; 10.1071/AN21581; WOS:000962378300001 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | Animal Production Science |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | 9 |
dc.publisher.none.fl_str_mv | Csiro Publishing |
publisher.none.fl_str_mv | Csiro Publishing |
dc.source.none.fl_str_mv | Web of Science; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str | Universidade Estadual Paulista (UNESP) |
instacron_str | UNESP |
institution | UNESP |
reponame_str | Repositório Institucional da UNESP |
collection | Repositório Institucional da UNESP |
repository.name.fl_str_mv | Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv | |
_version_ | 1808129340178694144 |