Effect of minor allele frequency and density of single nucleotide polymorphism marker arrays on imputation performance and prediction ability using the single-step genomic Best Linear Unbiased Prediction in a simulated beef cattle population

Bibliographic details
Main author: Rodriguez, Juan Diego [UNESP]
Publication date: 2023
Other authors: Peripolli, Elisa [UNESP], Londono-Gil, Marisol [UNESP], Espigolan, Rafael, Lobo, Raysildo Barbosa, Lopez-Correa, Rodrigo, Aguilar, Ignacio, Baldi, Fernando [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1071/AN21581
http://hdl.handle.net/11449/245610
Abstract: Context. In beef cattle populations, there is little evidence regarding the minimum number of genetic markers needed to obtain reliable genomic predictions and imputed genotypes. Aims. This study aimed to evaluate the impact of single nucleotide polymorphism (SNP) marker density and minor allele frequency (MAF) on genomic predictions and imputation performance for high and low heritability traits using the single-step genomic Best Linear Unbiased Prediction (ssGBLUP) methodology in a simulated beef cattle population. Methods. The simulated genomic and phenotypic data were obtained with QMsim software. A total of 735 293 SNP markers and 7000 quantitative trait loci (QTL) were randomly simulated. The mutation rate (10^-5), QTL effects distribution (gamma distribution with shape parameter = 0.4) and minor allele frequency (MAF >= 0.02) of markers were used for quality control. A total of 335k SNPs (high density, HD) and 1000 QTL were ultimately considered. Densities of 33 500 (35k), 16 750 (16k), 4186 (4k) and 2093 (2k) SNPs were customised using windows of 10, 20, 80 and 160 SNPs within each chromosome, respectively. Three marker selection criteria were used within windows: (1) informative markers with MAF values close to 0.5 (HI); (2) less informative markers with the lowest MAF values (LI); (3) evenly distributed markers (ED). We evaluated the predictions from the high-density array and from 12 customised SNP array scenarios, as well as the imputation performance of the latter. The genomic predictions and imputed genotypes were obtained with Blupf90 and FImpute software, respectively, and statistical parameters were applied to evaluate the accuracy of the imputed genotypes. Pearson's correlation, the coefficient of regression, and the difference between genomic predictions and true breeding values were used to evaluate the prediction ability (PA), inflation (b), and bias (d), respectively. Key results. Densities above 16k SNPs using the HI and ED criteria displayed lower b, higher PA and higher imputation accuracy. Consequently, similar values of PA, b and d were observed with the use of imputed genotypes. The LI criterion with densities higher than 35k SNPs showed higher PA and similar predictions using imputed genotypes; however, lower b and lower quality of imputed genotypes were observed. Conclusion. The results showed that at least 5% of the HI or ED SNPs available in the HD array are necessary to obtain reliable genomic predictions and imputed genotypes. Implications. The development of low-density customised arrays based on MAF criteria and even distribution of SNPs might be a cost-effective and feasible approach to implementing genomic selection in beef cattle.
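The window-based marker selection and the three evaluation statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used QMsim, Blupf90 and FImpute), and it rests on assumptions that the abstract does not state: one marker is kept per window, the ED criterion is approximated by taking the middle marker of each window, b is taken as the slope of the regression of true breeding values on genomic predictions, and d is taken as the mean prediction minus the mean true breeding value. The function names `select_markers` and `evaluate_predictions` are hypothetical.

```python
from statistics import mean


def select_markers(maf, window, criterion):
    """Return one marker index per consecutive window on a single chromosome.

    maf       -- list of minor allele frequencies (each <= 0.5), in map order
    window    -- number of consecutive SNPs per window (e.g. 10, 20, 80, 160)
    criterion -- "HI" (MAF closest to 0.5), "LI" (lowest MAF) or
                 "ED" (middle marker of the window; an assumption of this sketch)
    """
    chosen = []
    for start in range(0, len(maf), window):
        idx = range(start, min(start + window, len(maf)))
        if criterion == "HI":
            # MAF never exceeds 0.5, so the largest MAF is the one closest to 0.5.
            pick = max(idx, key=lambda i: maf[i])
        elif criterion == "LI":
            pick = min(idx, key=lambda i: maf[i])
        elif criterion == "ED":
            pick = idx[len(idx) // 2]
        else:
            raise ValueError(f"unknown criterion: {criterion}")
        chosen.append(pick)
    return chosen


def evaluate_predictions(gebv, tbv):
    """Return (PA, b, d) for genomic predictions against true breeding values.

    PA -- Pearson correlation between gebv and tbv
    b  -- slope of the regression of tbv on gebv (assumed direction)
    d  -- mean(gebv) - mean(tbv) (assumed definition of the difference)
    """
    mg, mt = mean(gebv), mean(tbv)
    sgg = sum((g - mg) ** 2 for g in gebv)
    stt = sum((t - mt) ** 2 for t in tbv)
    sgt = sum((g - mg) * (t - mt) for g, t in zip(gebv, tbv))
    pa = sgt / (sgg * stt) ** 0.5
    b = sgt / sgg
    d = mg - mt
    return pa, b, d


# Toy data: six markers on one chromosome, windows of three markers.
toy_maf = [0.05, 0.45, 0.30, 0.10, 0.02, 0.25]
print(select_markers(toy_maf, 3, "HI"))  # [1, 5]
print(select_markers(toy_maf, 3, "LI"))  # [0, 4]
print(select_markers(toy_maf, 3, "ED"))  # [1, 4]
```

With windows of 10 SNPs and one marker kept per window, a 335k panel would shrink to roughly 33 500 markers, which matches the 35k scenario in the abstract; this correspondence is what motivates the one-marker-per-window assumption.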
id UNSP_e7f0021a9bc19fff2209d0ffadef6017
oai_identifier_str oai:repositorio.unesp.br:11449/245610
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Coordenação de Aperfeiçoamento de Pessoa de Nível Superior (CAPES)
Programa Escala de Estudiantes de Pos-Graduacao of Asociacion de Universidades GRUPO MONTEVIDEO (PEEPg/AUGM-2019)
Univ Estadual Paulista Unesp, Fac Ciencias Agr & Vet, Dept Zootecnia, BR-14884900 Jaboticabal, Brazil
Univ Sao Paulo, Fac Zootecnia & Engn Alimentos, Dept Med Vet, BR-13535900 Pirassununga, Brazil
Assoc Nacl Criadores & Pesquisadores, Ribeirao Preto, Brazil
Univ Republica, Fac Vet, Dept Genet & Mejoramiento Anim, Montevideo, Uruguay
Inst Nacl Invest Agr, Montevideo, Uruguay
CAPES: 32/2017
2024-06-07T18:44:00Z
PUB
http://repositorio.unesp.br/oai/request
opendoar:2946
2024-08-05T21:35:59.099532
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
Universidade de São Paulo (USP)
Assoc Nacl Criadores & Pesquisadores
Univ Republica
Inst Nacl Invest Agr
dc.contributor.author.fl_str_mv Rodriguez, Juan Diego [UNESP]
Peripolli, Elisa [UNESP]
Londono-Gil, Marisol [UNESP]
Espigolan, Rafael
Lobo, Raysildo Barbosa
Lopez-Correa, Rodrigo
Aguilar, Ignacio
Baldi, Fernando [UNESP]
dc.subject.por.fl_str_mv bias
bovine
customised SNP arrays
genomic selection
imputation accuracy
inflation
MAF
simulation
publishDate 2023
dc.date.none.fl_str_mv 2023-07-29T12:00:00Z
2023-07-29T12:00:00Z
2023-04-03
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1071/AN21581
Animal Production Science. Clayton: Csiro Publishing, 9 p., 2023.
1836-0939
http://hdl.handle.net/11449/245610
10.1071/AN21581
WOS:000962378300001
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Animal Production Science
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 9
dc.publisher.none.fl_str_mv Csiro Publishing
publisher.none.fl_str_mv Csiro Publishing
dc.source.none.fl_str_mv Web of Science
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1808129340178694144