NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses
Main author: | Leal, Thiago Peixoto |
---|---|
Publication date: | 2022 |
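Other authors: | Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |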
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da FIOCRUZ (ARCA) |
Full text: | https://www.arca.fiocruz.br/handle/icict/55556 |
Summary: | Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Lerner Research Institute. Genomic Medicine. Cleveland Clinic. Cleveland, OH, United States |
id |
CRUZ_3d9db3d373c4e0dc23ca94350e0ef9b7 |
---|---|
oai_identifier_str |
oai:www.arca.fiocruz.br:icict/55556 |
network_acronym_str |
CRUZ |
network_name_str |
Repositório Institucional da FIOCRUZ (ARCA) |
repository_id_str |
2135 |
spelling |
Leal, Thiago Peixoto; Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro.
2022-11-09T17:37:28Z; 2022-11-09T17:37:28Z; 2022.
LEAL, Thiago Peixoto et al. NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses. Comput Struct Biotechnol J., v. 20, p. 1821–1828, 2022. doi: 10.1016/j.csbj.2022.04.009. ISSN 2001-0370. https://www.arca.fiocruz.br/handle/icict/55556. eng. Nature Publishing. NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses. info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article.
Affiliations: Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Lerner Research Institute. Genomic Medicine. Cleveland Clinic. Cleveland, OH, United States; Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Center for Research on Genomics & Global Health. National Human Genome Research Institute. National Institutes of Health. Bethesda, MD, United States; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Centre for Genetic Improvement of Livestock. Department of Animal Biosciences. University of Guelph. Guelph, Ontario, Canada; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil; Universidade de São Paulo. Instituto de Biociências. Centro de Estudos do Genoma Humano e Células-Tronco. São Paulo, SP, Brazil; Universidade Federal do Pará. Instituto de Ciências Biológicas. Programa de Pós-Graduação em Biologia Molecular. Laboratório de Genética Humana e Médica. Belém, PA, Brazil; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Beagle. Belo Horizonte, MG, Brazil/Mosaico Translational Genomics Initiative, Belo Horizonte, MG, Brazil; Embrapa Gado de Leite, Embrapa, Juiz de Fora, MG, Brazil; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil; Fundação Oswaldo Cruz. Instituto René Rachou. Belo Horizonte, MG, Brazil; Universidad Peruana Cayetano Heredia. Lima, Lima, Peru/Dept of International Health. Johns Hopkins School of Public Health Baltimore. Baltimore, MD, USA; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Mosaico Translational Genomics Initiative, Belo Horizonte, MG, Brazil/Dept of International Health. Johns Hopkins School of Public Health Baltimore. Baltimore, MD, USA; Universidade de São Paulo. Instituto de Biociências. Departamento de Genética e Biologia Evolutiva. São Paulo, SP, Brazil.
Abstract: Genetic and omics analyses frequently require independent observations, a condition that is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, reducing the available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships in a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset to allow its application to complex datasets. When compared with two other popular population genetics methodologies (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available, both as a standalone tool that can be easily incorporated as part of a pipeline, and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release a genealogies simulator used for different tests performed in this study.
Complex network theory; Population genetics; Genetic kinship; Genealogies simulator. info:eu-repo/semantics/openAccess. reponame:Repositório Institucional da FIOCRUZ (ARCA); instname:Fundação Oswaldo Cruz (FIOCRUZ); instacron:FIOCRUZ.
LICENSE: license.txt; text/plain; charset=utf-8; 2991; https://www.arca.fiocruz.br/bitstream/icict/55556/1/license.txt; 5a560609d32a3863062d77ff32785d58; MD5; 1. ORIGINAL: NAToRA, a relatedness-pruning method to minimize the loss of dataset.pdf; application/pdf; 1200255; https://www.arca.fiocruz.br/bitstream/icict/55556/2/NAToRA%2c%20a%20relatedness-pruning%20method%20to%20minimize%20the%20loss%20of%20dataset.pdf; 8a4dbf223833c07630f8d7824c8a18cd; MD5; 2.
icict/55556; 2022-11-09 14:37:29.286; oai:www.arca.fiocruz.br:icict/55556; Repositório Institucional; PUB; https://www.arca.fiocruz.br/oai/request; repositorio.arca@fiocruz.br; opendoar:2135; 2022-11-09T17:37:29; Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ); false |
dc.title.en_US.fl_str_mv |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
title |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
spellingShingle |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses; Leal, Thiago Peixoto; Complex network theory; Population genetics; Genetic kinship; Genealogies simulator |
title_short |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
title_full |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
title_fullStr |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
title_full_unstemmed |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
title_sort |
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
author |
Leal, Thiago Peixoto |
author_facet |
Leal, Thiago Peixoto; Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |
author_role |
author |
author2 |
Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |
author2_role |
author author author author author author author author author author author author author author author |
dc.contributor.author.fl_str_mv |
Leal, Thiago Peixoto; Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |
dc.subject.en.en_US.fl_str_mv |
Complex network theory; Population genetics; Genetic kinship; Genealogies simulator |
topic |
Complex network theory; Population genetics; Genetic kinship; Genealogies simulator |
description |
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Lerner Research Institute. Genomic Medicine. Cleveland Clinic. Cleveland, OH, United States |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-11-09T17:37:28Z |
dc.date.available.fl_str_mv |
2022-11-09T17:37:28Z |
dc.date.issued.fl_str_mv |
2022 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
LEAL, Thiago Peixoto et al. NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses. Comput Struct Biotechnol J., v. 20, p. 1821–1828, 2022. doi: 10.1016/j.csbj.2022.04.009 |
dc.identifier.uri.fl_str_mv |
https://www.arca.fiocruz.br/handle/icict/55556 |
dc.identifier.issn.en_US.fl_str_mv |
2001-0370 |
identifier_str_mv |
LEAL, Thiago Peixoto et al. NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses. Comput Struct Biotechnol J., v. 20, p. 1821–1828, 2022. doi: 10.1016/j.csbj.2022.04.009; 2001-0370 |
url |
https://www.arca.fiocruz.br/handle/icict/55556 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Nature Publishing |
publisher.none.fl_str_mv |
Nature Publishing |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da FIOCRUZ (ARCA) instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ |
instname_str |
Fundação Oswaldo Cruz (FIOCRUZ) |
instacron_str |
FIOCRUZ |
institution |
FIOCRUZ |
reponame_str |
Repositório Institucional da FIOCRUZ (ARCA) |
collection |
Repositório Institucional da FIOCRUZ (ARCA) |
bitstream.url.fl_str_mv |
https://www.arca.fiocruz.br/bitstream/icict/55556/1/license.txt https://www.arca.fiocruz.br/bitstream/icict/55556/2/NAToRA%2c%20a%20relatedness-pruning%20method%20to%20minimize%20the%20loss%20of%20dataset.pdf |
bitstream.checksum.fl_str_mv |
5a560609d32a3863062d77ff32785d58 8a4dbf223833c07630f8d7824c8a18cd |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
repositorio.arca@fiocruz.br |
_version_ |
1798325000492548096 |
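
The abstract in this record describes a network-based pruning approach that uses node degree centrality to find highly connected individuals. The sketch below illustrates that general idea only: it is a simple greedy heuristic, not NAToRA's published algorithm or code. The function name `prune_related`, the input layout (pairs of string IDs with a kinship value), and the threshold used in the example are assumptions made for illustration.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy degree-based pruning (illustrative sketch, not NAToRA itself).

    pairs: iterable of (id_a, id_b, kinship) tuples; IDs are assumed to be strings.
    threshold: pairs with kinship >= threshold count as related (assumed convention).
    Returns (kept, removed) as two sets of IDs.
    """
    neighbors = defaultdict(set)  # relatedness network as adjacency sets
    individuals = set()
    for a, b, kinship in pairs:
        individuals.update((a, b))
        if a != b and kinship >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = set()
    while True:
        # Degree of every individual that still has at least one relative.
        degrees = {node: len(nbrs) for node, nbrs in neighbors.items() if nbrs}
        if not degrees:
            break  # no related pairs remain
        # Remove the most connected individual; ties go to the smallest ID.
        target = max(sorted(degrees), key=degrees.get)
        for nbr in neighbors.pop(target):
            neighbors[nbr].discard(target)
        removed.add(target)

    return individuals - removed, removed


if __name__ == "__main__":
    # Hypothetical data: A is related to B, C and D; E and F are unrelated.
    example = [
        ("A", "B", 0.25),
        ("A", "C", 0.25),
        ("A", "D", 0.25),
        ("E", "F", 0.01),
    ]
    kept, removed = prune_related(example, threshold=0.0884)  # illustrative cutoff
    print(sorted(kept), sorted(removed))
```

In this toy input, removing the single hub individual breaks every related pair, whereas dropping one member of each pair in turn could remove more individuals. A greedy choice like this does not guarantee the minimal removal set in general, which is consistent with the abstract's description of heuristics that approximate the minimal reduction.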