NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Bibliographic details
Main author: Leal, Thiago Peixoto
Publication date: 2022
Other authors: Furlan, Vinicius C; Gouveia, Mateus Henrique; Duarte, Julia Maria Saraiva; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Costa, Maria Fernanda Furtado Lima; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro
Document type: Article
Language: English
Source title: Repositório Institucional da FIOCRUZ (ARCA)
Full text: https://www.arca.fiocruz.br/handle/icict/55556
Description: Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Lerner Research Institute. Genomic Medicine. Cleveland Clinic. Cleveland, OH, United States
id CRUZ_3d9db3d373c4e0dc23ca94350e0ef9b7
oai_identifier_str oai:www.arca.fiocruz.br:icict/55556
network_acronym_str CRUZ
network_name_str Repositório Institucional da FIOCRUZ (ARCA)
repository_id_str 2135
spelling Author affiliations, in the order recorded:
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Lerner Research Institute. Genomic Medicine. Cleveland Clinic. Cleveland, OH, United States
Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Center for Research on Genomics & Global Health. National Human Genome Research Institute. National Institutes of Health. Bethesda, MD, United States
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Centre for Genetic Improvement of Livestock. Department of Animal Biosciences. University of Guelph. Guelph, Ontario, Canadá
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil
Universidade de São Paulo. Instituto de Biociências. Centro de Estudos do Genoma Humano e Células-Tronco. São Paulo, SP, Brazil
Universidade Federal do Pará. Instituto de Ciências Biológicas. Programa de Pós-Graduação em Biologia Molecular. Laboratório de Genética Humana e Médica. Belém, PA, Brazil
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Beagle. Belo Horizonte, MG, Brazil/Mosaico Translational Genomics Initiative, Belo Horizonte, MG, Brazil
Embrapa Gado de Leite, Embrapa, Juiz de Fora, MG, Brazil
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil
Fundação Oswaldo Cruz. Instituto René Rachou. Belo Horizonte, MG, Brazil
Universidad Peruana Cayetano Heredia. Lima, Lima, Perú/Dept of International Health. Johns Hopkins School of Public Health Baltimore. Baltimore, MD, USA
Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Genética, Ecologia e Evolução. Belo Horizonte, MG, Brazil/Mosaico Translational Genomics Initiative, Belo Horizonte, MG, Brazil/Dept of International Health. Johns Hopkins School of Public Health Baltimore. Baltimore, MD, USA
Universidade de São Paulo. Instituto de Biociências. Departamento de Genética e Biologia Evolutiva. São Paulo, SP, Brazil
Abstract: Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, a reduction of available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships in a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to complex datasets. When compared with two other popular population genetics methodologies (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available, both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for different tests performed in this study.
OAI request URL: https://www.arca.fiocruz.br/oai/request
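The abstract describes pruning a relatedness network by targeting highly connected individuals (node degree centrality). The sketch below illustrates that general idea with a plain greedy heuristic. It is not NAToRA's actual implementation, whose heuristics are not detailed in this record. The function name `prune_related`, the input layout of `(id_a, id_b, kinship)` tuples, and the cutoff value `0.1` are all assumptions made for illustration.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy degree-based pruning (illustrative, hypothetical helper).

    pairs: iterable of (id_a, id_b, kinship) tuples.
    threshold: pairs with kinship >= threshold are treated as related.
    Returns the set of individuals to remove so that no related pair remains.
    """
    # Build an undirected graph whose edges are the related pairs.
    neighbors = defaultdict(set)
    for a, b, kinship in pairs:
        if a != b and kinship >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = set()
    while True:
        # Collect every individual that still has at least one relative.
        candidates = [(len(adj), node) for node, adj in neighbors.items() if adj]
        if not candidates:
            break
        # Choose the highest-degree node; break ties by identifier
        # so the result is deterministic.
        max_degree = max(degree for degree, _ in candidates)
        node = min(n for degree, n in candidates if degree == max_degree)
        # Drop the node together with all of its edges.
        for other in neighbors.pop(node):
            neighbors[other].discard(node)
        removed.add(node)
    return removed


# Assumed toy input: A is related to B, C and D; E is related to F;
# G and H fall below the illustrative cutoff.
example = [
    ("A", "B", 0.25),
    ("A", "C", 0.25),
    ("A", "D", 0.25),
    ("E", "F", 0.25),
    ("G", "H", 0.01),
]
print(sorted(prune_related(example, threshold=0.1)))  # ['A', 'E']
```

Removing the hub `A` resolves three related pairs at once, so six of the eight individuals are kept. A greedy rule like this only approximates the minimal removal set and can be suboptimal on denser networks, which is presumably why the abstract speaks of heuristics that approximate the minimal reduction.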
dc.contributor.author.fl_str_mv Leal, Thiago Peixoto
Furlan, Vinicius C
Gouveia, Mateus Henrique
Duarte, Julia Maria Saraiva
Fonseca, Pablo AS
Tou, Rafael
Scliar, Marilia de Oliveira
Araujo, Gilderlanio Santana de
Costa, Lucas F.
Zolini, Camila
Peixoto, Maria Gabriela Campolina Diniz
Carvalho, Maria Raquel Santos
Costa, Maria Fernanda Furtado Lima
Gilman, Robert H
Tarazona-Santos, Eduardo
Rodrigues, Maíra Ribeiro
dc.subject.en.en_US.fl_str_mv Complex network theory
Population genetics
Genetic kinship
Genealogies simulator
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-11-09T17:37:28Z
dc.date.available.fl_str_mv 2022-11-09T17:37:28Z
dc.date.issued.fl_str_mv 2022
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv LEAL, Thiago Peixoto et al. NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses. Comput Struct Biotechnol J., v. 20, p. 1821–1828, 2022. doi: 10.1016/j.csbj.2022.04.009
dc.identifier.uri.fl_str_mv https://www.arca.fiocruz.br/handle/icict/55556
dc.identifier.issn.en_US.fl_str_mv 2001-0370
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Nature Publishing
publisher.none.fl_str_mv Nature Publishing
dc.source.none.fl_str_mv reponame:Repositório Institucional da FIOCRUZ (ARCA)
instname:Fundação Oswaldo Cruz (FIOCRUZ)
instacron:FIOCRUZ
bitstream.url.fl_str_mv https://www.arca.fiocruz.br/bitstream/icict/55556/1/license.txt
https://www.arca.fiocruz.br/bitstream/icict/55556/2/NAToRA%2c%20a%20relatedness-pruning%20method%20to%20minimize%20the%20loss%20of%20dataset.pdf
bitstream.checksum.fl_str_mv 5a560609d32a3863062d77ff32785d58
8a4dbf223833c07630f8d7824c8a18cd
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ)
repository.mail.fl_str_mv repositorio.arca@fiocruz.br
_version_ 1798325000492548096