Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases
Main author: | Romero, Pedro |
---|---|
Publication date: | 2020 |
Other authors: | Castillo-Vilcahuaman, Camila |
Document type: | preprint |
Language: | eng |
Source title: | SciELO Preprints |
Full text: | https://preprints.scielo.org/index.php/scielo/preprint/view/957 |
Abstract: | Peru is one of the most biodiverse countries in the world. Genetic diversity is an important component of biodiversity, and it is crucial for current efforts to protect and sustainably manage several organisms and habitats. As far as we know, only one previous work describes Peruvian genetic information stored in public databases. We aimed to update that work by searching four public databases that store sequencing information: Nucleotide, BioProject, PATRIC, and BOLD. With this information we comment on the contribution of Peruvian institutions in recent years. In Nucleotide, the largest database, Bacteria are the organisms most sequenced by Peruvian institutions (70.60%); pathogenic bacteria such as Pasteurella multocida, Neisseria meningitidis, and Vibrio parahaemolyticus were the most abundant. We found no sequence records from the Archaea domain. In BioProject, the most common sequence belongs to Salmonella enterica subsp. enterica serovar Infantis. In PATRIC, a database for pathogenic agents, Mycobacterium tuberculosis and Yersinia pestis had the highest number of entries. Finally, in BOLD, an exclusively eukaryotic database, Chordata (Aves and Actinopterygii), Angiospermae, and Arthropoda (Insecta and Arachnida) were the most frequent records. Our results suggest that the research preferences of Peruvian institutions focus on infectious diseases. Although DNA information submitted by Peruvian institutions has increased significantly since the last report, the genetic diversity reflected in these databases remains inconsistent with the diversity in the country. More effort is needed to obtain genetic information from underrepresented taxonomic groups and to promote genetic research in regional Peruvian institutions. |
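
The abstract describes searching public sequence databases for records linked to Peru. As a minimal sketch of what such a search could look like for the two NCBI databases it names, the snippet below builds NCBI E-utilities ESearch URLs that request only a record count. The query term `Peru[All Fields]` and the helper name `build_esearch_url` are assumptions made for illustration; the record does not state the authors' actual search strategy, and PATRIC and BOLD are separate services not covered here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL that asks only for the number of matching records.

    retmax=0 suppresses the ID list, so the response carries just the count.
    """
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Assumed query term (hypothetical; not taken from the preprint).
PERU_TERM = "Peru[All Fields]"

# "nuccore" is the Entrez name of the Nucleotide database.
urls = {db: build_esearch_url(db, PERU_TERM) for db in ("nuccore", "bioproject")}

for db, url in urls.items():
    print(db, url)
```

Fetching each URL would return a JSON document whose `esearchresult.count` field holds the number of matches; the request itself is left out so the sketch runs offline.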
id |
SCI-1_5d7a21a41a2deddbb01cbbe7411690d4 |
---|---|
oai_identifier_str |
oai:ops.preprints.scielo.org:preprint/957 |
network_acronym_str |
SCI-1 |
network_name_str |
SciELO Preprints |
repository_id_str |
|
spelling |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases; Genetic diversity; public databases; biodiversity; Peru; data mining; Peru is one of the most biodiverse countries in the world. Genetic diversity is an important component of biodiversity, and it is crucial for current efforts to protect and sustainably manage several organisms and habitats. As far as we know, only one previous work describes Peruvian genetic information stored in public databases. We aimed to update that work by searching four public databases that store sequencing information: Nucleotide, BioProject, PATRIC, and BOLD. With this information we comment on the contribution of Peruvian institutions in recent years. In Nucleotide, the largest database, Bacteria are the organisms most sequenced by Peruvian institutions (70.60%); pathogenic bacteria such as Pasteurella multocida, Neisseria meningitidis, and Vibrio parahaemolyticus were the most abundant. We found no sequence records from the Archaea domain. In BioProject, the most common sequence belongs to Salmonella enterica subsp. enterica serovar Infantis. In PATRIC, a database for pathogenic agents, Mycobacterium tuberculosis and Yersinia pestis had the highest number of entries. Finally, in BOLD, an exclusively eukaryotic database, Chordata (Aves and Actinopterygii), Angiospermae, and Arthropoda (Insecta and Arachnida) were the most frequent records. Our results suggest that the research preferences of Peruvian institutions focus on infectious diseases. Although DNA information submitted by Peruvian institutions has increased significantly since the last report, the genetic diversity reflected in these databases remains inconsistent with the diversity in the country. More effort is needed to obtain genetic information from underrepresented taxonomic groups and to promote genetic research in regional Peruvian institutions.; SciELO Preprints; SciELO Preprints; SciELO Preprints; 2020-07-14; info:eu-repo/semantics/preprint; info:eu-repo/semantics/publishedVersion; application/pdf; https://preprints.scielo.org/index.php/scielo/preprint/view/957; 10.1590/SciELOPreprints.957; eng; https://preprints.scielo.org/index.php/scielo/article/view/957/1349; Copyright (c) 2020 Pedro Romero, Camila Castillo-Vilcahuaman; https://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess; Romero, Pedro; Castillo-Vilcahuaman, Camila; reponame:SciELO Preprints; instname:SciELO; instacron:SCI; 2020-07-14T02:42:32Z; oai:ops.preprints.scielo.org:preprint/957; Preprint server; https://preprints.scielo.org/index.php/scielo; NGO; https://preprints.scielo.org/index.php/scielo/oai; scielo.submission@scielo.org; opendoar:; 2020-07-14T02:42:32; SciELO Preprints - SciELO; false |
dc.title.none.fl_str_mv |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
title |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
spellingShingle |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases; Romero, Pedro; Genetic diversity; public databases; biodiversity; Peru; data mining |
title_short |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
title_full |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
title_fullStr |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
title_full_unstemmed |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
title_sort |
Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases |
author |
Romero, Pedro |
author_facet |
Romero, Pedro; Castillo-Vilcahuaman, Camila |
author_role |
author |
author2 |
Castillo-Vilcahuaman, Camila |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Romero, Pedro; Castillo-Vilcahuaman, Camila |
dc.subject.por.fl_str_mv |
Genetic diversity; public databases; biodiversity; Peru; data mining |
topic |
Genetic diversity; public databases; biodiversity; Peru; data mining |
description |
Peru is one of the most biodiverse countries in the world. Genetic diversity is an important component of biodiversity, and it is crucial for current efforts to protect and sustainably manage several organisms and habitats. As far as we know, only one previous work describes Peruvian genetic information stored in public databases. We aimed to update that work by searching four public databases that store sequencing information: Nucleotide, BioProject, PATRIC, and BOLD. With this information we comment on the contribution of Peruvian institutions in recent years. In Nucleotide, the largest database, Bacteria are the organisms most sequenced by Peruvian institutions (70.60%); pathogenic bacteria such as Pasteurella multocida, Neisseria meningitidis, and Vibrio parahaemolyticus were the most abundant. We found no sequence records from the Archaea domain. In BioProject, the most common sequence belongs to Salmonella enterica subsp. enterica serovar Infantis. In PATRIC, a database for pathogenic agents, Mycobacterium tuberculosis and Yersinia pestis had the highest number of entries. Finally, in BOLD, an exclusively eukaryotic database, Chordata (Aves and Actinopterygii), Angiospermae, and Arthropoda (Insecta and Arachnida) were the most frequent records. Our results suggest that the research preferences of Peruvian institutions focus on infectious diseases. Although DNA information submitted by Peruvian institutions has increased significantly since the last report, the genetic diversity reflected in these databases remains inconsistent with the diversity in the country. More effort is needed to obtain genetic information from underrepresented taxonomic groups and to promote genetic research in regional Peruvian institutions. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-07-14 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/preprint; info:eu-repo/semantics/publishedVersion |
format |
preprint |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/preprint/view/957; 10.1590/SciELOPreprints.957 |
url |
https://preprints.scielo.org/index.php/scielo/preprint/view/957 |
identifier_str_mv |
10.1590/SciELOPreprints.957 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/article/view/957/1349 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2020 Pedro Romero, Camila Castillo-Vilcahuaman; https://creativecommons.org/licenses/by/4.0; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2020 Pedro Romero, Camila Castillo-Vilcahuaman; https://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
SciELO Preprints; SciELO Preprints; SciELO Preprints |
publisher.none.fl_str_mv |
SciELO Preprints; SciELO Preprints; SciELO Preprints |
dc.source.none.fl_str_mv |
reponame:SciELO Preprints; instname:SciELO; instacron:SCI |
instname_str |
SciELO |
instacron_str |
SCI |
institution |
SCI |
reponame_str |
SciELO Preprints |
collection |
SciELO Preprints |
repository.name.fl_str_mv |
SciELO Preprints - SciELO |
repository.mail.fl_str_mv |
scielo.submission@scielo.org |
_version_ |
1797047819372068864 |