Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases

Bibliographic details
Main author: Romero, Pedro
Publication date: 2020
Other authors: Castillo-Vilcahuaman, Camila
Document type: preprint
Language: English
Source: SciELO Preprints
Full text: https://preprints.scielo.org/index.php/scielo/preprint/view/957
Abstract: Peru is one of the most biodiverse countries in the world. Genetic diversity is an important component of biodiversity and is crucial to current efforts to protect and sustainably manage many organisms and habitats. To our knowledge, only one previous work has described Peruvian genetic information stored in public databases. We aimed to update that work by searching four public databases that store sequence information: Nucleotide, BioProject, PATRIC, and BOLD. Using this information, we discuss the contribution of Peruvian institutions in recent years. In Nucleotide, the largest database, Bacteria were the organisms most frequently sequenced by Peruvian institutions (70.60%), and pathogenic bacteria such as Pasteurella multocida, Neisseria meningitidis, and Vibrio parahaemolyticus were the most abundant. We found no sequence records from the domain Archaea. In BioProject, the most common records belong to Salmonella enterica subsp. enterica serovar Infantis. In PATRIC, a database for pathogens, Mycobacterium tuberculosis and Yersinia pestis had the highest numbers of entries. Finally, in BOLD, a database restricted to eukaryotes, Chordata (Aves and Actinopterygii), Angiospermae, and Arthropoda (Insecta and Arachnida) were the most frequent records. Our results suggest that the research of Peruvian institutions focuses on infectious diseases. Although the amount of DNA sequence information submitted by Peruvian institutions has increased considerably since the last report, the genetic diversity reflected in these databases remains inconsistent with the diversity found in the country. More effort is needed to obtain genetic information from underrepresented taxonomic groups and to promote genetic research in regional Peruvian institutions.
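The abstract describes two generic steps: querying a public database such as Nucleotide, and reporting each taxonomic group's share of the retrieved records (e.g. 70.60% for Bacteria). The sketch below illustrates both under stated assumptions; it is not the authors' search strategy. It builds a request URL for NCBI's public E-utilities `esearch` endpoint (the query term `Peru[All Fields]` is a hypothetical placeholder, and no request is actually sent), and computes percentage shares from a list of group labels. The helper names `build_esearch_url` and `percent_by_group` and the toy labels are illustrative only.

```python
from collections import Counter
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build an esearch request URL (hypothetical helper); nothing is sent over the network."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def percent_by_group(labels: list[str]) -> dict[str, float]:
    """Return each group's share of all records as a percentage rounded to 2 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical query term; not the query used in the preprint.
    print(build_esearch_url("nucleotide", "Peru[All Fields]"))
    # Toy labels standing in for the taxonomic domain of each retrieved record.
    toy = ["Bacteria", "Bacteria", "Bacteria", "Eukaryota"]
    print(percent_by_group(toy))
```

With the toy labels above, the shares are 75.0% Bacteria and 25.0% Eukaryota; real counts would come from parsing the records each database returns, which this sketch deliberately leaves out.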
Keywords: Genetic diversity; public databases; biodiversity; Peru; data mining
Date: 2020-07-14
DOI: 10.1590/SciELOPreprints.957
Related file: https://preprints.scielo.org/index.php/scielo/article/view/957/1349
Rights: Copyright (c) 2020 Pedro Romero, Camila Castillo-Vilcahuaman. License: https://creativecommons.org/licenses/by/4.0 (open access)