Relative scalability of NoSQL databases for genotype data manipulation.

Bibliographic details
Main author: ALMEIDA, A. L.
Publication date: 2018
Other authors: SCHETTINO, V. J., BARBOSA, T. J. R., FREITAS, P. F., GUIMARÃES, P. G. S., ARBEX, W. A.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102528
Abstract: Genotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of the high dimensionality and unbalanced nature of the data. These peculiarities explain why Relational Database Management Systems (RDBMSs), the de facto standard storage solution, have not been presented as the best tools for this kind of data. However, Big Data has been driving the development of modern database systems that might be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design based on the conclusions of that work in order to achieve better performance and thus store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. Results indicate that although Tarantool has the best overall throughput, MongoDB is less affected by an increase in the number of SNP markers per individual.
id EMBR_5dd778fe637504f3d32081ab68013df0
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1102528
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.title.none.fl_str_mv Relative scalability of NoSQL databases for genotype data manipulation.
title Relative scalability of NoSQL databases for genotype data manipulation.
author ALMEIDA, A. L.
author_role author
author2 SCHETTINO, V. J.
BARBOSA, T. J. R.
FREITAS, P. F.
GUIMARÃES, P. G. S.
ARBEX, W. A.
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv WAGNER ANTONIO ARBEX, CNPGL.
dc.contributor.author.fl_str_mv ALMEIDA, A. L.
SCHETTINO, V. J.
BARBOSA, T. J. R.
FREITAS, P. F.
GUIMARÃES, P. G. S.
ARBEX, W. A.
dc.subject.por.fl_str_mv Database
NoSQL
Data Science
SNP
Bioinformatics
Genotype
publishDate 2018
dc.date.none.fl_str_mv 2018-12-26T23:42:22Z
2018-12-26
2018
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Revista de Informática Teórica e Aplicada, v. 25, n. 2, p. 93-100, 2018.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102528
10.22456/2175-2745.79334
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1794503467929174016