Relative scalability of NoSQL databases for genotype data manipulation.
Main author: | ALMEIDA, A. L. |
---|---|
Publication date: | 2018 |
Other authors: | SCHETTINO, V. J., BARBOSA, T. J. R., FREITAS, P. F., GUIMARÃES, P. G. S., ARBEX, W. A. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
Full text: | http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102528 |
Abstract: | Genotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of the high dimensionality and unbalanced nature of the data. These peculiarities explain why Relational Database Management Systems (RDBMSs), the de facto standard storage solution, have not been shown to be the best tools for this kind of data. However, Big Data has been driving the development of modern database systems that may be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design according to that work's conclusions in order to achieve better performance and thus store more SNP markers per individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. Results indicate that although Tarantool has the best overall throughput, MongoDB is less affected by increasing the number of SNP markers per individual. |
id |
EMBR_5dd778fe637504f3d32081ab68013df0 |
---|---|
oai_identifier_str |
oai:www.alice.cnptia.embrapa.br:doc/1102528 |
network_acronym_str |
EMBR |
repository_id_str |
2154 |
dc.contributor.none.fl_str_mv |
WAGNER ANTONIO ARBEX, CNPGL. |
dc.contributor.author.fl_str_mv |
ALMEIDA, A. L. SCHETTINO, V. J. BARBOSA, T. J. R. FREITAS, P. F. GUIMARÃES, P. G. S. ARBEX, W. A. |
dc.subject.por.fl_str_mv |
Database; NoSQL; Data Science; SNP; Bioinformatics; Genotype |
dc.date.none.fl_str_mv |
2018-12-26T23:42:22Z 2018-12-26T23:42:22Z 2018-12-26 2018 2018-12-26T23:42:22Z |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
Revista de Informática Teórica e Aplicada, v. 25, n. 2, p. 93-100, 2018. http://www.alice.cnptia.embrapa.br/alice/handle/doc/1102528 10.22456/2175-2745.79334 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa) instacron:EMBRAPA |
repository.name.fl_str_mv |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa) |
repository.mail.fl_str_mv |
cg-riaa@embrapa.br |