Benchmark of market cloud data warehouse technologies
Main author: | Oliveira e Sá, Jorge |
---|---|
Publication date: | 2023 |
Other authors: | Renata Gonçalves; Kaldeich, Claus |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/1822/89565 |
Abstract: | Over the past two decades, the way computing resources are developed, deployed, upgraded, and applied has changed dramatically, with more and more software and hardware solutions being transferred to cloud technologies. Data Warehouses (DW), defined as a way of organizing corporate data in an integrated manner over (sequential) time periods, "structured & disposed" in order to generate a "single data source", were also affected by this evolution, giving rise to the concept of the Cloud Data Warehouse (CDW). This technology gives users more technological freedom: they do not need to invest time in software and hardware, they pay only for the resources they use, and the infrastructure itself has greater flexibility and scalability. However, selecting the most suitable platform or technology for a CDW can be a complex task because of the large number of factors that can influence the decision and the range of offerings on the market. The objective of this paper is to describe the process of benchmarking a set of CDW platforms, with the goal of analyzing and presenting each one’s performance results. These platforms are Snowflake, Google BigQuery, Amazon Redshift, and Azure Synapse. The metrics to be measured are data loading and query running time, and alias running times. For this benchmark, the dataset used was the Star Schema Benchmark (SSB), a dataset based on the well-known TPC Benchmark™ H (TPC-H). |
id |
RCAP_bb19d722ed2cc754c3e8abf33f9fdcd1 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/89565 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020 |
author |
Oliveira e Sá, Jorge |
author_role |
author |
author2 |
Renata Gonçalves; Kaldeich, Claus |
author2_role |
author; author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Oliveira e Sá, Jorge; Renata Gonçalves; Kaldeich, Claus |
dc.subject.por.fl_str_mv |
Data Warehouse; Cloud Computing; Cloud Data Warehouse; Cloud Data Warehouse Technologies; Engineering and Technology::Other Engineering and Technologies; Industry, innovation and infrastructure |
dc.date.none.fl_str_mv |
2023; 2023-01-01T00:00:00Z |
dc.type.driver.fl_str_mv |
conference paper |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/1822/89565 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
Oliveira e Sá, J., Gonçalves, R. & Kaldeich, C. (2023). Benchmark of Market Cloud Data Warehouse Technologies. CENTERIS 2023, Porto, Portugal, November 8-10, 2023, Elsevier, Procedia Computer Science. ISSN 1877-0509. https://centeris.scika.org/?page=scheduleevents |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817545247119376384 |