Benchmark of market cloud data warehouse technologies

Bibliographic details
Main author: Oliveira e Sá, Jorge
Publication date: 2023
Other authors: Gonçalves, Renata; Kaldeich, Claus
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/89565
Abstract: Over the past two decades, the way computing resources are developed, deployed, upgraded, and applied has changed dramatically, with more and more software and hardware solutions being moved to cloud technologies. Data Warehouses (DW), defined as a way of organizing corporate data in an integrated manner over (sequential) time periods, "structured & disposed" in order to generate a "single data source", were also affected by this evolution, giving rise to the concept of the Cloud Data Warehouse (CDW). This technology frees users from spending time and money on software and hardware: they pay only for the resources they use, and the infrastructure itself offers greater flexibility and scalability. However, selecting the most suitable platform or technology for a CDW can be a complex task because of the large number of factors that influence the decision and the range of offerings on the market. The objective of this paper is to describe the process of benchmarking a set of CDW platforms, with the goal of analyzing and presenting each one’s performance results. These platforms are Snowflake, Google BigQuery, Amazon Redshift, and Azure Synapse. The metrics measured are data loading time, query running time, and alias running times. The benchmark uses the Star Schema Benchmark (SSB), a dataset based on the well-known TPC Benchmark™ H (TPC-H).
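The load-time and query-time metrics described above can be illustrated with a minimal sketch. It is not the authors' harness: it assumes Python's built-in `sqlite3` as a local stand-in for a cloud warehouse, a hypothetical four-column subset of the SSB `lineorder` table filled with synthetic rows, and a simplified SSB-style filter query. The helper names `time_call` and `load_rows` are invented for illustration.

```python
import sqlite3
import statistics
import time


def time_call(fn, repeats=3):
    """Run fn `repeats` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def load_rows(conn, rows):
    """Recreate a simplified lineorder table and bulk-load `rows` into it."""
    conn.execute("DROP TABLE IF EXISTS lineorder")
    conn.execute(
        "CREATE TABLE lineorder ("
        "lo_orderdate INTEGER, lo_quantity INTEGER, "
        "lo_extendedprice INTEGER, lo_discount INTEGER)"
    )
    conn.executemany("INSERT INTO lineorder VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Synthetic rows (assumption: not real SSB data, just the same column shapes).
rows = [(19930101 + (i % 28), 1 + i % 50, 100 + i, i % 11) for i in range(10_000)]

# Simplified SSB-style aggregate: revenue for discounted, small-quantity orders.
QUERY = (
    "SELECT SUM(lo_extendedprice * lo_discount) AS revenue "
    "FROM lineorder "
    "WHERE lo_orderdate BETWEEN 19930101 AND 19931231 "
    "AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25"
)

conn = sqlite3.connect(":memory:")

# Metric 1: data loading time (table is rebuilt on every repetition).
load_times = time_call(lambda: load_rows(conn, rows), repeats=3)

# Metric 2: query running time (fetchall forces full evaluation).
query_times = time_call(lambda: conn.execute(QUERY).fetchall(), repeats=5)

print(f"median load time:  {statistics.median(load_times):.6f} s")
print(f"median query time: {statistics.median(query_times):.6f} s")
```

Repeating each measurement and reporting the median reduces the effect of one-off noise. Against a real platform, the same timing wrapper would go around that platform's own client calls, and caching behaviour would need to be controlled separately.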
id RCAP_bb19d722ed2cc754c3e8abf33f9fdcd1
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/89565
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
funding This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
dc.title.none.fl_str_mv Benchmark of market cloud data warehouse technologies
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Oliveira e Sá, Jorge
Renata Gonçalves
Kaldeich, Claus
dc.subject.por.fl_str_mv Data Warehouse
Cloud Computing
Cloud Data Warehouse
Cloud Data Warehouse Technologies
Engineering and Technology::Other Engineering and Technologies
Industry, innovation and infrastructure
publishDate 2023
dc.date.none.fl_str_mv 2023
2023-01-01T00:00:00Z
dc.type.driver.fl_str_mv conference paper
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/89565
url https://hdl.handle.net/1822/89565
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Oliveira e Sá, J., Gonçalves, R. & Kaldeich, C. (2023). Benchmark of Market Cloud Data Warehouse Technologies. CENTERIS 2023, Porto, Portugal, November 8-10, 2023, Elsevier, Procedia Computer Science.
1877-0509
https://centeris.scika.org/?page=scheduleevents
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv mluisa.alvim@gmail.com
_version_ 1817545247119376384