Transaction Processing over Geo-Partitioned Data

Bibliographic details
Main author: Braz, Sofia Frederico de Sousa
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/145197
Abstract: Databases are a fundamental component of any web service, storing and managing all of the service's data. In large-scale web services, the data storage systems must employ techniques such as partial replication, geo-replication, and weaker consistency models so that users' expectations regarding availability and latency can be met as well as possible. In this dissertation, we address the problem of executing transactions on partially replicated data. To this end, we adopt transactional causal consistency, a consistency model in which each transaction accesses a causally consistent snapshot of the database. However, implementing this consistency model in a partially replicated setting raises several challenges in handling transactions that access data items replicated at different nodes. Our work aims to design and implement a novel algorithm for executing transactions over geo-partitioned data with transactional causal consistency semantics. We discuss the problems and design choices involved in executing transactions over partially replicated data, and we present a design that implements the proposed algorithm by extending a weakly consistent geo-replicated key-value store with partial replication, adding support for transactions involving geo-partitioned data items. In this context, we also address the problem of choosing the best strategy for locating data in replicas that each hold only part of a service's total data and whose states may diverge. We evaluate our solution using microbenchmarks based on the TPC-H database. Our results show that the overhead of the system is low in the expected scenario of a low ratio of remote transactions.
id RCAP_c2bde4e90914ff88408a2a03e663ae20
oai_identifier_str oai:run.unl.pt:10362/145197
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Transaction Processing over Geo-Partitioned Data
author Braz, Sofia Frederico de Sousa
author_facet Braz, Sofia Frederico de Sousa
author_role author
dc.contributor.none.fl_str_mv Preguiça, Nuno
RUN
dc.contributor.author.fl_str_mv Braz, Sofia Frederico de Sousa
dc.subject.por.fl_str_mv Geo-replication
partial replication
causal consistency
transactions
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate 2022
dc.date.none.fl_str_mv 2022-11-03T15:53:46Z
2022-01
2022-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/145197
url http://hdl.handle.net/10362/145197
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138111737823232