Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos
Main author: | Fazul, Rhauani Weber Aita |
---|---|
Publication date: | 2022 |
Document type: | Master's thesis |
Language: | por |
Source title: | Biblioteca Digital de Teses e Dissertações do UFSM |
Full text: | http://repositorio.ufsm.br/handle/1/26470 |
Abstract: | Distributed file systems are essential to support applications that handle large volumes of data. One of the most widely used is HDFS, Apache Hadoop’s distributed file system. Data replication, which is at the core of the HDFS storage model, is essential for fault tolerance and performance, since the placement of data across the cluster directly affects replica balancing and data locality. As new data is loaded into the system, the distribution of replicas among the nodes commonly becomes unbalanced. The HDFS Balancer is the official solution for data balancing; it rearranges the replicas already stored in the cluster. However, its current balancing policy does not take the characteristics and specific needs of applications into account during data rearrangement. In addition, it is up to the system administrator to monitor the HDFS status and, when considered necessary, run the balancer daemon, which creates a manual dependency that is inadequate and inefficient in many situations. To address these limitations, this work presents DARB, a dynamic architecture that promotes reactive and proactive replica balancing. The reactive strategy is built on the PRBP, a customized, priority-based replica balancing policy for the HDFS Balancer. The PRBP is based on an adaptable and configurable system of priorities, from which association rules were defined to allow multiple priorities to be used simultaneously. Along with the rules, a set of usage guidelines was formalized and evaluated through practical experiments, which validated the behavior and applicability of the PRBP. The proactive strategy of DARB, in contrast, is event-driven and aims to make the replica balancing process in HDFS transparent. To this end, a metrics observation model and a supporting structure were created to determine automatically when corrective actions should be taken and to trigger the balancing process in the file system based on standardized trigger events. The evaluation results indicate that the proposed solution removes the need for manual configuration and use of the HDFS Balancer while actively working to keep the cluster balanced with respect to performance, reliability, and data availability. In this way, DARB presents itself as a specialized solution that makes the balancing process more flexible and introduces the concept of context-aware replica balancing to HDFS. |
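
The abstract describes a metrics observation model that decides when to trigger balancing, but this record does not give its details. As a rough illustration only, the sketch below assumes the usual HDFS Balancer notion of balance: each DataNode's utilization should stay within a threshold (in percentage points, 10 by default) of the overall cluster utilization. The names `DataNode`, `unbalanced_nodes`, and `should_trigger_balancing` are hypothetical and are not taken from DARB or from the Hadoop API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNode:
    """Hypothetical snapshot of one node's storage metrics."""
    name: str
    capacity_bytes: int
    used_bytes: int

    @property
    def utilization(self) -> float:
        """Percentage of this node's capacity that is in use."""
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: List[DataNode]) -> float:
    """Percentage of the total cluster capacity that is in use."""
    total_used = sum(n.used_bytes for n in nodes)
    total_capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * total_used / total_capacity


def unbalanced_nodes(nodes: List[DataNode], threshold: float = 10.0) -> List[str]:
    """Names of nodes whose utilization deviates from the cluster
    utilization by more than `threshold` percentage points."""
    average = cluster_utilization(nodes)
    return [n.name for n in nodes if abs(n.utilization - average) > threshold]


def should_trigger_balancing(nodes: List[DataNode], threshold: float = 10.0) -> bool:
    """Assumed trigger rule: fire when at least one node is out of range."""
    return bool(unbalanced_nodes(nodes, threshold))
```

A monitoring loop could call `should_trigger_balancing` on each metrics snapshot and start the balancer only when it returns `True`, which is one simple way to replace the manual check by an administrator that the abstract criticizes.
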
id |
UFSM_5e24aa583b38ebf02db0708d903d6104 |
---|---|
oai_identifier_str |
oai:repositorio.ufsm.br:1/26470 |
network_acronym_str |
UFSM |
network_name_str |
Biblioteca Digital de Teses e Dissertações do UFSM |
repository_id_str |
|
spelling |
Funding: Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq. Files: license.txt (text/plain; charset=utf-8); DIS_PPGCC_2022_AITA_RHAUANI.pdf (Master's thesis, application/pdf); license_rdf (application/rdf+xml; charset=utf-8). |
dc.title.por.fl_str_mv |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
dc.title.alternative.eng.fl_str_mv |
Dynamic architecture for replica balancing in distributed file systems |
title |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
spellingShingle |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos Fazul, Rhauani Weber Aita Arquitetura dinâmica Política de balanceamento Balanceamento de réplicas Replicação de dados Tolerância a falhas Sistemas de arquivos distribuídos Dynamic architecture Balancing policy Replica balancing Data replication Fault tolerance Distributed file systems CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
title_short |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
title_full |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
title_fullStr |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
title_full_unstemmed |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
title_sort |
Arquitetura dinâmica para o balanceamento de réplicas em sistemas de arquivos distribuídos |
author |
Fazul, Rhauani Weber Aita |
author_facet |
Fazul, Rhauani Weber Aita |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Barcelos, Patrícia Pitthan de Araújo |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/6069105173950277 |
dc.contributor.referee1.fl_str_mv |
Lima, João Vicente Ferreira |
dc.contributor.referee2.fl_str_mv |
Mendizabal, Odorico Machado |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/5175618357450515 |
dc.contributor.author.fl_str_mv |
Fazul, Rhauani Weber Aita |
contributor_str_mv |
Barcelos, Patrícia Pitthan de Araújo Lima, João Vicente Ferreira Mendizabal, Odorico Machado |
dc.subject.por.fl_str_mv |
Arquitetura dinâmica Política de balanceamento Balanceamento de réplicas Replicação de dados Tolerância a falhas Sistemas de arquivos distribuídos |
topic |
Arquitetura dinâmica Política de balanceamento Balanceamento de réplicas Replicação de dados Tolerância a falhas Sistemas de arquivos distribuídos Dynamic architecture Balancing policy Replica balancing Data replication Fault tolerance Distributed file systems CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
dc.subject.eng.fl_str_mv |
Dynamic architecture Balancing policy Replica balancing Data replication Fault tolerance Distributed file systems |
dc.subject.cnpq.fl_str_mv |
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-10-13T18:58:56Z |
dc.date.available.fl_str_mv |
2022-10-13T18:58:56Z |
dc.date.issued.fl_str_mv |
2022-09-28 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.ufsm.br/handle/1/26470 |
url |
http://repositorio.ufsm.br/handle/1/26470 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.cnpq.fl_str_mv |
100300000007 |
dc.relation.confidence.fl_str_mv |
600 600 600 600 600 |
dc.relation.authority.fl_str_mv |
a421e37e-ac28-4452-9943-e257a19343ca c8b2ac4b-3da3-42ec-9d1e-e5f1ef8070fb aec00059-8729-40bd-b989-bf07fcb3bdbd f919cfa2-8dac-4a9d-a070-e0819e7f3594 |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Santa Maria Centro de Tecnologia |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFSM |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Ciência da Computação |
publisher.none.fl_str_mv |
Universidade Federal de Santa Maria Centro de Tecnologia |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações do UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM |
instname_str |
Universidade Federal de Santa Maria (UFSM) |
instacron_str |
UFSM |
institution |
UFSM |
reponame_str |
Biblioteca Digital de Teses e Dissertações do UFSM |
collection |
Biblioteca Digital de Teses e Dissertações do UFSM |
bitstream.url.fl_str_mv |
http://repositorio.ufsm.br/bitstream/1/26470/3/license.txt http://repositorio.ufsm.br/bitstream/1/26470/1/DIS_PPGCC_2022_AITA_RHAUANI.pdf http://repositorio.ufsm.br/bitstream/1/26470/2/license_rdf |
bitstream.checksum.fl_str_mv |
f8fcb28efb1c8cf0dc096bec902bf4c4 83e311e892f70ffd491c327fbf483db3 4460e5956bc1d1639be9ae6146a50347 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações do UFSM - Universidade Federal de Santa Maria (UFSM) |
repository.mail.fl_str_mv |
atendimento.sib@ufsm.br||tedebc@gmail.com |
_version_ |
1801485319444365312 |