Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data

Bibliographic details
Main author: Alves Neto, Antônio José
Publication date: 2023
Document type: Master's thesis
Language: por
Source title: Repositório Institucional da UFS
Full text: https://ri.ufs.br/jspui/handle/riufs/19478
Abstract: With the rapid advancement of technology, a large amount of data is generated daily. These data are not generated only by people; a wide range of electronic equipment has also become a major source. These large volumes of data are known as Big Data, and they yield valuable information for business intelligence, forecasting, and decision support, among other uses. Processing this volume of data, however, requires a computational approach different from the traditional one, called High Performance Computing (HPC). Over the years, HPC has relied on either supercomputers or computing clusters. Supercomputers are no longer an option because of their high cost and difficult maintenance, which makes clustering the ideal alternative. Clusters are loosely coupled systems formed by a set of computers that collaborate with one another using message-passing libraries. Clusters built from Single Board Computers (SBCs) are a viable alternative for research in this area. Among SBCs, the Raspberry Pi stands out: it was initially developed to promote the teaching of computer science, its variety of models meets a range of specific requirements, and it does not require a large investment. Operating a cluster and processing this volume of data on it requires a big data platform, and Apache Hadoop is one of the most widely used today. A good way to obtain a low-cost big data cluster is therefore to combine the Raspberry Pi as the hardware structure with Apache Hadoop as the Big Data platform. However, detailed material that explains all the installation steps, the configuration process, and the verification that the Hadoop cluster is working correctly is lacking, and this problem has been little explored by the academic community. The monitoring of cluster resources is likewise rarely addressed in academia.
To address these problems, this work aims to develop and evaluate the performance of a low-cost big data cluster that uses the Raspberry Pi as the hardware structure and Apache Hadoop as the Big Data platform. The cluster is evaluated with benchmarks that are widespread in the area (Terasort and TestDFSIO), and its resource usage is monitored with Zabbix and Grafana, providing complete and detailed material on the entire process.
id UFS-2_47869e07b7727450c434d1a4d1a39b81
oai_identifier_str oai:oai:ri.ufs.br:repo_01:riufs/19478
network_acronym_str UFS-2
network_name_str Repositório Institucional da UFS
repository_id_str
dc.title.pt_BR.fl_str_mv Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
spellingShingle Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
Alves Neto, Antônio José
Plataforma aberta da Web
Benchmarking (administração)
Big data
Cluster (sistema de computador)
Raspberry pi (computador)
Zabbix (software)
Apache Hadoop
Benchmarks
Grafana
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
author Alves Neto, Antônio José
author_facet Alves Neto, Antônio José
author_role author
dc.contributor.author.fl_str_mv Alves Neto, Antônio José
dc.contributor.advisor1.fl_str_mv Ordonez, Edward David Moreno
dc.contributor.advisor-co1.fl_str_mv Carneiro Neto, José Aprígio
contributor_str_mv Ordonez, Edward David Moreno
Carneiro Neto, José Aprígio
dc.subject.por.fl_str_mv Plataforma aberta da Web
Benchmarking (administração)
Big data
Cluster (sistema de computador)
Raspberry pi (computador)
topic Plataforma aberta da Web
Benchmarking (administração)
Big data
Cluster (sistema de computador)
Raspberry pi (computador)
Zabbix (software)
Apache Hadoop
Benchmarks
Grafana
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Zabbix (software)
Apache Hadoop
Benchmarks
Grafana
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2023
dc.date.issued.fl_str_mv 2023-04-20
dc.date.accessioned.fl_str_mv 2024-07-05T19:21:46Z
dc.date.available.fl_str_mv 2024-07-05T19:21:46Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023.
dc.identifier.uri.fl_str_mv https://ri.ufs.br/jspui/handle/riufs/19478
identifier_str_mv ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023.
url https://ri.ufs.br/jspui/handle/riufs/19478
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.program.fl_str_mv Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv Universidade Federal de Sergipe (UFS)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFS
instname:Universidade Federal de Sergipe (UFS)
instacron:UFS
instname_str Universidade Federal de Sergipe (UFS)
instacron_str UFS
institution UFS
reponame_str Repositório Institucional da UFS
collection Repositório Institucional da UFS
bitstream.url.fl_str_mv https://ri.ufs.br/jspui/bitstream/riufs/19478/1/license.txt
https://ri.ufs.br/jspui/bitstream/riufs/19478/2/ANTONIO_JOSE_ALVES_NETO.pdf
bitstream.checksum.fl_str_mv 098cbbf65c2c15e1fb2e49c5d306a44c
05cb4d6bbbaa47ae595f8d199c41e310
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS)
repository.mail.fl_str_mv repositorio@academico.ufs.br
_version_ 1813824999485603840