Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
Main author: | Alves Neto, Antônio José |
---|---|
Publication date: | 2023 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFS |
Full text: | https://ri.ufs.br/jspui/handle/riufs/19478 |
Abstract: | With the exponential advance of technology, a large amount of data is generated every day, and not only by people: a wide range of electronic equipment has also become a major source of data. These large volumes of data are known as Big Data and yield valuable information for business intelligence, forecasting, and decision support, among other uses. Processing such volumes, however, requires a computational approach different from the traditional one, known as High Performance Computing (HPC). Over the years, HPC has been achieved with supercomputers or with computing clusters. Supercomputers are no longer an option because of their high cost and difficult maintenance, which leaves clustering as the ideal alternative. Clusters are loosely coupled systems formed by a set of computers that work in collaboration with each other using message-passing libraries. Clusters built from Single Board Computers (SBCs) are a viable alternative for research in this area. Among SBCs, the Raspberry Pi stands out; it was initially developed to promote the teaching of computer science. Its variety of models meets a range of specific requirements and does not require large investments. Operating on and processing this large volume of data in a cluster requires a big data platform, and Apache Hadoop is one of the most widespread platforms available today. A good way to obtain a low-cost big data cluster is therefore to combine the Raspberry Pi as the hardware structure with Apache Hadoop as the Big Data platform. However, detailed material that explains all the installation steps, the configuration process, and, finally, the verification that the Hadoop cluster is working correctly is lacking, and this problem has been little explored by the academic community. The monitoring of cluster resources is likewise rarely addressed. To address these problems, this work aims to develop and evaluate the performance of a low-cost big data cluster that uses the Raspberry Pi as the hardware structure and Apache Hadoop as the Big Data platform. The cluster is evaluated with benchmarks that are widely used in the area (Terasort and TestDFSIO), and its resource usage is tracked and monitored with Zabbix and Grafana, providing complete and detailed material on the entire process. |
id |
UFS-2_47869e07b7727450c434d1a4d1a39b81 |
---|---|
oai_identifier_str |
oai:oai:ri.ufs.br:repo_01:riufs/19478 |
network_acronym_str |
UFS-2 |
network_name_str |
Repositório Institucional da UFS |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data |
author |
Alves Neto, Antônio José |
author_facet |
Alves Neto, Antônio José |
author_role |
author |
dc.contributor.author.fl_str_mv |
Alves Neto, Antônio José |
dc.contributor.advisor1.fl_str_mv |
Ordonez, Edward David Moreno |
dc.contributor.advisor-co1.fl_str_mv |
Carneiro Neto, José Aprígio |
contributor_str_mv |
Ordonez, Edward David Moreno; Carneiro Neto, José Aprígio |
dc.subject.por.fl_str_mv |
Open web platform; Benchmarking (management); Big data; Cluster (computer system); Raspberry Pi (computer) |
topic |
Open web platform; Benchmarking (management); Big data; Cluster (computer system); Raspberry Pi (computer); Zabbix (software); Apache Hadoop; Benchmarks; Grafana; CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
dc.subject.eng.fl_str_mv |
Zabbix (software); Apache Hadoop; Benchmarks; Grafana |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
publishDate |
2023 |
dc.date.issued.fl_str_mv |
2023-04-20 |
dc.date.accessioned.fl_str_mv |
2024-07-05T19:21:46Z |
dc.date.available.fl_str_mv |
2024-07-05T19:21:46Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023. |
dc.identifier.uri.fl_str_mv |
https://ri.ufs.br/jspui/handle/riufs/19478 |
identifier_str_mv |
ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023. |
url |
https://ri.ufs.br/jspui/handle/riufs/19478 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.program.fl_str_mv |
Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
Universidade Federal de Sergipe (UFS) |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFS instname:Universidade Federal de Sergipe (UFS) instacron:UFS |
instname_str |
Universidade Federal de Sergipe (UFS) |
instacron_str |
UFS |
institution |
UFS |
reponame_str |
Repositório Institucional da UFS |
collection |
Repositório Institucional da UFS |
bitstream.url.fl_str_mv |
https://ri.ufs.br/jspui/bitstream/riufs/19478/1/license.txt https://ri.ufs.br/jspui/bitstream/riufs/19478/2/ANTONIO_JOSE_ALVES_NETO.pdf |
bitstream.checksum.fl_str_mv |
098cbbf65c2c15e1fb2e49c5d306a44c 05cb4d6bbbaa47ae595f8d199c41e310 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS) |
repository.mail.fl_str_mv |
repositorio@academico.ufs.br |
_version_ |
1813824999485603840 |