Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data

Alves Neto, Antônio José

Bibliographic details
Main author:	Alves Neto, Antônio José
Publication date:	2023
Document type:	Master's thesis
Language:	por
Source title:	Repositório Institucional da UFS
Full text:	https://ri.ufs.br/jspui/handle/riufs/19478
Abstract:	With the rapid advance of technology, large amounts of data are generated every day, and not only by people: a wide range of electronic equipment has also become a major source of data. These large volumes of data are known as Big Data, and they yield valuable information for business intelligence, forecasting, and decision support, among other uses. Processing such volumes, however, requires a computational approach different from the traditional one, known as High Performance Computing (HPC). Over the years, HPC has been achieved with supercomputers or with computing clusters. Supercomputers have ceased to be an option because of their high cost and difficult maintenance, which leaves clustering as the ideal alternative. Clusters are loosely coupled systems formed by a set of computers that work in collaboration with one another using message-passing libraries. Clusters built from Single Board Computers (SBCs) are a viable alternative for research in this area. Among SBCs, the Raspberry Pi stands out; it was initially developed to promote the teaching of computer science. Its variety of models allows it to meet several specific requirements without requiring large investments. Operating a cluster and processing this large volume of data on it requires a big data platform, and Apache Hadoop is one of the most widespread platforms available today. A good way to obtain a low-cost big data cluster is therefore to combine the Raspberry Pi as the hardware structure with Apache Hadoop as the Big Data platform. However, detailed material that explains all the installation steps, the configuration process, and, finally, the verification that the Hadoop cluster is working correctly is lacking, and this problem has been little explored by the academic community. The monitoring of cluster resources is likewise rarely addressed in academic work.
To address this problem, this work aims to develop and evaluate the performance of a low-cost big data cluster that uses the Raspberry Pi as the hardware structure and Apache Hadoop as the Big Data platform. The evaluation uses benchmarks that are widespread in the area (Terasort and TestDFSIO), and the cluster's resource usage is tracked and monitored with Zabbix and Grafana, providing complete and detailed material on the entire process.

Item metadata

id	UFS-2_47869e07b7727450c434d1a4d1a39b81
oai_identifier_str	oai:oai:ri.ufs.br:repo_01:riufs/19478
network_acronym_str	UFS-2
network_name_str	Repositório Institucional da UFS
dc.title.pt_BR.fl_str_mv	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title_short	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title_full	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title_fullStr	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title_full_unstemmed	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
title_sort	Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data
author	Alves Neto, Antônio José
author_facet	Alves Neto, Antônio José
author_role	author
dc.contributor.author.fl_str_mv	Alves Neto, Antônio José
dc.contributor.advisor1.fl_str_mv	Ordonez, Edward David Moreno
dc.contributor.advisor-co1.fl_str_mv	Carneiro Neto, José Aprígio
contributor_str_mv	Ordonez, Edward David Moreno; Carneiro Neto, José Aprígio
dc.subject.por.fl_str_mv	Open web platform; Benchmarking (management); Big data; Cluster (computer system); Raspberry Pi (computer)
topic	Open web platform; Benchmarking (management); Big data; Cluster (computer system); Raspberry Pi (computer); Zabbix (software); Apache Hadoop; Benchmarks; Grafana; CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Zabbix (software); Apache Hadoop; Benchmarks; Grafana
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	With the rapid advance of technology, large amounts of data are generated every day, and not only by people: a wide range of electronic equipment has also become a major source of data. These large volumes of data are known as Big Data, and they yield valuable information for business intelligence, forecasting, and decision support, among other uses. Processing such volumes, however, requires a computational approach different from the traditional one, known as High Performance Computing (HPC). Over the years, HPC has been achieved with supercomputers or with computing clusters. Supercomputers have ceased to be an option because of their high cost and difficult maintenance, which leaves clustering as the ideal alternative. Clusters are loosely coupled systems formed by a set of computers that work in collaboration with one another using message-passing libraries. Clusters built from Single Board Computers (SBCs) are a viable alternative for research in this area. Among SBCs, the Raspberry Pi stands out; it was initially developed to promote the teaching of computer science. Its variety of models allows it to meet several specific requirements without requiring large investments. Operating a cluster and processing this large volume of data on it requires a big data platform, and Apache Hadoop is one of the most widespread platforms available today. A good way to obtain a low-cost big data cluster is therefore to combine the Raspberry Pi as the hardware structure with Apache Hadoop as the Big Data platform. However, detailed material that explains all the installation steps, the configuration process, and, finally, the verification that the Hadoop cluster is working correctly is lacking, and this problem has been little explored by the academic community. The monitoring of cluster resources is likewise rarely addressed in academic work.
To address this problem, this work aims to develop and evaluate the performance of a low-cost big data cluster that uses the Raspberry Pi as the hardware structure and Apache Hadoop as the Big Data platform. The evaluation uses benchmarks that are widespread in the area (Terasort and TestDFSIO), and the cluster's resource usage is tracked and monitored with Zabbix and Grafana, providing complete and detailed material on the entire process.
publishDate	2023
dc.date.issued.fl_str_mv	2023-04-20
dc.date.accessioned.fl_str_mv	2024-07-05T19:21:46Z
dc.date.available.fl_str_mv	2024-07-05T19:21:46Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023.
dc.identifier.uri.fl_str_mv	https://ri.ufs.br/jspui/handle/riufs/19478
identifier_str_mv	ALVES NETO, Antônio José. Desenvolvimento e avaliação de desempenho de um cluster raspberry pi e apache hadoop em aplicações big data. 2023. 108 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023.
url	https://ri.ufs.br/jspui/handle/riufs/19478
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.program.fl_str_mv	Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	Universidade Federal de Sergipe (UFS)
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFS instname:Universidade Federal de Sergipe (UFS) instacron:UFS
instname_str	Universidade Federal de Sergipe (UFS)
instacron_str	UFS
institution	UFS
reponame_str	Repositório Institucional da UFS
collection	Repositório Institucional da UFS
bitstream.url.fl_str_mv	https://ri.ufs.br/jspui/bitstream/riufs/19478/1/license.txt https://ri.ufs.br/jspui/bitstream/riufs/19478/2/ANTONIO_JOSE_ALVES_NETO.pdf
bitstream.checksum.fl_str_mv	098cbbf65c2c15e1fb2e49c5d306a44c 05cb4d6bbbaa47ae595f8d199c41e310
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS)
repository.mail.fl_str_mv	repositorio@academico.ufs.br
_version_	1813824999485603840
