Hadoop cluster deployment: A methodological approach

Correia, Ronaldo Celso Messias [UNESP]; Spadon, Gabriel; Gomes, Pedro Henrique De Andrade [UNESP]; Eler, Danilo Medeiros [UNESP]; Garcia, Rogério Eduardo [UNESP]; Junior, Celso Olivete [UNESP]

Bibliographic details
Main author:	Correia, Ronaldo Celso Messias [UNESP]
Publication date:	2018
Other authors:	Spadon, Gabriel; Gomes, Pedro Henrique De Andrade [UNESP]; Eler, Danilo Medeiros [UNESP]; Garcia, Rogério Eduardo [UNESP]; Junior, Celso Olivete [UNESP]
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.3390/info9060131 http://hdl.handle.net/11449/179948
Abstract:	For a long time, data was treated as a general problem because it represented only fragments of an event, without any relevant purpose. The last decade, however, has been all about information and how to obtain it. To find meaning in data and to address scalability problems, many frameworks have been developed to improve data storage and analysis. Among them, Hadoop was presented as a powerful tool for dealing with large amounts of data. However, doubts remain about how to handle its deployment and whether there is a reliable method to compare the performance of distinct Hadoop clusters. This paper presents a methodology based on benchmark analysis to guide Hadoop cluster deployment. The experiments employed Apache Hadoop and the Hadoop distributions of Cloudera, Hortonworks, and MapR, analyzing the architectures both locally and in the cloud, using centralized and geographically distributed servers. The results show that the methodology can be applied dynamically to obtain a reliable comparison among different architectures. Additionally, the study suggests that the knowledge acquired can be used to improve the data analysis process through a better understanding of the Hadoop architecture.

Item metadata

id	UNSP_171fdc1a9be6eb565491a3aafc683ea2
oai_identifier_str	oai:repositorio.unesp.br:11449/179948
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Departamento de Matematica e Computação, Sao Paulo State University-UNESP; Instituto de Ciencias Matematicas e Computacao, University of Sao Paulo-USP
dc.title.none.fl_str_mv	Hadoop cluster deployment: A methodological approach
author	Correia, Ronaldo Celso Messias [UNESP]
author_facet	Correia, Ronaldo Celso Messias [UNESP] Spadon, Gabriel Gomes, Pedro Henrique De Andrade [UNESP] Eler, Danilo Medeiros [UNESP] Garcia, Rogério Eduardo [UNESP] Junior, Celso Olivete [UNESP]
author_role	author
author2	Spadon, Gabriel Gomes, Pedro Henrique De Andrade [UNESP] Eler, Danilo Medeiros [UNESP] Garcia, Rogério Eduardo [UNESP] Junior, Celso Olivete [UNESP]
author2_role	author author author author author
dc.contributor.none.fl_str_mv	Universidade Estadual Paulista (Unesp) Universidade de São Paulo (USP)
dc.contributor.author.fl_str_mv	Correia, Ronaldo Celso Messias [UNESP] Spadon, Gabriel Gomes, Pedro Henrique De Andrade [UNESP] Eler, Danilo Medeiros [UNESP] Garcia, Rogério Eduardo [UNESP] Junior, Celso Olivete [UNESP]
dc.subject.por.fl_str_mv	Benchmark methodology Big Data Computational models Hadoop
topic	Benchmark methodology Big Data Computational models Hadoop
publishDate	2018
dc.date.none.fl_str_mv	2018-12-11T17:37:24Z 2018-12-11T17:37:24Z 2018-05-29
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.3390/info9060131 Information (Switzerland), v. 9, n. 6, 2018. 2078-2489 http://hdl.handle.net/11449/179948 10.3390/info9060131 2-s2.0-85048453375 2-s2.0-85048453375.pdf 8031012573259361 2616135175972629 0000-0003-1248-528X
url	http://dx.doi.org/10.3390/info9060131 http://hdl.handle.net/11449/179948
identifier_str_mv	Information (Switzerland), v. 9, n. 6, 2018. 2078-2489 10.3390/info9060131 2-s2.0-85048453375 2-s2.0-85048453375.pdf 8031012573259361 2616135175972629 0000-0003-1248-528X
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Information (Switzerland) 0,222
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_	1792961881595445248
