Hadoop cluster deployment: A methodological approach
Main author: | Correia, Ronaldo Celso Messias [UNESP] |
---|---|
Publication date: | 2018 |
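Other authors: | Spadon, Gabriel; Gomes, Pedro Henrique De Andrade [UNESP]; Eler, Danilo Medeiros [UNESP]; Garcia, Rogério Eduardo [UNESP]; Junior, Celso Olivete [UNESP] |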
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.3390/info9060131 http://hdl.handle.net/11449/179948 |
Abstract: | For a long time, data was treated as a general problem because it represents only fractions of an event, without any relevant purpose. The last decade, however, has been all about information and how to obtain it. To find meaning in data and to solve scalability problems, many frameworks have been developed to improve data storage and analysis. Among them, Hadoop was presented as a powerful tool for dealing with large amounts of data. However, doubts remain about how to approach its deployment and whether there is a reliable method to compare the performance of distinct Hadoop clusters. This paper presents a methodology based on benchmark analysis to guide Hadoop cluster deployment. The experiments employed Apache Hadoop and the Hadoop distributions of Cloudera, Hortonworks, and MapR, analyzing the architectures both locally and in the cloud, using centralized and geographically distributed servers. The results show that the methodology can be dynamically applied to obtain a reliable comparison among different architectures. Additionally, the study suggests that the knowledge acquired can be used to improve the data analysis process by understanding the Hadoop architecture. |
id |
UNSP_171fdc1a9be6eb565491a3aafc683ea2 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/179948 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Hadoop cluster deployment: A methodological approach; Benchmark methodology; Big Data; Computational models; Hadoop; Departamento de Matematica e Computação, Sao Paulo State University-UNESP; Instituto de Ciencias Matematicas e Computacao, University of Sao Paulo-USP; Universidade Estadual Paulista (Unesp); Universidade de São Paulo (USP); Correia, Ronaldo Celso Messias [UNESP]; Spadon, Gabriel; Gomes, Pedro Henrique De Andrade [UNESP]; Eler, Danilo Medeiros [UNESP]; Garcia, Rogério Eduardo [UNESP]; Junior, Celso Olivete [UNESP]; 2018-12-11T17:37:24Z; 2018-12-11T17:37:24Z; 2018-05-29; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://dx.doi.org/10.3390/info9060131; Information (Switzerland), v. 9, n. 6, 2018.; 2078-2489; http://hdl.handle.net/11449/179948; 10.3390/info9060131; 2-s2.0-85048453375; 2-s2.0-85048453375.pdf; 8031012573259361; 2616135175972629; 0000-0003-1248-528X; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Information (Switzerland); 0,222; info:eu-repo/semantics/openAccess; 2023-11-21T06:11:52Z; oai:repositorio.unesp.br:11449/179948; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2023-11-21T06:11:52; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
author |
Correia, Ronaldo Celso Messias [UNESP] |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (Unesp); Universidade de São Paulo (USP) |
dc.contributor.author.fl_str_mv |
Correia, Ronaldo Celso Messias [UNESP]; Spadon, Gabriel; Gomes, Pedro Henrique De Andrade [UNESP]; Eler, Danilo Medeiros [UNESP]; Garcia, Rogério Eduardo [UNESP]; Junior, Celso Olivete [UNESP] |
dc.subject.por.fl_str_mv |
Benchmark methodology; Big Data; Computational models; Hadoop |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-11T17:37:24Z; 2018-12-11T17:37:24Z; 2018-05-29 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.3390/info9060131; Information (Switzerland), v. 9, n. 6, 2018.; 2078-2489; http://hdl.handle.net/11449/179948; 10.3390/info9060131; 2-s2.0-85048453375; 2-s2.0-85048453375.pdf; 8031012573259361; 2616135175972629; 0000-0003-1248-528X |
url |
http://dx.doi.org/10.3390/info9060131 http://hdl.handle.net/11449/179948 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Information (Switzerland); 0,222 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |