Hive on spark and MapReduce : a methodology for parameter tuning
Main author: | Forster, Rodrigo Richard |
---|---|
Publication date: | 2018 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/52854 |
Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management |
id |
RCAP_ca2dad9351d49acf1ad2b7b94630f2a4 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/52854 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Hive on spark and MapReduce : a methodology for parameter tuning; Tuning; Hive on Spark; MapReduce; Apache Spark; Big Data; HDFS; Hadoop; Data Warehouse; Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management; With the arrival of the era of “big data”, more and more companies are using distributed file systems, such as the Hadoop Distributed File System (HDFS), to manage and process their data streams. This software library offers a way to store large files across multiple machines. Large data sets are processed using its native programming model, MapReduce. Apache Spark is a relatively new alternative to Hadoop MapReduce and claims to offer a performance boost of up to 10 times for certain applications while maintaining automatic fault tolerance. Apache Hive was introduced to leverage the data warehouse capabilities of Hadoop. It is a concept for Big Data analytics that works on top of Hadoop, provides data analysis tools and, most importantly, translates queries into MapReduce and Spark jobs. It therefore exploits the scalability of Hadoop and offers data exploration and mining capabilities to non-developers. However, it is difficult for users to utilize the full potential of the Apache Spark execution engine, which results in very long execution times. This project work therefore gives researchers and companies a tuning methodology that can significantly improve the execution time of queries. Applied to a real-world batch-processing query, the methodology sped up execution by a factor of 5. Moreover, it gives insights into the underlying reasons for this large improvement by using Apache Spark monitoring tools. The result can be helpful for the many practitioners and researchers who would like to optimise the performance of Spark and MapReduce queries executed in Hive on top of an Apache Hadoop cluster.; Santos, Vitor Manuel Pereira Duarte dos; RUN; Forster, Rodrigo Richard; 2018-11-26T14:59:01Z; 2018-10-29; 2018-10-29T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/52854; TID:202028755; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-05-22T17:35:44Z; oai:run.unl.pt:10362/52854; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; mluisa.alvim@gmail.com; opendoar:7160; 2024-05-22T17:35:44; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Hive on spark and MapReduce : a methodology for parameter tuning |
author |
Forster, Rodrigo Richard |
author_role |
author |
dc.contributor.none.fl_str_mv |
Santos, Vitor Manuel Pereira Duarte dos; RUN |
dc.contributor.author.fl_str_mv |
Forster, Rodrigo Richard |
dc.subject.por.fl_str_mv |
Tuning; Hive on Spark; MapReduce; Apache Spark; Big Data; HDFS; Hadoop; Data Warehouse |
topic |
Tuning; Hive on Spark; MapReduce; Apache Spark; Big Data; HDFS; Hadoop; Data Warehouse |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-11-26T14:59:01Z 2018-10-29 2018-10-29T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/52854 TID:202028755 |
url |
http://hdl.handle.net/10362/52854 |
identifier_str_mv |
TID:202028755 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817545662945820672 |