Hive on spark and MapReduce : a methodology for parameter tuning

Forster, Rodrigo Richard

Bibliographic details
Main author:	Forster, Rodrigo Richard
Publication date:	2018
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10362/52854
Abstract:	Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management

Item metadata

id	RCAP_ca2dad9351d49acf1ad2b7b94630f2a4
oai_identifier_str	oai:run.unl.pt:10362/52854
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Hive on spark and MapReduce : a methodology for parameter tuning | Tuning | Hive on Spark | MapReduce | Apache Spark | Big Data | HDFS | Hadoop | Data Warehouse | Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management | Now that the era of “big data” has arrived, more and more companies are starting to use distributed file systems, such as the Hadoop Distributed File System (HDFS), to manage and process their data streams. This software library offers a way to store large files across multiple machines. Large data sets are processed using its inherent programming model, MapReduce. Apache Spark is a relatively new alternative to Hadoop MapReduce and claims to offer a performance boost of up to 10 times for certain applications while maintaining automatic fault tolerance. Apache Hive was introduced to leverage the data warehouse capabilities of Hadoop. It is a concept for big data analytics that works on top of Hadoop, provides data analysis tools and, most importantly, translates queries into MapReduce and Spark jobs. It therefore exploits the scalability of Hadoop and offers data exploration and mining capabilities to non-developers. However, it is difficult for users to utilize the full potential of the Apache Spark execution engine, which results in very long execution times. This project work therefore gives researchers and companies a tuning methodology that can significantly improve the execution time of queries. Applied to a real-world batch-processing query, the methodology improved execution time by a factor of 5. Moreover, the work gives insight into the underlying reasons for this large improvement by using Apache Spark monitoring tools. The result can be helpful for the many practitioners and researchers who would like to optimise the performance of Spark and MapReduce queries executed in Hive on top of an Apache Hadoop cluster. | Santos, Vitor Manuel Pereira Duarte dos | RUN | Forster, Rodrigo Richard | 2018-11-26T14:59:01Z | 2018-10-29 | 2018-10-29T00:00:00Z | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | application/pdf | http://hdl.handle.net/10362/52854 | TID:202028755 | eng | info:eu-repo/semantics/openAccess | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP | 2024-05-22T17:35:44Z | oai:run.unl.pt:10362/52854 | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | mluisa.alvim@gmail.com | opendoar:7160 | 2024-05-22T17:35:44 | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | false
dc.title.none.fl_str_mv	Hive on spark and MapReduce : a methodology for parameter tuning
title	Hive on spark and MapReduce : a methodology for parameter tuning
spellingShingle	Hive on spark and MapReduce : a methodology for parameter tuning Forster, Rodrigo Richard Tuning Hive on Spark MapReduce Apache Spark Big Data HDFS Hadoop Data Warehouse
title_short	Hive on spark and MapReduce : a methodology for parameter tuning
title_full	Hive on spark and MapReduce : a methodology for parameter tuning
title_fullStr	Hive on spark and MapReduce : a methodology for parameter tuning
title_full_unstemmed	Hive on spark and MapReduce : a methodology for parameter tuning
title_sort	Hive on spark and MapReduce : a methodology for parameter tuning
author	Forster, Rodrigo Richard
author_facet	Forster, Rodrigo Richard
author_role	author
dc.contributor.none.fl_str_mv	Santos, Vitor Manuel Pereira Duarte dos RUN
dc.contributor.author.fl_str_mv	Forster, Rodrigo Richard
dc.subject.por.fl_str_mv	Tuning; Hive on Spark; MapReduce; Apache Spark; Big Data; HDFS; Hadoop; Data Warehouse
topic	Tuning; Hive on Spark; MapReduce; Apache Spark; Big Data; HDFS; Hadoop; Data Warehouse
description	Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management
publishDate	2018
dc.date.none.fl_str_mv	2018-11-26T14:59:01Z 2018-10-29 2018-10-29T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/52854 TID:202028755
url	http://hdl.handle.net/10362/52854
identifier_str_mv	TID:202028755
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv	mluisa.alvim@gmail.com
_version_	1817545662945820672
