Escalonamento adaptativo para o Apache Hadoop

Cassales, Guilherme Weigert

Escalonamento adaptativo para o Apache Hadoop

Detalhes bibliográficos
Autor(a) principal:	Cassales, Guilherme Weigert
Data de Publicação:	2016
Tipo de documento:	Dissertação
Idioma:	por
Título da fonte:	Manancial - Repositório Digital da UFSM
dARK ID:	ark:/26339/001300000xwwc
Texto Completo:	http://repositorio.ufsm.br/handle/1/12025
Resumo:	Many alternatives have been employed in order to process all the data generated by current applications in a timely manner. One of these alternatives, the Apache Hadoop, combines parallel and distributed processing with the MapReduce paradigm in order to provide an environment that is able to process a huge data volume using a simple programming model. However, Apache Hadoop has been designed for dedicated and homogeneous clusters, a limitation that creates challenges for those who wish to use the framework in other circumstances. Often, acquiring a dedicated cluster can be impracticable due to the cost, and the acquisition of reposition parts can be a threat to the homogeneity of a cluster. In these cases, an option commonly used by the companies is the usage of idle computing resources in their network, however the original distribution of Hadoop would show serious performance issues in these conditions. Thus, this study was aimed to improve Hadoop’s capacity of adapting to pervasive and shared environments, where the availability of resources will undergo variations during the execution. Therefore, context-awareness techniques were used in order to collect information about the available capacity in each worker node and distributed communication techniques were used to update this information on scheduler. The joint usage of both techniques aimed at minimizing and/or eliminating the overload that would happen on shared nodes, resulting in an improvement of up to 50% on performance in a shared cluster, when compared to the original distribution, and indicated that a simple solution can positively impact the scheduling, increasing the variety of environments where the use of Hadoop is possible.

Metadados do item

id	UFSM_51898fd7de5c37a6731c3f9ce200be56
oai_identifier_str	oai:repositorio.ufsm.br:1/12025
network_acronym_str	UFSM
network_name_str	Manancial - Repositório Digital da UFSM
repository_id_str
spelling	Escalonamento adaptativo para o Apache HadoopAdaptative scheduling for Apache HadoopApache HadoopEscalonamentoSensibilidade ao contextoSchedulingContext-awareCNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAOMany alternatives have been employed in order to process all the data generated by current applications in a timely manner. One of these alternatives, the Apache Hadoop, combines parallel and distributed processing with the MapReduce paradigm in order to provide an environment that is able to process a huge data volume using a simple programming model. However, Apache Hadoop has been designed for dedicated and homogeneous clusters, a limitation that creates challenges for those who wish to use the framework in other circumstances. Often, acquiring a dedicated cluster can be impracticable due to the cost, and the acquisition of reposition parts can be a threat to the homogeneity of a cluster. In these cases, an option commonly used by the companies is the usage of idle computing resources in their network, however the original distribution of Hadoop would show serious performance issues in these conditions. Thus, this study was aimed to improve Hadoop’s capacity of adapting to pervasive and shared environments, where the availability of resources will undergo variations during the execution. Therefore, context-awareness techniques were used in order to collect information about the available capacity in each worker node and distributed communication techniques were used to update this information on scheduler. The joint usage of both techniques aimed at minimizing and/or eliminating the overload that would happen on shared nodes, resulting in an improvement of up to 50% on performance in a shared cluster, when compared to the original distribution, and indicated that a simple solution can positively impact the scheduling, increasing the variety of environments where the use of Hadoop is possible.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPESDiversas alternativas têm sido empregadas para o processamento, em tempo hábil, da grande quantidade de dados que é gerada pelas aplicações atuais. Uma destas alternativas, o Apache Hadoop, combina processamento paralelo e distribuído com o paradigma MapReduce para fornecer um ambiente capaz de processar um grande volume de informações através de um modelo de programação simplificada. No entanto, o Apache Hadoop foi projetado para utilização em clusters dedicados e homogêneos, uma limitação que gera desafios para aqueles que desejam utilizá-lo sob outras circunstâncias. Muitas vezes um cluster dedicado pode ser inviável pelo custo de aquisição e a homogeneidade pode ser ameaçada devido à dificuldade de adquirir peças de reposição. Em muitos desses casos, uma opção encontrada pelas empresas é a utilização dos recursos computacionais ociosos em sua rede, porém a distribuição original do Hadoop apresentaria sérios problemas de desempenho nestas condições. Sendo assim, este estudo propôs melhorar a capacidade do Hadoop em adaptar-se a ambientes, pervasivos e compartilhados, onde a disponibilidade de recursos sofrerá variações no decorrer da execução. Para tanto, utilizaram-se técnicas de sensibilidade ao contexto para coletar informações sobre a capacidade disponível nos nós trabalhadores e técnicas de comunicação distribuída para atualizar estas informações no escalonador. A utilização conjunta dessas técnicas teve como objetivo a minimização e/ou eliminação da sobrecarga que seria causada em nós com compartilhamento, resultando em uma melhora de até 50% no desempenho em um cluster compartilhado, quando comparado com a distribuição original, e indicou que uma solução simples pode impactar positivamente o escalonamento, aumentando a variedade de ambientes onde a utilização do Hadoop é possível.Universidade Federal de Santa MariaBrasilCiência da ComputaçãoUFSMPrograma de Pós-Graduação em InformáticaCentro de TecnologiaCharao, Andrea Schwertnerhttp://lattes.cnpq.br/8251676116103188Stein, Benhur de Oliveirahttp://lattes.cnpq.br/4640320476003795Senger, Hermeshttp://lattes.cnpq.br/3691742159298316Cassales, Guilherme Weigert2017-11-13T11:43:37Z2017-11-13T11:43:37Z2016-03-11info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://repositorio.ufsm.br/handle/1/12025ark:/26339/001300000xwwcporAttribution-NonCommercial-NoDerivatives 4.0 Internationalhttp://creativecommons.org/licenses/by-nc-nd/4.0/info:eu-repo/semantics/openAccessreponame:Manancial - Repositório Digital da UFSMinstname:Universidade Federal de Santa Maria (UFSM)instacron:UFSM2021-04-01T11:17:54Zoai:repositorio.ufsm.br:1/12025Biblioteca Digital de Teses e Dissertaçõeshttps://repositorio.ufsm.br/ONGhttps://repositorio.ufsm.br/oai/requestatendimento.sib@ufsm.br\|\|tedebc@gmail.comopendoar:2021-04-01T11:17:54Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)false
dc.title.none.fl_str_mv	Escalonamento adaptativo para o Apache Hadoop Adaptative scheduling for Apache Hadoop
title	Escalonamento adaptativo para o Apache Hadoop
spellingShingle	Escalonamento adaptativo para o Apache Hadoop Cassales, Guilherme Weigert Apache Hadoop Escalonamento Sensibilidade ao contexto Scheduling Context-aware CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Escalonamento adaptativo para o Apache Hadoop
title_full	Escalonamento adaptativo para o Apache Hadoop
title_fullStr	Escalonamento adaptativo para o Apache Hadoop
title_full_unstemmed	Escalonamento adaptativo para o Apache Hadoop
title_sort	Escalonamento adaptativo para o Apache Hadoop
author	Cassales, Guilherme Weigert
author_facet	Cassales, Guilherme Weigert
author_role	author
dc.contributor.none.fl_str_mv	Charao, Andrea Schwertner http://lattes.cnpq.br/8251676116103188 Stein, Benhur de Oliveira http://lattes.cnpq.br/4640320476003795 Senger, Hermes http://lattes.cnpq.br/3691742159298316
dc.contributor.author.fl_str_mv	Cassales, Guilherme Weigert
dc.subject.por.fl_str_mv	Apache Hadoop Escalonamento Sensibilidade ao contexto Scheduling Context-aware CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
topic	Apache Hadoop Escalonamento Sensibilidade ao contexto Scheduling Context-aware CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	Many alternatives have been employed in order to process all the data generated by current applications in a timely manner. One of these alternatives, the Apache Hadoop, combines parallel and distributed processing with the MapReduce paradigm in order to provide an environment that is able to process a huge data volume using a simple programming model. However, Apache Hadoop has been designed for dedicated and homogeneous clusters, a limitation that creates challenges for those who wish to use the framework in other circumstances. Often, acquiring a dedicated cluster can be impracticable due to the cost, and the acquisition of reposition parts can be a threat to the homogeneity of a cluster. In these cases, an option commonly used by the companies is the usage of idle computing resources in their network, however the original distribution of Hadoop would show serious performance issues in these conditions. Thus, this study was aimed to improve Hadoop’s capacity of adapting to pervasive and shared environments, where the availability of resources will undergo variations during the execution. Therefore, context-awareness techniques were used in order to collect information about the available capacity in each worker node and distributed communication techniques were used to update this information on scheduler. The joint usage of both techniques aimed at minimizing and/or eliminating the overload that would happen on shared nodes, resulting in an improvement of up to 50% on performance in a shared cluster, when compared to the original distribution, and indicated that a simple solution can positively impact the scheduling, increasing the variety of environments where the use of Hadoop is possible.
publishDate	2016
dc.date.none.fl_str_mv	2016-03-11 2017-11-13T11:43:37Z 2017-11-13T11:43:37Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.ufsm.br/handle/1/12025
dc.identifier.dark.fl_str_mv	ark:/26339/001300000xwwc
url	http://repositorio.ufsm.br/handle/1/12025
identifier_str_mv	ark:/26339/001300000xwwc
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Santa Maria Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Informática Centro de Tecnologia
publisher.none.fl_str_mv	Universidade Federal de Santa Maria Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Informática Centro de Tecnologia
dc.source.none.fl_str_mv	reponame:Manancial - Repositório Digital da UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM
instname_str	Universidade Federal de Santa Maria (UFSM)
instacron_str	UFSM
institution	UFSM
reponame_str	Manancial - Repositório Digital da UFSM
collection	Manancial - Repositório Digital da UFSM
repository.name.fl_str_mv	Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv	atendimento.sib@ufsm.br\|\|tedebc@gmail.com
_version_	1815172415339102208

Escalonamento adaptativo para o Apache Hadoop

Registros relacionados