Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop

Bôaventura, Ricardo Soares

Bibliographic details
Main author:	Bôaventura, Ricardo Soares
Publication date:	2015
Document type:	Thesis
Language:	por
Source title:	Repositório Institucional da UFU
Full text:	https://repositorio.ufu.br/handle/123456789/14350 https://doi.org/10.14393/ufu.te.2015.37
Abstract:	Cloud computing is emerging as a dominant paradigm in distributed systems. It is a model that gives users on-demand access to a shared pool of configurable computing resources, such as networks, servers, storage, applications and services. These resources can be provisioned rapidly with minimal management effort or provider interaction. In cloud computing, infrastructure can be offered as a service through virtualization using hypervisors. Virtualization is a mechanism that abstracts the hardware and system resources from a given operating system. This technology is used in cloud environments across large sets of servers, using virtual machine monitors located between the hardware and the operating system. However, many hypervisors are available, each with its own advantages and disadvantages, and the specific characteristics of each virtual machine lead to different performance. The aim of this work is to propose a methodology for discovering how, when and by how much the performance of algorithms in virtual environments is determined by the environment configuration, how the configuration parameters influence one another and, finally, using statistical methods, which virtual environment configuration achieves the best results on average. The tested algorithms (sudoku, pi, wordcount, testDFSIO read and testDFSIO write) belong to the Apache Hadoop benchmark. The experiments were planned and executed according to experimental design theory. An experimental design is a pre-established set of tests based on scientific and, above all, statistical criteria, intended to determine the influence of various factors on the results (metrics) of a system or process, identifying and observing the reasons for changes in the expected value. The design used was a 3^4 factorial design, in which each factor (cores, memory, operating system and virtual machine) was varied over three levels. The operating systems tested were Ubuntu 14.04 64-bit, CentOS 7.0 64-bit and Windows 8.0 64-bit; the virtual machines tested were KVM, Xen and VMware. The results were collected and analyzed using analysis of variance. They show that the main factors analyzed change the performance of an algorithm, but they cannot be analyzed separately because interactions involving these factors are also significant. At a 5% significance level, analysis of variance showed that the interactions core:memory, memory:OS, memory:VM and OS:VM together affected the runtime of the analyzed algorithms. Using a statistical mean-comparison method, it was then possible to compare the mean times of the significant OS:VM interaction and, based on the results, to apply an adaptation of Pareto dominance theory called statistical Pareto dominance. Also at a 5% significance level, it was possible to identify the Pareto frontiers. Considering algorithm runtime, Pareto dominance placed the virtual environment Xen:CentOS in the first frontier, as the environment that on average achieved the best performance for the analyzed algorithms. The second frontier was occupied by Xen:Ubuntu and VMware:CentOS, which on average had times inferior to those of the first frontier and were considered equivalent to each other. The third frontier comprised KVM:Ubuntu, VMware:Ubuntu and VMware:Windows. The fourth frontier comprised Xen:Windows and KVM:CentOS, and the environment whose times were on average inferior to all the others was KVM:Windows. It can be concluded that the Xen virtual machine and the CentOS operating system achieved the best performance on average. However, a user who wants to run Ubuntu is advised to install it on the Xen virtual machine, and a user who wants to run Windows is advised to install it on the VMware virtual machine.
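The two methodological steps named in the abstract, a full factorial design with four factors at three levels and the grouping of environments into successive Pareto frontiers, can be illustrated with a short Python sketch. This is not the author's code. The operating system and hypervisor levels are taken from the abstract, but the core and memory levels are hypothetical placeholders, since the record does not state them. The exact definition of statistical Pareto dominance is also not given here, so the sketch simply takes the dominance relation as an input function; in the thesis that judgement would presumably come from the mean-comparison tests.

```python
from itertools import product

# OS and hypervisor levels come from the abstract.
# The core and memory levels below are HYPOTHETICAL placeholders.
FACTORS = {
    "cores": [1, 2, 4],          # assumed values, not from the record
    "memory_gb": [2, 4, 8],      # assumed values, not from the record
    "os": ["Ubuntu 14.04", "CentOS 7.0", "Windows 8.0"],
    "vm": ["KVM", "Xen", "VMware"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def pareto_frontiers(items, dominates):
    """Peel off successive non-dominated sets (frontier 1, 2, ...).

    dominates(a, b) must return True when a is judged better than b.
    """
    remaining = list(items)
    frontiers = []
    while remaining:
        # An item is on the current frontier if nothing left dominates it.
        front = [a for a in remaining
                 if not any(dominates(b, a) for b in remaining if b != a)]
        if not front:
            raise ValueError("dominance relation contains a cycle")
        frontiers.append(front)
        remaining = [a for a in remaining if a not in front]
    return frontiers


if __name__ == "__main__":
    runs = full_factorial(FACTORS)
    print(len(runs))  # 3**4 = 81 configurations

    # Toy relation over abstract labels, purely illustrative (no real data).
    better = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
    print(pareto_frontiers(["A", "B", "C", "D"],
                           lambda a, b: (a, b) in better))
```

Passing the dominance test in as a function keeps the frontier-peeling logic independent of how significance is decided, so any pairwise comparison procedure could be plugged in.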

Item metadata

id	UFU_47b3cd4e2e09cb54948d7f95f004ac6a
oai_identifier_str	oai:repositorio.ufu.br:123456789/14350
network_acronym_str	UFU
network_name_str	Repositório Institucional da UFU
repository_id_str
spelling	Doutor em Ciências
dc.title.none.fl_str_mv	Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop
author	Bôaventura, Ricardo Soares
author_facet	Bôaventura, Ricardo Soares
author_role	author
dc.contributor.none.fl_str_mv	Yamanaka, Keiji http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4798494D8; Oliveira, Mônica Rocha Ferreira de http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4107055J7; Camargos, Lásaro Jonas http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4704069A5; Pinto, Edmilson Rodrigues http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4775406H8; Verdi, Fábio Luciano http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4761267U6
dc.contributor.author.fl_str_mv	Bôaventura, Ricardo Soares
dc.subject.por.fl_str_mv	Virtualização; Computação em nuvem; Nuvem privada; Planejamento experimental; Experimentos com algoritmos; Dominância de pareto; Análise de variância; Algoritmos de computador; Virtualization; Cloud computing; Private cloud; Experimental planning; Experiments with algorithms; Pareto dominance; Analysis of variance; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
topic	Virtualização; Computação em nuvem; Nuvem privada; Planejamento experimental; Experimentos com algoritmos; Dominância de pareto; Análise de variância; Algoritmos de computador; Virtualization; Cloud computing; Private cloud; Experimental planning; Experiments with algorithms; Pareto dominance; Analysis of variance; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
publishDate	2015
dc.date.none.fl_str_mv	2015-11-19 2015-03-06 2016-06-22T18:38:13Z 2016-06-22T18:38:13Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	BÔAVENTURA, Ricardo Soares. Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop. 2015. 236 f. Tese (Doutorado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2015. DOI https://doi.org/10.14393/ufu.te.2015.37 https://repositorio.ufu.br/handle/123456789/14350 https://doi.org/10.14393/ufu.te.2015.37
identifier_str_mv	BÔAVENTURA, Ricardo Soares. Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop. 2015. 236 f. Tese (Doutorado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2015. DOI https://doi.org/10.14393/ufu.te.2015.37
url	https://repositorio.ufu.br/handle/123456789/14350 https://doi.org/10.14393/ufu.te.2015.37
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Uberlândia; BR; Programa de Pós-graduação em Engenharia Elétrica; Engenharias; UFU
publisher.none.fl_str_mv	Universidade Federal de Uberlândia; BR; Programa de Pós-graduação em Engenharia Elétrica; Engenharias; UFU
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU
instname_str	Universidade Federal de Uberlândia (UFU)
instacron_str	UFU
institution	UFU
reponame_str	Repositório Institucional da UFU
collection	Repositório Institucional da UFU
repository.name.fl_str_mv	Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	diinf@dirbi.ufu.br
_version_	1813711444168933376
