Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop

Bibliographic details
Main author: Bôaventura, Ricardo Soares
Publication date: 2015
Document type: Thesis
Language: por
Source title: Repositório Institucional da UFU
Full text: https://repositorio.ufu.br/handle/123456789/14350
https://doi.org/10.14393/ufu.te.2015.37
Abstract: Cloud computing has emerged as a dominant paradigm in distributed systems, with a model that gives users on-demand access to a shared pool of configurable computing resources such as networks, servers, storage, applications and services. These resources can be provisioned rapidly with minimal management effort or provider interaction. In cloud computing, infrastructure can be offered as a service through virtualization using hypervisors. Virtualization is a mechanism that abstracts the hardware and system resources from a given operating system. This technology is used in cloud environments across large sets of servers, using virtual machine monitors that sit between the hardware and the operating system. However, many hypervisors are available, each with its own advantages and disadvantages, and the specific characteristics of each virtual machine lead to different performance. The aim of this work is to propose a methodology for discovering how, when and by how much the performance of algorithms in virtual environments is determined by the environment configuration, how the configuration parameters influence one another, and, using statistical methods, which virtual environment configuration achieves the best results on average. The tested algorithms (sudoku, pi, wordcount, testDFSIO read and testDFSIO write) belong to the Apache Hadoop benchmark. The experiments were planned and executed according to experimental design theory. An experimental design is a pre-established set of tests, chosen using scientific and chiefly statistical criteria, intended to determine the influence of various factors on the results (metrics) of a system or process, identifying and observing the reasons for changes in the expected value. The design used was a 3^4 factorial design, in which each of four factors (cores, memory, operating system and virtual machine) was varied at three levels. The operating systems tested were Ubuntu 14.04 64-bit, CentOS 7.0 64-bit and Windows 8.0 64-bit; the virtual machines tested were KVM, Xen and VMware. Data were collected and analyzed using analysis of variance. The results show that the main factors affect algorithm performance, but they cannot be analyzed separately because the interactions in which they take part are also significant. At a 5% significance level, analysis of variance showed that the interactions core:memory, memory:OS, memory:VM and OS:VM together affected the runtime of the analyzed algorithms. A statistical comparison of means was then used to compare the mean times for the significant OS:VM interaction, and an adaptation of Pareto dominance theory, called statistical Pareto dominance, was applied to the results. Also at a 5% significance level, it was possible to identify the Pareto frontiers. Considering algorithm runtime, Pareto dominance placed the virtual environment Xen with CentOS on the first frontier, as the environment that on average achieved the best performance for the analyzed algorithms. The second frontier was occupied by Xen with Ubuntu and VMware with CentOS; that is, they performed worse on average than the first frontier and were considered equivalent to each other. The third frontier comprised KVM with Ubuntu, VMware with Ubuntu and VMware with Windows. The fourth frontier comprised Xen with Windows and KVM with CentOS, and the environment that on average performed worse than all the others was KVM with Windows. It can be concluded that the Xen virtual machine and the CentOS operating system achieved the best performance on average. However, a user who wants to run Ubuntu is advised to install it on the Xen virtual machine, and a user who wants to run Windows is advised to install it on the VMware virtual machine.
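The factorial design described in the abstract crosses four factors at three levels each, which gives 3^4 = 81 configurations. The sketch below only enumerates that grid. The operating system and hypervisor levels come from the abstract; the core and memory level labels and the function name `full_factorial` are placeholders assumed for illustration, since the abstract does not state those values.

```python
from itertools import product

# Levels named in the abstract.
OPERATING_SYSTEMS = ["Ubuntu 14.04", "CentOS 7.0", "Windows 8.0"]
HYPERVISORS = ["KVM", "Xen", "VMware"]

# Assumption: the abstract does not give the core and memory levels,
# so these labels are placeholders, not the values used in the thesis.
CORE_LEVELS = ["core_level_1", "core_level_2", "core_level_3"]
MEMORY_LEVELS = ["memory_level_1", "memory_level_2", "memory_level_3"]


def full_factorial():
    """Return every (core, memory, os, vm) combination of the design."""
    return list(product(CORE_LEVELS, MEMORY_LEVELS, OPERATING_SYSTEMS, HYPERVISORS))


runs = full_factorial()
print(len(runs))  # 3 * 3 * 3 * 3 = 81 configurations
```

Each tuple in `runs` is one virtual environment configuration on which the benchmark algorithms would be timed; replication and run order are outside the scope of this sketch.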
id UFU_47b3cd4e2e09cb54948d7f95f004ac6a
oai_identifier_str oai:repositorio.ufu.br:123456789/14350
network_acronym_str UFU
network_name_str Repositório Institucional da UFU
repository_id_str
spelling Doutor em Ciências
http://repositorio.ufu.br/oai/request
opendoar:2021-03-03T23:51:50
dc.title.none.fl_str_mv Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop
title Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop
author Bôaventura, Ricardo Soares
author_facet Bôaventura, Ricardo Soares
author_role author
dc.contributor.none.fl_str_mv Yamanaka, Keiji
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4798494D8
Oliveira, Mônica Rocha Ferreira de
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4107055J7
Camargos, Lásaro Jonas
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4704069A5
Pinto, Edmilson Rodrigues
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4775406H8
Verdi, Fábio Luciano
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4761267U6
dc.contributor.author.fl_str_mv Bôaventura, Ricardo Soares
dc.subject.por.fl_str_mv Virtualização
Computação em nuvem
Nuvem privada
Planejamento experimental
Experimentos com algoritmos
Dominância de pareto
Análise de variância
Algoritmos de computador
Virtualization
Cloud computing
Private cloud
Experimental planning
Experiments with algorithms
Pareto dominance
Analysis of variance
CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
publishDate 2015
dc.date.none.fl_str_mv 2015-11-19
2015-03-06
2016-06-22T18:38:13Z
2016-06-22T18:38:13Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv BÔAVENTURA, Ricardo Soares. Comparação do desempenho de ambientes virtuais na computação em nuvem privada usando a análise estatística e o benchmark Hadoop. 2015. 236 f. Tese (Doutorado em Engenharias) - Universidade Federal de Uberlândia, Uberlândia, 2015. DOI https://doi.org/10.14393/ufu.te.2015.37
https://repositorio.ufu.br/handle/123456789/14350
https://doi.org/10.14393/ufu.te.2015.37
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Uberlândia
BR
Programa de Pós-graduação em Engenharia Elétrica
Engenharias
UFU
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFU
instname:Universidade Federal de Uberlândia (UFU)
instacron:UFU
instname_str Universidade Federal de Uberlândia (UFU)
instacron_str UFU
institution UFU
reponame_str Repositório Institucional da UFU
collection Repositório Institucional da UFU
repository.name.fl_str_mv Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv diinf@dirbi.ufu.br
_version_ 1805569637692211200