Decisão automatizada de mapeamento de tarefas com OpenACC em arquiteturas paralelas híbridas

Ferrari, Renato Pizzinato

Bibliographic details
Main author:	Ferrari, Renato Pizzinato
Publication date:	2016
Document type:	Dissertation
Language:	por
Source title:	Biblioteca Digital de Teses e Dissertações do UFSM
Full text:	http://repositorio.ufsm.br/handle/1/15054
Abstract:	This master's work developed a decision method with automated steps to help the developer answer the following question for a given hybrid system: to which unit of the system should a particular task be mapped in order to obtain the best performance on the available hardware? The parallel programs targeted by this work must be written with the OpenACC standard, which uses compiler directives to express parallelism and was created to ease programming on hybrid systems made up of CPUs and GPUs. The approach is empirical, based on observing program performance in different configurations and with different parameters and input data. The proposals do not aim to guarantee the best mapping decision, but rather to shorten the decision process as far as possible. To explore the performance question in more depth, experiments with an OpenACC benchmark were carried out at the start of the work. The approach rests on the hypothesis that CPU and GPU performance can be estimated for a given task on a given real hybrid system. The estimate may be approximate because, in the worst case, it is equivalent to a wrong estimate made by hand, which will be noticed and can be corrected in subsequent runs. The work therefore proposes that CPU and GPU performance be estimated from the following criteria taken together: input data size, time and space complexity, and the performance of the target hardware on benchmarks. As a basis for decision support, it proposes building and maintaining a table in which each row is an OpenACC benchmark, possibly belonging to a benchmark suite such as EPCC. Building the table requires several runs of some benchmarks, but it happens only once for a given hybrid system, and its data can potentially be reused across different applications and runs.
To shorten the process and keep developer intervention to a minimum, a tool was developed that automates parts of it. The tool was evaluated to test its functionality, its limitations, and the quality of its estimates for scientific computing programs. Three programs from the Polybench benchmark were chosen: gramschmidt (Gram-Schmidt decomposition), lu (LU decomposition), and durbin (solution of a system with a Toeplitz matrix). Each has a different computational complexity. The effectiveness of the automated decision can be checked by comparing the run times of the Host, Device, and Tool versions. The tool's automated decision placed the Gram-Schmidt function on the GPU when the matrix order was greater than or equal to 400. The gap between the observed matrix order (300) and the calculated order (400) is due to the difference between the estimated number of arithmetic operations of the correlation function and that of the Gram-Schmidt function. The effectiveness of the decision tool, which is based on the analysis of a benchmark, is limited to algorithms whose time complexity is similar to that of the benchmark. The differences between the memory allocated by the benchmark and by the parallelized program are due to parameters that are not easily measured, such as dependences between variables. It is therefore recommended that the memory value used as a decision criterion be chosen through an iterative process, starting from the value obtained in the benchmark analysis.
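
The table-based decision described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool from the dissertation: the `BenchmarkRow` fields, the `choose_unit` function, and the single table row are hypothetical, and the threshold of 400 is borrowed from the Gram-Schmidt result quoted above purely as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """One row of a hypothetical decision table, measured once per hybrid system."""
    name: str
    time_complexity: str   # label such as "O(n^3)"
    crossover_order: int   # assumed matrix order from which the GPU is expected to win


# Illustrative table only; real rows would come from benchmark runs on the target system.
TABLE = [
    BenchmarkRow(name="example-cubic-benchmark", time_complexity="O(n^3)", crossover_order=400),
]


def choose_unit(table, time_complexity, order):
    """Return "device" or "host" using the first row with the same time complexity."""
    for row in table:
        if row.time_complexity == time_complexity:
            return "device" if order >= row.crossover_order else "host"
    # No comparable benchmark in the table: fall back to the CPU.
    return "host"


if __name__ == "__main__":
    for n in (300, 400, 1000):
        print(n, choose_unit(TABLE, "O(n^3)", n))
```

Matching on time complexity reflects the abstract's remark that the decision is only reliable for algorithms whose complexity is similar to that of the benchmark; the host fallback for unmatched tasks is a design choice of this sketch.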

Item metadata

id	UFSM_bbb3f725301b7e5a13dbfa00ba7d0d82
oai_identifier_str	oai:repositorio.ufsm.br:1/15054
network_acronym_str	UFSM
network_name_str	Biblioteca Digital de Teses e Dissertações do UFSM
repository_id_str
dc.title.por.fl_str_mv	Decisão automatizada de mapeamento de tarefas com OpenACC em arquiteturas paralelas híbridas
dc.title.alternative.eng.fl_str_mv	Automated decision on task mapping with OpenACC over hybrid parallel architectures
title	Decisão automatizada de mapeamento de tarefas com OpenACC em arquiteturas paralelas híbridas
author	Ferrari, Renato Pizzinato
author_facet	Ferrari, Renato Pizzinato
author_role	author
dc.contributor.advisor1.fl_str_mv	Charao, Andrea Schwertner
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/8251676116103188
dc.contributor.referee1.fl_str_mv	Lima, João Vicente Ferreira
dc.contributor.referee1Lattes.fl_str_mv	http://lattes.cnpq.br/6266546896929217
dc.contributor.referee2.fl_str_mv	Campos Velho, Haroldo Fraga de
dc.contributor.referee2Lattes.fl_str_mv	http://lattes.cnpq.br/5142426481528206
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/0015021275451887
dc.contributor.author.fl_str_mv	Ferrari, Renato Pizzinato
contributor_str_mv	Charao, Andrea Schwertner Lima, João Vicente Ferreira Campos Velho, Haroldo Fraga de
dc.subject.por.fl_str_mv	OpenACC GPU Processamento paralelo
topic	OpenACC GPU Processamento paralelo OpenMP CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	OpenMP
dc.subject.cnpq.fl_str_mv	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2016
dc.date.issued.fl_str_mv	2016-03-28
dc.date.accessioned.fl_str_mv	2018-12-10T14:56:53Z
dc.date.available.fl_str_mv	2018-12-10T14:56:53Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.ufsm.br/handle/1/15054
url	http://repositorio.ufsm.br/handle/1/15054
dc.language.iso.fl_str_mv	por
language	por
dc.relation.cnpq.fl_str_mv	100300000007
dc.relation.confidence.fl_str_mv	600
dc.relation.authority.fl_str_mv	432b3089-2a77-46f3-902e-7cc0f044f353 c5701758-f51b-47e9-8dcb-2d03feb3cee0 aec00059-8729-40bd-b989-bf07fcb3bdbd 23864a87-e1f3-449d-a401-4f38434bb738
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Santa Maria Centro de Tecnologia
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	UFSM
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Ciência da Computação
publisher.none.fl_str_mv	Universidade Federal de Santa Maria Centro de Tecnologia
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações do UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM
instname_str	Universidade Federal de Santa Maria (UFSM)
instacron_str	UFSM
institution	UFSM
reponame_str	Biblioteca Digital de Teses e Dissertações do UFSM
collection	Biblioteca Digital de Teses e Dissertações do UFSM
bitstream.url.fl_str_mv	http://repositorio.ufsm.br/bitstream/1/15054/1/DIS_PPGCC_2016_FERRARI_RENATO.pdf http://repositorio.ufsm.br/bitstream/1/15054/2/license_rdf http://repositorio.ufsm.br/bitstream/1/15054/3/license.txt http://repositorio.ufsm.br/bitstream/1/15054/4/DIS_PPGCC_2016_FERRARI_RENATO.pdf.txt http://repositorio.ufsm.br/bitstream/1/15054/5/DIS_PPGCC_2016_FERRARI_RENATO.pdf.jpg
bitstream.checksum.fl_str_mv	3f2484eb4c71ba970d2dc7019b4c5164 4460e5956bc1d1639be9ae6146a50347 f8fcb28efb1c8cf0dc096bec902bf4c4 a1c38150712249b596f78471c88ac220 804e7a907cab1ac19bbb7916f2f0a081
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações do UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv	atendimento.sib@ufsm.br\|\|tedebc@gmail.com
_version_	1801485270897393664
