Evaluating Shared Memory Parallel Computing Mechanisms of the Julia Language
Main author: | Barros, Diana Almeida |
---|---|
Publication date: | 2023 |
Other authors: | |
Document type: | Dissertation (master's thesis) |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da UERJ |
Full text: | http://www.bdtd.uerj.br/handle/1/20486 |
Abstract: | Fields such as data science, machine learning, and scientific computing are promising areas that currently receive substantial investment. They are usually very complex and demand optimal use of computing resources for good performance. In this context the Julia language was born: a dynamic language that offers a high-performance environment with a user-friendly syntax. Although its design focuses on performance, its shared-memory parallel computing features are still partly under development, and studies in this area remain scarce. In this work, we present a detailed performance study of Julia's shared-memory parallel computing mechanisms, covering both multithreading and SIMD. In the multithreading analysis, we compare the data-parallelism and task-parallelism strategies available through the built-in macros @threads and @spawn, focusing on how they distribute loop iterations. We also analyze the loop-scheduling mechanisms available in the Julia version used in this work, namely the static scheduling provided by @threads and the schedulers provided by the package FLoops.jl, and observe how their performance behaves as the environment scales. In the SIMD analysis, we compare the compiler's auto-vectorization against the built-in construct @simd and two vectorization packages. Our experiments use synthetic kernels, benchmark applications, and a real-world optimization framework. Our results show that the macro @spawn performed better on unbalanced loads and that the different loop schedulers offered by FLoops.jl help improve the performance of applications with load imbalance. However, we found that applications commonly encountered in real-world scenarios are susceptible to overhead and performance loss, since the nature of the problem influences how the code is implemented; this was more noticeable when @spawn was used or when the environment scaled in the number of threads. For the SIMD mechanisms, the package LoopVectorization.jl provided the best performance results with low programming effort. |
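For readers unfamiliar with the mechanisms compared in the abstract, the contrast between @threads (static data parallelism), @spawn (dynamic task parallelism), and @simd can be sketched with the standard Julia `Base.Threads` API. This is an illustrative summary only, not code from the dissertation; the summation kernel and function names are made up for the example.

```julia
using Base.Threads  # provides @threads, @spawn, nthreads, threadid

# Data parallelism: @threads statically splits the iteration range
# across the available threads (the @threads default scheduling
# studied in the dissertation's Julia version).
function sum_threads(xs)
    partial = zeros(eltype(xs), nthreads())
    @threads for i in eachindex(xs)
        partial[threadid()] += xs[i]
    end
    return sum(partial)
end

# Task parallelism: @spawn creates one task per chunk; the runtime
# assigns tasks to threads dynamically, which helps when the work
# per iteration is unbalanced.
function sum_spawn(xs; ntasks = nthreads())
    chunks = Iterators.partition(eachindex(xs), cld(length(xs), ntasks))
    tasks = [Threads.@spawn sum(@view xs[c]) for c in chunks]
    return sum(fetch.(tasks))
end

# SIMD: @simd tells the compiler the loop is safe to vectorize,
# complementing its auto-vectorization.
function sum_simd(xs)
    s = zero(eltype(xs))
    @simd for i in eachindex(xs)
        s += xs[i]
    end
    return s
end
```

FLoops.jl and LoopVectorization.jl, also evaluated in the work, follow similar loop-level patterns but swap in their own schedulers and vectorizers.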
dc.title.por.fl_str_mv |
Evaluating Shared Memory Parallel Computing Mechanisms of the Julia Language |
dc.title.alternative.por.fl_str_mv |
Avaliando os mecanismos de computação paralela com memória compartilhada da linguagem Julia. |
author |
Barros, Diana Almeida |
dc.contributor.advisor1.fl_str_mv |
Bentes, Cristiana Barbosa |
dc.contributor.referee1.fl_str_mv |
Sena, Alexandre |
dc.contributor.referee2.fl_str_mv |
Hoffimann, Júlio |
dc.contributor.referee3.fl_str_mv |
Pessoa, Tiago Carneiro |
dc.subject.eng.fl_str_mv |
Shared Memory Multithreading Loop Scheduling |
dc.subject.cat.fl_str_mv |
Julia Language |
dc.subject.por.fl_str_mv |
Julia (Linguagem de programação de computador) Linguagem Julia Memória Compartilhada Multithreading SIMD Escalonamento de Loops |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
publishDate |
2023 |
dc.date.accessioned.fl_str_mv |
2023-10-20T14:57:22Z |
dc.date.issued.fl_str_mv |
2023-04-26 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.citation.fl_str_mv |
BARROS, Diana Almeida. Evaluating Shared Memory Parallel Computing Mechanisms of the Julia Language. 2023. 93 f. Dissertação (Mestrado em Ciências Computacionais) - Instituto de Matemática e Estatística, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, 2023. |
dc.identifier.uri.fl_str_mv |
http://www.bdtd.uerj.br/handle/1/20486 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade do Estado do Rio de Janeiro |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciências Computacionais |
dc.publisher.initials.fl_str_mv |
UERJ |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Centro de Tecnologia e Ciências::Instituto de Matemática e Estatística |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UERJ instname:Universidade do Estado do Rio de Janeiro (UERJ) instacron:UERJ |
bitstream.url.fl_str_mv |
http://www.bdtd.uerj.br/bitstream/1/20486/2/Disserta%C3%A7%C3%A3o+-+Diana+Almeida+Barros+-+2023+-+Completa.pdf http://www.bdtd.uerj.br/bitstream/1/20486/1/license.txt |