Upgrading a high performance computing environment for massive data processing
Main author: | Lucas Miguel Simões Ponce |
---|---|
Publication date: | 2019 |
Other authors: | Walter Dos Santos, Wagner Meira Jr., Dorgival Guedes, Daniele Lezzi, Rosa M. Badia |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFMG |
Full text: | http://dx.doi.org/10.1186/s13174-019-0118-7 http://hdl.handle.net/1843/61947 http://orcid.org/0000-0002-1480-0039 https://orcid.org/0000-0003-0865-1417 |
Abstract: | High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction. |
id |
UFMG_5dce1d9527a579b0f488cd7c70f96e84 |
---|---|
oai_identifier_str |
oai:repositorio.ufmg.br:1843/61947 |
network_acronym_str |
UFMG |
network_name_str |
Repositório Institucional da UFMG |
repository_id_str |
|
spelling |
Funding agencies: CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais; CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Published version: https://jisajournal.springeropen.com/articles/10.1186/s13174-019-0118-7 |
dc.title.pt_BR.fl_str_mv |
Upgrading a high performance computing environment for massive data processing |
dc.title.alternative.pt_BR.fl_str_mv |
Atualizando um ambiente de computação de alto desempenho para processamento massivo de dados |
title |
Upgrading a high performance computing environment for massive data processing |
spellingShingle |
Upgrading a high performance computing environment for massive data processing Lucas Miguel Simões Ponce COMPSs High-performance computing Big data HDFS Lemonade Programação Computação de alto desempenho Big data Processamento de dados |
title_short |
Upgrading a high performance computing environment for massive data processing |
title_full |
Upgrading a high performance computing environment for massive data processing |
title_fullStr |
Upgrading a high performance computing environment for massive data processing |
title_full_unstemmed |
Upgrading a high performance computing environment for massive data processing |
title_sort |
Upgrading a high performance computing environment for massive data processing |
author |
Lucas Miguel Simões Ponce |
author_facet |
Lucas Miguel Simões Ponce; Walter Dos Santos; Wagner Meira Jr.; Dorgival Guedes; Daniele Lezzi; Rosa M. Badia |
author_role |
author |
author2 |
Walter Dos Santos; Wagner Meira Jr.; Dorgival Guedes; Daniele Lezzi; Rosa M. Badia |
author2_role |
author author author author author |
dc.contributor.author.fl_str_mv |
Lucas Miguel Simões Ponce; Walter Dos Santos; Wagner Meira Jr.; Dorgival Guedes; Daniele Lezzi; Rosa M. Badia |
dc.subject.por.fl_str_mv |
COMPSs; High-performance computing; Big data; HDFS; Lemonade |
topic |
COMPSs; High-performance computing; Big data; HDFS; Lemonade; Programming; Data processing |
dc.subject.other.pt_BR.fl_str_mv |
Programming; High-performance computing; Big data; Data processing |
description |
High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction. |
publishDate |
2019 |
dc.date.issued.fl_str_mv |
2019 |
dc.date.accessioned.fl_str_mv |
2023-12-12T20:27:30Z |
dc.date.available.fl_str_mv |
2023-12-12T20:27:30Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1843/61947 |
dc.identifier.doi.pt_BR.fl_str_mv |
http://dx.doi.org/10.1186/s13174-019-0118-7 |
dc.identifier.issn.pt_BR.fl_str_mv |
1869-0238 |
dc.identifier.orcid.pt_BR.fl_str_mv |
http://orcid.org/0000-0002-1480-0039 https://orcid.org/0000-0003-0865-1417 |
url |
http://dx.doi.org/10.1186/s13174-019-0118-7 http://hdl.handle.net/1843/61947 http://orcid.org/0000-0002-1480-0039 https://orcid.org/0000-0003-0865-1417 |
identifier_str_mv |
1869-0238 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.pt_BR.fl_str_mv |
Journal of Internet Services and Applications |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.publisher.initials.fl_str_mv |
UFMG |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO |
publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
instname_str |
Universidade Federal de Minas Gerais (UFMG) |
instacron_str |
UFMG |
institution |
UFMG |
reponame_str |
Repositório Institucional da UFMG |
collection |
Repositório Institucional da UFMG |
bitstream.url.fl_str_mv |
https://repositorio.ufmg.br/bitstream/1843/61947/1/License.txt https://repositorio.ufmg.br/bitstream/1843/61947/2/Upgrading%20a%20high%20performance%20computing%20environment%20for%20massive%20data%20processing.pdf |
bitstream.checksum.fl_str_mv |
fa505098d172de0bc8864fc1287ffe22 ed2509be153e265a3864d337775a1309 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
repository.mail.fl_str_mv |
|
_version_ |
1803589414276825088 |
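
The record above lists an MD5 checksum for each bitstream. As a minimal sketch of how a downloaded copy could be checked against such a value, the following uses only Python's standard `hashlib`. The function names `md5_of_file` and `matches_checksum` are hypothetical helpers introduced here for illustration; they are not part of the repository's tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large PDFs do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 digest with an expected value (hypothetical helper)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Assuming the checksums are listed in the same order as the bitstream URLs, a downloaded `License.txt` would be compared against `fa505098d172de0bc8864fc1287ffe22` and the PDF against `ed2509be153e265a3864d337775a1309`. MD5 is suitable here only for detecting accidental corruption, not for security purposes.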