Boosting big data streaming applications in clouds with burstFlow

Bibliographic details
Main author: Souza Junior, Paulo Ricardo Rodrigues de
Publication date: 2020
Other authors: Matteussi, Kassiano José; Veith, Alexandre da Silva; Zanchetta, Breno Fanchiotti; Leithardt, Valderi Reis Quietinho; Murciego, Álvaro Lozano; Freitas, Edison Pignaton de; Anjos, Julio Cesar Santos dos; Geyer, Claudio Fernando Resin
Document type: Article
Language: eng
Source title: Repositório Institucional da UFRGS
Full text: http://hdl.handle.net/10183/256835
Abstract: The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges in handling Big Data in real time. Traditionally, a single cloud infrastructure often holds the deployment of Stream Processing applications because it has extensive and adaptive virtual computing resources. Hence, data sources send data from distant and different locations of the cloud infrastructure, increasing the application latency. The cloud infrastructure may be geographically distributed, and it requires running a set of frameworks to handle communication. These frameworks often comprise a Message Queue System and a Stream Processing Framework. The frameworks explore Multi-Cloud, deploying each service in a different cloud and communicating via high-latency network links. This creates challenges in meeting real-time application requirements because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to adjust to environment changes continually. Previous works explore static micro-batches, demonstrating their potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting the micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy for distributing incoming streams across available machines by considering memory and CPU capacities. The experiments use a real-world multi-cloud deployment, showing that BurstFlow can reduce the execution time by up to 77% when compared to state-of-the-art solutions, improving CPU efficiency by up to 49%.
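Illustrative sketch (not taken from the paper): the abstract names two mechanisms, dynamic micro-batch sizing driven by communication and computation time, and a data partition policy driven by memory and CPU capacities, but this record does not give the actual rules. The Python below is only a minimal sketch under stated assumptions. The function names, the communication/computation ratio rule, its clamping bounds, and the equal CPU/memory weighting are all hypothetical and should not be read as BurstFlow's implementation.

```python
from dataclasses import dataclass


def next_batch_size(current, comm_time, comp_time, min_size=1, max_size=10_000):
    """Hypothetical sizing rule (an assumption, not BurstFlow's algorithm).

    Grow the micro-batch when communication time dominates, so the network
    cost is amortized over more records; shrink it when computation dominates.
    """
    if comm_time <= 0 or comp_time <= 0:
        return current  # no usable measurement, keep the current size
    # Clamp the ratio so a single noisy measurement cannot swing the size wildly.
    ratio = max(0.5, min(2.0, comm_time / comp_time))
    return max(min_size, min(max_size, round(current * ratio)))


@dataclass
class Worker:
    name: str
    cpu_cores: int
    memory_gb: float


def partition_shares(workers, batch_size):
    """Split batch_size records across workers in proportion to capacity.

    The capacity score weights CPU and memory equally; that weighting is an
    assumption made for this sketch.
    """
    total_cpu = sum(w.cpu_cores for w in workers)
    total_mem = sum(w.memory_gb for w in workers)
    scores = [
        0.5 * w.cpu_cores / total_cpu + 0.5 * w.memory_gb / total_mem
        for w in workers
    ]
    shares = [int(batch_size * s) for s in scores]
    # Hand records lost to rounding down to the highest-scoring workers.
    leftover = batch_size - sum(shares)
    ranked = sorted(range(len(workers)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:leftover]:
        shares[i] += 1
    return {w.name: share for w, share in zip(workers, shares)}


if __name__ == "__main__":
    size = next_batch_size(100, comm_time=2.0, comp_time=1.0)
    cluster = [Worker("small", 2, 4.0), Worker("large", 6, 12.0)]
    print(size, partition_shares(cluster, size))
```

In this sketch a stream consumer would call next_batch_size after each micro-batch with the measured transfer and processing times, then use partition_shares to decide how many records of the next batch each machine receives.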
id UFRGS-2_5524164f2769fb5b2b61440043330415
oai_identifier_str oai:www.lume.ufrgs.br:10183/256835
network_acronym_str UFRGS-2
network_name_str Repositório Institucional da UFRGS
repository_id_str
dc.title.pt_BR.fl_str_mv Boosting big data streaming applications in clouds with burstFlow
dc.contributor.author.fl_str_mv Souza Junior, Paulo Ricardo Rodrigues de
Matteussi, Kassiano José
Veith, Alexandre da Silva
Zanchetta, Breno Fanchiotti
Leithardt, Valderi Reis Quietinho
Murciego, Álvaro Lozano
Freitas, Edison Pignaton de
Anjos, Julio Cesar Santos dos
Geyer, Claudio Fernando Resin
dc.subject.por.fl_str_mv Processamento de dados
Big data
Computação em nuvem
dc.subject.eng.fl_str_mv Stream processing applications
Multi cloud
Micro-batches
Data partition
publishDate 2020
dc.date.issued.fl_str_mv 2020
dc.date.accessioned.fl_str_mv 2023-04-07T03:26:39Z
dc.type.driver.fl_str_mv Estrangeiro
info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10183/256835
dc.identifier.issn.pt_BR.fl_str_mv 2169-3536
dc.identifier.nrb.pt_BR.fl_str_mv 001135243
identifier_str_mv 2169-3536
001135243
url http://hdl.handle.net/10183/256835
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartof.pt_BR.fl_str_mv IEEE Access. [Piscataway, NJ]. Vol. 8 (2020), p. 219124 - 219136
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRGS
instname:Universidade Federal do Rio Grande do Sul (UFRGS)
instacron:UFRGS
instname_str Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str UFRGS
institution UFRGS
reponame_str Repositório Institucional da UFRGS
collection Repositório Institucional da UFRGS
bitstream.url.fl_str_mv http://www.lume.ufrgs.br/bitstream/10183/256835/2/001135243.pdf.txt
http://www.lume.ufrgs.br/bitstream/10183/256835/1/001135243.pdf
bitstream.checksum.fl_str_mv c31ef6e9d701248ed38e52adf28f5397
08e0d2ce44a914edd6d9720a93fd5f68
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv
_version_ 1801225085574447104