Boosting big data streaming applications in clouds with burstFlow
Main author: | Souza Junior, Paulo Ricardo Rodrigues de |
---|---|
Publication date: | 2020 |
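Other authors: | Matteussi, Kassiano José; Veith, Alexandre da Silva; Zanchetta, Breno Fanchiotti; Leithardt, Valderi Reis Quietinho; Murciego, Álvaro Lozano; Freitas, Edison Pignaton de; Anjos, Julio Cesar Santos dos; Geyer, Claudio Fernando Resin |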
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/256835 |
Abstract: | The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, and it brings new challenges for handling Big Data in real time. Traditionally, Stream Processing applications are deployed in a single cloud infrastructure because it offers extensive and adaptive virtual computing resources. As a result, data sources send data from distant and different locations to the cloud infrastructure, which increases application latency. The cloud infrastructure may be geographically distributed, and it must run a set of frameworks to handle communication. These frameworks often comprise a Message Queue System and a Stream Processing Framework. The frameworks use Multi-Cloud by deploying each service in a different cloud and communicating over high-latency network links. This makes it hard to meet real-time application requirements, because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to adjust continually to changes in the environment. Previous works explore static micro-batches and demonstrate their potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy that distributes incoming streams across available machines by considering their memory and CPU capacities. The experiments use a real-world multi-cloud deployment and show that BurstFlow can reduce execution time by up to 77% compared with state-of-the-art solutions, while improving CPU efficiency by up to 49%. |
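
The two ideas named in the abstract, dynamic micro-batch sizing and capacity-aware partitioning, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the record does not give it, so every name here (`Worker`, `next_batch_size`, `partition_shares`, `split_batch`), the 25% step, the size bounds, and the equal CPU/memory weighting are assumptions chosen only for illustration. The sizing rule assumes that larger batches help when communication time dominates and smaller ones help when computation time dominates.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical description of a machine's capacity."""
    name: str
    cpu_cores: int
    memory_gb: float


def next_batch_size(current, comm_s, comp_s,
                    min_size=100, max_size=100_000, step=0.25):
    """Assumed heuristic: grow the batch when communication time
    dominates, shrink it when computation time dominates."""
    if comm_s > comp_s:
        proposed = int(current * (1 + step))
    elif comp_s > comm_s:
        proposed = int(current * (1 - step))
    else:
        proposed = current
    # Keep the result inside the configured bounds.
    return max(min_size, min(max_size, proposed))


def partition_shares(workers, cpu_weight=0.5):
    """Give each worker a share proportional to a weighted mix of its
    CPU and memory capacity. The shares sum to 1."""
    total_cpu = sum(w.cpu_cores for w in workers)
    total_mem = sum(w.memory_gb for w in workers)
    return {
        w.name: cpu_weight * w.cpu_cores / total_cpu
        + (1 - cpu_weight) * w.memory_gb / total_mem
        for w in workers
    }


def split_batch(records, workers):
    """Cut one micro-batch into contiguous slices sized by share.
    The last worker receives whatever rounding leaves over."""
    shares = partition_shares(workers)
    slices, start = {}, 0
    for i, w in enumerate(workers):
        if i == len(workers) - 1:
            end = len(records)
        else:
            end = start + int(len(records) * shares[w.name])
        slices[w.name] = records[start:end]
        start = end
    return slices


if __name__ == "__main__":
    workers = [Worker("small", 4, 8.0), Worker("large", 12, 24.0)]
    size = next_batch_size(1000, comm_s=2.0, comp_s=1.0)
    parts = split_batch(list(range(size)), workers)
    print(size, {name: len(chunk) for name, chunk in parts.items()})
```

A real system would likely smooth the timing measurements over several batches before resizing, so that a single slow network round trip does not cause the batch size to oscillate.
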
id |
UFRGS-2_5524164f2769fb5b2b61440043330415 |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/256835 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Boosting big data streaming applications in clouds with burstFlow |
title |
Boosting big data streaming applications in clouds with burstFlow |
author |
Souza Junior, Paulo Ricardo Rodrigues de |
author_facet |
Souza Junior, Paulo Ricardo Rodrigues de; Matteussi, Kassiano José; Veith, Alexandre da Silva; Zanchetta, Breno Fanchiotti; Leithardt, Valderi Reis Quietinho; Murciego, Álvaro Lozano; Freitas, Edison Pignaton de; Anjos, Julio Cesar Santos dos; Geyer, Claudio Fernando Resin |
author_role |
author |
author2 |
Matteussi, Kassiano José; Veith, Alexandre da Silva; Zanchetta, Breno Fanchiotti; Leithardt, Valderi Reis Quietinho; Murciego, Álvaro Lozano; Freitas, Edison Pignaton de; Anjos, Julio Cesar Santos dos; Geyer, Claudio Fernando Resin |
author2_role |
author author author author author author author author |
dc.contributor.author.fl_str_mv |
Souza Junior, Paulo Ricardo Rodrigues de; Matteussi, Kassiano José; Veith, Alexandre da Silva; Zanchetta, Breno Fanchiotti; Leithardt, Valderi Reis Quietinho; Murciego, Álvaro Lozano; Freitas, Edison Pignaton de; Anjos, Julio Cesar Santos dos; Geyer, Claudio Fernando Resin |
dc.subject.por.fl_str_mv |
Data processing; Big data; Cloud computing |
topic |
Data processing; Big data; Cloud computing; Stream processing applications; Multi cloud; Micro-batches; Data partition |
dc.subject.eng.fl_str_mv |
Stream processing applications; Multi cloud; Micro-batches; Data partition |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020 |
dc.date.accessioned.fl_str_mv |
2023-04-07T03:26:39Z |
dc.type.driver.fl_str_mv |
Foreign; info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/256835 |
dc.identifier.issn.pt_BR.fl_str_mv |
2169-3536 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001135243 |
identifier_str_mv |
2169-3536; 001135243 |
url |
http://hdl.handle.net/10183/256835 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.pt_BR.fl_str_mv |
IEEE Access. [Piscataway, NJ]. Vol. 8 (2020), p. 219124 - 219136 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Repositório Institucional da UFRGS |
collection |
Repositório Institucional da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/256835/2/001135243.pdf.txt; http://www.lume.ufrgs.br/bitstream/10183/256835/1/001135243.pdf |
bitstream.checksum.fl_str_mv |
c31ef6e9d701248ed38e52adf28f5397; 08e0d2ce44a914edd6d9720a93fd5f68 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5; MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
|
_version_ |
1801225085574447104 |