Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines
Main author: | Matteussi, Kassiano José |
---|---|
Publication date: | 2022 |
Other authors: | Anjos, Julio; LEITHARDT, VALDERI; Resin Geyer, Claudio Fernando |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.26/43560 |
Abstract: | A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, Flink, and others. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, Spark's Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the work points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications. In addition, it indicates potential solutions. |
id |
RCAP_e7a1834ba5c972d102d7b53381c9b2fe |
---|---|
oai_identifier_str |
oai:comum.rcaap.pt:10400.26/43560 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Matteussi, Kassiano José |
author_facet |
Matteussi, Kassiano José; Anjos, Julio; LEITHARDT, VALDERI; Resin Geyer, Claudio Fernando |
author_role |
author |
author2 |
Anjos, Julio; LEITHARDT, VALDERI; Resin Geyer, Claudio Fernando |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Repositório Comum |
dc.contributor.author.fl_str_mv |
Matteussi, Kassiano José; Anjos, Julio; LEITHARDT, VALDERI; Resin Geyer, Claudio Fernando |
dc.subject.por.fl_str_mv |
backpressure; big data; spark streaming; stream processing |
topic |
backpressure; big data; spark streaming; stream processing |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-06-23 2022-06-28T08:44:32Z 2022-06-23T00:00:00Z 2023-02-01T18:32:23Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.26/43560 |
url |
http://hdl.handle.net/10400.26/43560 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
cv-prod-3014543 10.3390/s22134756 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|