Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

Bibliographic details
Main author: Matteussi, Kassiano José
Publication date: 2022
Other authors: Anjos, Julio; Leithardt, Valderi; Resin Geyer, Claudio Fernando
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.26/43560
Abstract: A significant rise in the adoption of streaming applications has changed decision-making processes over the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, Spark's Unified Memory Manager cannot properly handle sudden or intensive data surges and the in-memory caching they require, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the work identifies the Spark Streaming limitations that lead to in-memory issues in data-intensive pipelines and stateful applications, and it indicates potential solutions.
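Spark Streaming's backpressure (switched on with `spark.streaming.backpressure.enabled`) bounds the ingestion rate of each stream after every completed batch, using a rate estimator. The sketch below is a minimal Python re-creation of the PID controller that, to our understanding of the Spark source, the default `PIDRateEstimator` implements. It is not the authors' code; the Python class, method, and parameter names are hypothetical, and the defaults (proportional 1.0, integral 0.2, derivative 0.0, minimum rate 100 records/s) are assumed from Spark's `spark.streaming.backpressure.pid.*` settings.

```python
from typing import Optional


class PIDRateEstimator:
    """Hypothetical Python sketch of a PID-based ingestion-rate estimator,
    modeled on the one Spark Streaming uses for backpressure."""

    def __init__(self, batch_interval_ms: int, proportional: float = 1.0,
                 integral: float = 0.2, derivative: float = 0.0,
                 min_rate: float = 100.0) -> None:
        self.batch_interval_ms = batch_interval_ms
        self.proportional = proportional
        self.integral = integral
        self.derivative = derivative
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms: int, num_elements: int,
                processing_delay_ms: int,
                scheduling_delay_ms: int) -> Optional[float]:
        """Return a new rate bound (records/s) after a batch, or None."""
        # Ignore stale updates and empty or instantaneous batches.
        if (time_ms <= self.latest_time or num_elements <= 0
                or processing_delay_ms <= 0):
            return None
        # Seconds elapsed since the previous update.
        delay_since_update = (time_ms - self.latest_time) / 1000.0
        # Rate at which the last batch was actually processed (records/s).
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # Proportional term: how far the current bound exceeds the real rate.
        error = self.latest_rate - processing_rate
        # Integral-like term: backlog implied by the scheduling delay.
        historical_error = (scheduling_delay_ms * processing_rate
                            / self.batch_interval_ms)
        # Derivative term: change in error per second.
        d_error = (error - self.latest_error) / delay_since_update
        new_rate = max(self.latest_rate
                       - self.proportional * error
                       - self.integral * historical_error
                       - self.derivative * d_error,
                       self.min_rate)
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only seeds the estimator with the observed rate.
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            self.first_run = False
            return None
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate
```

For example, with a 1 s batch interval, a first batch of 10,000 records processed in 500 ms only seeds the estimator at 20,000 records/s. If the next batch of 10,000 records takes 1,000 ms to process and waits 500 ms to be scheduled, the bound drops to 20,000 - 10,000 - 0.2 × 5,000 = 9,000 records/s. The `min_rate` floor keeps the stream from stalling entirely when batches fall far behind.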
Keywords: backpressure; big data; spark streaming; stream processing
DOI: 10.3390/s22134756