A data driven dispatcher for big data applications in heterogeneous systems

Souza Junior, Paulo Ricardo Rodrigues de

Bibliographic details
Main author:	Souza Junior, Paulo Ricardo Rodrigues de
Publication date:	2018
Document type:	Master's thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da UFRGS
Full text:	http://hdl.handle.net/10183/187882
Abstract:	Technological capacity grows every day across many areas, such as automation, prediction, and taking actions. In this process, data is produced at different rates and in different quantities; viewed up close, the data produced by a single sensor is small and does not provide clear insights. However, a global view that combines this information may contain useful knowledge about business intelligence and about the behavior of people and sensors. This global view of the data is called Big Data; it can reach overwhelming volumes and is being produced at remarkable rates by devices and people. It is therefore necessary to provide solutions for managing Big Data systems that deliver robustness and quality of service. To build robust systems that process large amounts of data, Big Data frameworks are proposed and deployed using several management tools. Furthermore, Big Data frameworks are usually divided by processing perspective (i.e., batch and stream processing) and focus on processing balanced data in homogeneous environments. Stream and batch processing engines have to support high data ingestion to ensure quality and efficiency for the end user or system administrator. The data flow processed by a Stream Processing Engine (SPE) fluctuates over time and requires real-time or near-real-time adjustments to the resource pool (network, memory, CPU, and others). This scenario leads to the problem known as skewed data production, caused by non-uniform incoming flow at specific points in the environment, which slows applications down through network bottlenecks and inefficient load balancing. This thesis proposes Aten, a data-driven dispatcher, as a solution for overcoming unbalanced data flows processed by Big Data stream applications in heterogeneous systems.
Aten manages data aggregation and data streams within message queues, using different algorithms as strategies to partition the data flow across all available computational resources. The thesis presents results indicating that it is possible to maximize throughput while also providing low latency for SPEs.

Item metadata

id	URGS_8a7bd92d24641d2b78c0d18e5cc8ee87
oai_identifier_str	oai:www.lume.ufrgs.br:10183/187882
network_acronym_str	URGS
network_name_str	Biblioteca Digital de Teses e Dissertações da UFRGS
repository_id_str	1853
spelling	Souza Junior, Paulo Ricardo Rodrigues de; Geyer, Claudio Fernando Resin; 2019-01-18T02:31:31Z; 2018; http://hdl.handle.net/10183/187882; 001084082; application/pdf; eng; Big data; Processamento de dados; Big data; Communication optimization; Data-stream partition; Load balance; A data driven dispatcher for big data applications in heterogeneous systems; Um dispatcher acionado por dados de aplicações de big data em sistemas heterogêneos; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; Universidade Federal do Rio Grande do Sul; Instituto de Informática; Programa de Pós-Graduação em Computação; Porto Alegre, BR-RS; 2018; mestrado; info:eu-repo/semantics/openAccess; reponame:Biblioteca Digital de Teses e Dissertações da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS; TEXT; 001084082.pdf.txt; 001084082.pdf.txt; Extracted Text; text/plain; 212719; http://www.lume.ufrgs.br/bitstream/10183/187882/2/001084082.pdf.txt; 040c72e08f3a3a6a46967a54fb0ba753; MD5; 2; ORIGINAL; 001084082.pdf; Texto completo (inglês); application/pdf; 2624623; http://www.lume.ufrgs.br/bitstream/10183/187882/1/001084082.pdf; cc76b052b91995dbf99b93a0cd369b2d; MD5; 1; 10183/187882; 2021-05-26 04:35:38.911804; oai:www.lume.ufrgs.br:10183/187882; Biblioteca Digital de Teses e Dissertações; https://lume.ufrgs.br/handle/10183/2; PUB; https://lume.ufrgs.br/oai/request; lume@ufrgs.br||lume@ufrgs.br; opendoar:1853; 2021-05-26T07:35:38; Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS); false
dc.title.pt_BR.fl_str_mv	A data driven dispatcher for big data applications in heterogeneous systems
dc.title.alternative.pt.fl_str_mv	Um dispatcher acionado por dados de aplicações de big data em sistemas heterogêneos
title	A data driven dispatcher for big data applications in heterogeneous systems
spellingShingle	A data driven dispatcher for big data applications in heterogeneous systems Souza Junior, Paulo Ricardo Rodrigues de Big data Processamento de dados Big data Communication optimization Data-stream partition Load balance
title_short	A data driven dispatcher for big data applications in heterogeneous systems
title_full	A data driven dispatcher for big data applications in heterogeneous systems
title_fullStr	A data driven dispatcher for big data applications in heterogeneous systems
title_full_unstemmed	A data driven dispatcher for big data applications in heterogeneous systems
title_sort	A data driven dispatcher for big data applications in heterogeneous systems
author	Souza Junior, Paulo Ricardo Rodrigues de
author_facet	Souza Junior, Paulo Ricardo Rodrigues de
author_role	author
dc.contributor.author.fl_str_mv	Souza Junior, Paulo Ricardo Rodrigues de
dc.contributor.advisor1.fl_str_mv	Geyer, Claudio Fernando Resin
contributor_str_mv	Geyer, Claudio Fernando Resin
dc.subject.por.fl_str_mv	Big data Processamento de dados
topic	Big data Processamento de dados Big data Communication optimization Data-stream partition Load balance
dc.subject.eng.fl_str_mv	Big data Communication optimization Data-stream partition Load balance
publishDate	2018
dc.date.issued.fl_str_mv	2018
dc.date.accessioned.fl_str_mv	2019-01-18T02:31:31Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/187882
dc.identifier.nrb.pt_BR.fl_str_mv	001084082
url	http://hdl.handle.net/10183/187882
identifier_str_mv	001084082
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
instname_str	Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str	UFRGS
institution	UFRGS
reponame_str	Biblioteca Digital de Teses e Dissertações da UFRGS
collection	Biblioteca Digital de Teses e Dissertações da UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/187882/2/001084082.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/187882/1/001084082.pdf
bitstream.checksum.fl_str_mv	040c72e08f3a3a6a46967a54fb0ba753 cc76b052b91995dbf99b93a0cd369b2d
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv	lume@ufrgs.br||lume@ufrgs.br
_version_	1800309136677666816
