Augmenting data warehousing architectures with hadoop

Bibliographic details
Main author: Dias, Henrique José Rosa
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/28933
Abstract: Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management
id RCAP_5a511c344db433117c65321256280fb0
oai_identifier_str oai:run.unl.pt:10362/28933
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str
spelling Augmenting data warehousing architectures with hadoop; Big Data; Hadoop; Hive; Tez; Data Warehousing; ETL; Television Audience Measurements; Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management.
As the volume of available data increases exponentially, traditional data warehouses struggle to transform this data into actionable knowledge. Data strategies that include the creation and maintenance of data warehouses have much to gain from incorporating technologies from the Big Data spectrum. Hadoop, as a transformation tool, can add a theoretically unbounded dimension of data processing, feeding transformed information into traditional data warehouses that will ultimately retain their value as central components of organizations’ decision support systems. This study explores the potential of Hadoop as a data transformation tool in a traditional data warehouse environment. Hadoop’s execution model, which is oriented toward distributed parallel processing, offers great capabilities when the amount of data to be processed requires the infrastructure to expand. Horizontal scalability, a key aspect of a Hadoop cluster, allows processing power to grow proportionally as the volume of data increases. Using Hive on Tez in a Hadoop cluster, this study transforms television viewing events, extracted from Ericsson’s Mediaroom Internet Protocol Television infrastructure, into pertinent audience metrics such as Rating, Reach and Share. These measurements are then made available in a traditional data warehouse, supported by a traditional Relational Database Management System, where they are presented through a set of reports.
The main contribution of this research is a proposed augmented data warehouse architecture in which the traditional ETL layer is replaced by a Hadoop cluster running Hive on Tez, which performs the heaviest transformations that convert raw data into actionable information. By classifying the SQL statements responsible for the data transformation processes, we were able to understand that Hadoop, with its distributed processing model, delivers outstanding performance results in the analytical layer, namely in the aggregation of large data sets. Ultimately, we demonstrate empirically the performance gains that can be extracted from Hadoop, in comparison to an RDBMS, regarding speed, storage usage and scalability potential, and suggest how this can be used to evolve data warehouses into the age of Big Data.
Henriques, Roberto André Pereira; RUN; Dias, Henrique José Rosa; 2018-01-24T15:47:16Z; 2018-01-09; 2018-01-09T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/28933; TID:201826690; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-07-10T15:42:03Z; Portal Agregador; ONG
dc.title.none.fl_str_mv Augmenting data warehousing architectures with hadoop
title Augmenting data warehousing architectures with hadoop
spellingShingle Augmenting data warehousing architectures with hadoop
Dias, Henrique José Rosa
Big Data
Hadoop
Hive
Tez
Data Warehousing
ETL
Television Audience Measurements
title_short Augmenting data warehousing architectures with hadoop
title_full Augmenting data warehousing architectures with hadoop
title_fullStr Augmenting data warehousing architectures with hadoop
title_full_unstemmed Augmenting data warehousing architectures with hadoop
title_sort Augmenting data warehousing architectures with hadoop
author Dias, Henrique José Rosa
author_facet Dias, Henrique José Rosa
author_role author
dc.contributor.none.fl_str_mv Henriques, Roberto André Pereira
RUN
dc.contributor.author.fl_str_mv Dias, Henrique José Rosa
dc.subject.por.fl_str_mv Big Data
Hadoop
Hive
Tez
Data Warehousing
ETL
Television Audience Measurements
topic Big Data
Hadoop
Hive
Tez
Data Warehousing
ETL
Television Audience Measurements
description Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management
publishDate 2018
dc.date.none.fl_str_mv 2018-01-24T15:47:16Z
2018-01-09
2018-01-09T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/28933
TID:201826690
url http://hdl.handle.net/10362/28933
identifier_str_mv TID:201826690
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv
repository.mail.fl_str_mv
_version_ 1777302956393103360