A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization

Bibliographic details
Main author: Vissaram, Tahira Jéssica da Silva Ruivo
Publication date: 2020
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/94884
Abstract: Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management
id RCAP_f119fe08b3f9b1ca3614ca9f2f22f8a3
oai_identifier_str oai:run.unl.pt:10362/94884
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization
Big Data
Data Vault
Database Modeling
Limitations
Optimization
Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management
Data has become the most powerful asset in an organization because of the insights and patterns that can be discovered in it and because BI tools can transform it into real-time information to support decision making. It is therefore crucial to have a DW architecture that stores all the business data of an organization in a central repository accessible to all end-users, allowing them to query the data for reporting. The most common approach to designing a DW is the Star Schema, created by Kimball; however, the costs of maintaining and re-designing the model when business requirements and business processes change, or even when the model needs to be extended, are very high and have a significant impact on the whole structure. For that reason, the Data Vault approach invented by Dan Linstedt emerged. Its methodology is more oriented to auditability, traceability, and agility of the data, and it adapts rapidly to changes in business rules and requirements while handling large amounts of data. This hybrid approach combines the best of 3NF and Star Schema: it is flexible, scalable, and consistent, and it reduces the costs of implementation and maintenance because new business processes and requirements can be added incrementally without modifying the whole model structure. However, as it is still recent, the Data Vault approach has limitations compared to the Star Schema: it requires many associations to access the data and execute ad-hoc queries, which makes end-user access to the model difficult. Consequently, the model has low performance, and more storage is required due to denormalization. Although the two are competitors, when it comes to building an EDW capable of providing a central view of the whole business, the Star Schema and Data Vault 2.0 approaches complement each other according to the Data Vault Architecture. On top of the Data Vault, in the information delivery layer, Data Marts are created using Star Schemas or OLAP cubes, because the Data Vault cannot be accessed by end-users, so that BI tools can be applied to produce reports for organizational decision-making. Briefly, the purpose of this Dissertation is to compare, through a case study, the Star Schema model with the Data Vault 2.0 Ensemble model. It also aims to demonstrate the limitations of Data Vault 2.0 that were studied and to present an optimized way of designing a Data Vault 2.0 model that reduces the joins required to query the data, minimizes the complexity of the model, and allows users to access the data directly instead of creating Data Marts.
Santos, Vitor Manuel Pereira Duarte dos
RUN
Vissaram, Tahira Jéssica da Silva Ruivo
2020-03-24T15:34:25Z
2020-02-04
2020-02-04T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/94884
TID:202468399
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T04:42:53Z
oai:run.unl.pt:10362/94884
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:38:08.136636
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization
title A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization
spellingShingle A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization
Vissaram, Tahira Jéssica da Silva Ruivo
Big Data
Data Vault
Database Modeling
Limitations
Optimization
author Vissaram, Tahira Jéssica da Silva Ruivo
author_facet Vissaram, Tahira Jéssica da Silva Ruivo
author_role author
dc.contributor.none.fl_str_mv Santos, Vitor Manuel Pereira Duarte dos
RUN
dc.contributor.author.fl_str_mv Vissaram, Tahira Jéssica da Silva Ruivo
dc.subject.por.fl_str_mv Big Data
Data Vault
Database Modeling
Limitations
Optimization
topic Big Data
Data Vault
Database Modeling
Limitations
Optimization
description Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management
publishDate 2020
dc.date.none.fl_str_mv 2020-03-24T15:34:25Z
2020-02-04
2020-02-04T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/94884
TID:202468399
url http://hdl.handle.net/10362/94884
identifier_str_mv TID:202468399
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137997331890176