A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization
Main author: | Vissaram, Tahira Jéssica da Silva Ruivo |
---|---|
Publication date: | 2020 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/94884 |
Abstract: | Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management |
id |
RCAP_f119fe08b3f9b1ca3614ca9f2f22f8a3 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/94884 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization. Big Data; Data Vault; Database Modeling; Limitations; Optimization. Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management. Data has become the most powerful asset in an organization because of the insights and patterns that can be discovered in it and because BI tools can transform it into real-time information to support decision making. It is therefore crucial to have a DW architecture that stores all the business data of an organization in a central repository accessible to all end-users, allowing them to query the data for reporting. The most common approach to designing a DW is the Star Schema, created by Kimball; however, the costs of maintaining and redesigning the model when business requirements and business processes change, or even when the model needs to be extended, are very high and have a significant impact on the whole structure. For that reason, the Data Vault approach, invented by Dan Linstedt, emerged. It brings a methodology oriented more toward auditability, traceability, and agility of the data, and it adapts rapidly to changes in business rules and requirements while handling large amounts of data. This hybrid approach combines the best of 3NF and the Star Schema: it is flexible, scalable, and consistent, and it reduces implementation and maintenance costs because new business processes and requirements can be built incrementally without modifying the whole model structure. However, as it is still recent, the Data Vault approach has limitations compared to the Star Schema: it requires many associations to access the data and execute ad-hoc queries, which makes end-user access to the model difficult.
Consequently, the model has low performance, and more storage is required due to denormalization. Although the two compete, when building an EDW capable of providing a central view of the whole business, the Star Schema and Data Vault 2.0 approaches complement each other according to the Data Vault architecture. On top of the Data Vault, in the information delivery layer, Data Marts are created using Star Schemas or OLAP cubes, because the Data Vault cannot be accessed by end-users, so that BI tools can be applied to produce reports for organizational decision-making. In brief, the purpose of this dissertation is to compare, through a case study, the Star Schema model with the Data Vault 2.0 Ensemble model. It also aims to demonstrate the limitations of Data Vault 2.0 and to present an optimized way of designing a Data Vault 2.0 model that reduces the joins required to query the data, minimizes the complexity of the model, and allows users to access the data directly instead of creating Data Marts. Santos, Vitor Manuel Pereira Duarte dos; RUN; Vissaram, Tahira Jéssica da Silva Ruivo. 2020-03-24T15:34:25Z; 2020-02-04; 2020-02-04T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/94884; TID:202468399; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T04:42:53Z; oai:run.unl.pt:10362/94884; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:38:08.136636; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization |
title |
A proposal for improvements of data vault ensemble process approach to retrieve big data: data vault limitations and optimization |
author |
Vissaram, Tahira Jéssica da Silva Ruivo |
author_facet |
Vissaram, Tahira Jéssica da Silva Ruivo |
author_role |
author |
dc.contributor.none.fl_str_mv |
Santos, Vitor Manuel Pereira Duarte dos; RUN |
dc.contributor.author.fl_str_mv |
Vissaram, Tahira Jéssica da Silva Ruivo |
dc.subject.por.fl_str_mv |
Big Data; Data Vault; Database Modeling; Limitations; Optimization |
topic |
Big Data; Data Vault; Database Modeling; Limitations; Optimization |
description |
Dissertation presented as partial requirement for obtaining the master’s degree in Information Management, specialization in Information Systems and Technologies Management |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-03-24T15:34:25Z 2020-02-04 2020-02-04T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/94884 TID:202468399 |
url |
http://hdl.handle.net/10362/94884 |
identifier_str_mv |
TID:202468399 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137997331890176 |