Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
Main author: | Cabral, Andreia Filipa Gonçalves |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/117609 |
Abstract: | Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
id |
RCAP_5905da4900a286156f52a8822d99918d |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/117609 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
Data Quality; Data Profile; Database; Data Warehouse; Cloud; Data Migration; Pandas Profiling; Personal Identifiable Information
Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Corporations today have developed a strong interest in data. Increasingly, companies realize that mining data is the key to improving their efficiency and effectiveness and to better understanding their customers' needs and preferences. However, as the amount of data grows, so do companies' needs for storage capacity and for assured data quality that yields more accurate insights. New data storage methods must therefore be considered, evolving from older ones while preserving data integrity. Migrating a company's data from an older method such as a Data Warehouse to a newer one such as the Google Cloud Platform is an elaborate task, even more so when data quality needs to be assured and sensitive data, such as Personal Identifiable Information, needs to be anonymized in a Cloud computing environment. To that end, profiling data before or after it is migrated adds significant value by designing a profile for the data available in each data source (e.g., databases, files, and others) based on statistics, metadata information, and pattern rules. This ensures that data quality is within reasonable standards, as measured by statistical metrics, and that all Personal Identifiable Information is identified and anonymized accordingly.
This work describes the process by which profiling Data Warehouse data can improve data quality for a better migration to the Cloud.
Pinheiro, Flávio Luís Portas; Figueira, Pedro Santos; RUN; Cabral, Andreia Filipa Gonçalves
2021-05-13T16:44:16Z 2021-05-06 2021-05-06T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf http://hdl.handle.net/10362/117609 TID:202726967 eng info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2024-03-11T05:00:39Z oai:run.unl.pt:10362/117609 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-20T03:43:39.683950 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
title |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
spellingShingle |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform; Cabral, Andreia Filipa Gonçalves; Data Quality; Data Profile; Database; Data Warehouse; Cloud; Data Migration; Pandas Profiling; Personal Identifiable Information |
title_short |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
title_full |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
title_fullStr |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
title_full_unstemmed |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
title_sort |
Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform |
author |
Cabral, Andreia Filipa Gonçalves |
author_facet |
Cabral, Andreia Filipa Gonçalves |
author_role |
author |
dc.contributor.none.fl_str_mv |
Pinheiro, Flávio Luís Portas; Figueira, Pedro Santos; RUN |
dc.contributor.author.fl_str_mv |
Cabral, Andreia Filipa Gonçalves |
dc.subject.por.fl_str_mv |
Data Quality; Data Profile; Database; Data Warehouse; Cloud; Data Migration; Pandas Profiling; Personal Identifiable Information |
topic |
Data Quality; Data Profile; Database; Data Warehouse; Cloud; Data Migration; Pandas Profiling; Personal Identifiable Information |
description |
Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-05-13T16:44:16Z 2021-05-06 2021-05-06T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/117609 TID:202726967 |
url |
http://hdl.handle.net/10362/117609 |
identifier_str_mv |
TID:202726967 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138045856841728 |
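
The abstract in this record describes building a profile for each data source from statistics, metadata information, and pattern rules, then identifying and anonymizing Personal Identifiable Information. The Python sketch below is a rough illustration of that idea only and is not taken from the report. Everything in it is an assumption made for the example: the function names, the two regular-expression rules, the sample rows, and the use of a salted SHA-256 digest, which strictly speaking pseudonymizes values rather than fully anonymizing them. It uses only the standard library.

```python
import hashlib
import re
from statistics import mean

# Hypothetical pattern rules; a real project would maintain its own rule set.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}\d$"),
}


def profile_column(values):
    """Return basic statistics and any matching pattern rule for one column."""
    present = [v for v in values if v is not None and v != ""]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "pii": None,
    }
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Numeric statistics only when every non-missing value is a number.
    if present and len(numeric) == len(present):
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    # Flag the column as PII when every non-missing value is a string
    # that matches the same pattern rule.
    strings = [v for v in present if isinstance(v, str)]
    if strings and len(strings) == len(present):
        for name, pattern in PII_PATTERNS.items():
            if all(pattern.match(s) for s in strings):
                profile["pii"] = name
                break
    return profile


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


def anonymize(value, salt="example-salt"):
    """Replace a value with a salted SHA-256 digest (assumed technique)."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


# Made-up sample rows standing in for a Data Warehouse table.
rows = [
    {"customer_id": 1, "email": "ana@example.com", "amount": 10.5},
    {"customer_id": 2, "email": "rui@example.com", "amount": None},
    {"customer_id": 3, "email": "", "amount": 4.5},
]

report = profile_table(rows)

# Mask every column the pattern rules flagged, leaving missing values untouched.
pii_columns = [col for col, prof in report.items() if prof["pii"]]
masked = [
    {k: anonymize(v) if k in pii_columns and v else v for k, v in row.items()}
    for row in rows
]
```

Requiring every non-missing value to match a rule keeps the sketch simple; in practice a match-rate threshold would likely be more robust against dirty data, and a dedicated profiling library could replace the hand-written statistics.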