Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform

Detalhes bibliográficos
Autor(a) principal: Cabral, Andreia Filipa Gonçalves
Data de Publicação: 2021
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10362/117609
Resumo: Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
id RCAP_5905da4900a286156f52a8822d99918d
oai_identifier_str oai:run.unl.pt:10362/117609
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud PlatformData QualityData ProfileDatabaseData WarehouseCloudData MigrationPandas ProfilingPersonal Identifiable InformationInternship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced AnalyticsIn today times, corporations have gained a vast interest in data. More and more, companies realized that the key to improving their efficiency and effectiveness and understanding their customers’ needs and preferences better was reachable by mining data. However, as the amount of data grow, so must the companies necessities for storage capacity and ensuring data quality for more accurate insights. As such, new data storage methods must be considered, evolving from old ones, still keeping data integrity. Migrating a company’s data from an old method like a Data Warehouse to a new one, Google Cloud Platform is an elaborate task. Even more so when data quality needs to be assured and sensible data, like Personal Identifiable Information, needs to be anonymized in a Cloud computing environment. To ensure these points, profiling data, before or after it migrated, has a significant value by design a profile for the data available in each data source (e.g., Databases, files, and others) based on statistics, metadata information, and pattern rules. Thus, ensuring data quality is within reasonable standards through statistics metrics, and all Personal Identifiable Information is identified and anonymized accordingly. This work will reflect the required process of how profiling Data Warehouse data can improve data quality to better migrate to the Cloud.Pinheiro, Flávio Luís PortasFigueira, Pedro SantosRUNCabral, Andreia Filipa Gonçalves2021-05-13T16:44:16Z2021-05-062021-05-06T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10362/117609TID:202726967enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-03-11T05:00:39Zoai:run.unl.pt:10362/117609Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:43:39.683950Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
title Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
spellingShingle Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
Cabral, Andreia Filipa Gonçalves
Data Quality
Data Profile
Database
Data Warehouse
Cloud
Data Migration
Pandas Profiling
Personal Identifiable Information
title_short Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
title_full Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
title_fullStr Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
title_full_unstemmed Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
title_sort Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform
author Cabral, Andreia Filipa Gonçalves
author_facet Cabral, Andreia Filipa Gonçalves
author_role author
dc.contributor.none.fl_str_mv Pinheiro, Flávio Luís Portas
Figueira, Pedro Santos
RUN
dc.contributor.author.fl_str_mv Cabral, Andreia Filipa Gonçalves
dc.subject.por.fl_str_mv Data Quality
Data Profile
Database
Data Warehouse
Cloud
Data Migration
Pandas Profiling
Personal Identifiable Information
topic Data Quality
Data Profile
Database
Data Warehouse
Cloud
Data Migration
Pandas Profiling
Personal Identifiable Information
description Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
publishDate 2021
dc.date.none.fl_str_mv 2021-05-13T16:44:16Z
2021-05-06
2021-05-06T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/117609
TID:202726967
url http://hdl.handle.net/10362/117609
identifier_str_mv TID:202726967
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138045856841728