Pri-View: Privacy-Preserving Views for Data Analysis and Publication

Bibliographic details
Main author: Costa, João Miguel Pereira
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/138793
Abstract: Data is being generated and processed at an unprecedented scale. Statistical data analysis is in high demand, with many organizations using it for a broad range of interests, from research to guiding business decisions. However, this massive generation of data raises privacy concerns, as most of this data contains sensitive information about individuals. In response, several regulations have emerged to give people more control over their data, such as the European General Data Protection Regulation. For organizations, the challenge is how to analyze and publish data without compromising an individual’s privacy. Relational databases still lack features for this; existing solutions involve manually removing identifying information from the data or allowing only certain aggregate queries. However, these solutions can be susceptible to attacks and do not provide strong privacy guarantees. In this thesis, we explore a solution to the challenge of privately analyzing and publishing data on relational databases. To this end, we present a new type of view, the privacy-preserving view, which allows statistical aggregations to be computed on data while preserving privacy. We focus our studies on Differential Privacy, a recent mathematical definition of privacy, and explore how to turn common aggregation functions into their private counterparts. We present our solution in two parts. In the first part, we present a solution to create privacy-preserving views for a specific database, namely PostgreSQL. In the second part, we present the design and implementation of a database proxy, which supports any SQL database and produces private statistical results. The experimental results show that our proposed solutions can achieve balanced performance, with views containing count functions performing better than views containing other functions. They also show that both solutions are capable of providing accurate privacy-preserving data for large databases and sample sizes.
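As an illustration of what turning an aggregation function into a private counterpart can look like, the following is a minimal sketch of a differentially private count using the Laplace mechanism. It is not taken from the thesis: the names `laplace_noise` and `private_count`, the in-memory `rows` list, and the choice of the Laplace mechanism are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    # The difference of two independent Exp(rate = 1/scale) variables
    # follows a Laplace distribution with location 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching `predicate` (hypothetical helper).

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is assumed here.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage: count the people aged 50 or over with a privacy budget of epsilon = 1.0.
people = [{"age": age} for age in range(100)]
noisy = private_count(people, lambda row: row["age"] >= 50, epsilon=1.0)
```

A smaller `epsilon` gives stronger privacy at the cost of noisier results. Aggregations such as sums or averages would typically need bounded input values before a similar noise scale could be derived.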
id RCAP_0a5b35da2ed8e70851ec8f70d9510279
oai_identifier_str oai:run.unl.pt:10362/138793
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Pri-View: Privacy-Preserving Views for Data Analysis and Publication
author Costa, João Miguel Pereira
author_facet Costa, João Miguel Pereira
author_role author
dc.contributor.none.fl_str_mv Preguiça, Nuno
RUN
dc.contributor.author.fl_str_mv Costa, João Miguel Pereira
dc.subject.por.fl_str_mv Relational Databases
Statistical Data
Differential Privacy
Views
Proxy
Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate 2022
dc.date.none.fl_str_mv 2022-05-27T14:58:35Z
2022-01
2022-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/138793
url http://hdl.handle.net/10362/138793
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138091532812288