Ensuring privacy when querying distributed databases

Detalhes bibliográficos
Autor(a) principal: Almeida, João Rafael Duarte de
Data de Publicação: 2022
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10773/35125
Resumo: Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid this disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reverted using specific patients’ characteristics, namely when the anonymisation is based on hidden key attributes. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects’ privacy. The work was initially focused on identifying techniques to link information between multiple data sources, in order to revert the anonymization procedures. In a second phase, we developed the methodology to perform queries over distributed databases was proposed. The architecture was validated using a standard data schema that is widely adopted in observational research studies.
id RCAP_ddf207b464e4120aab28773aefaa6fc3
oai_identifier_str oai:ria.ua.pt:10773/35125
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Ensuring privacy when querying distributed databasesPrivacy preservingData anonymisationk-Anonymityl-DiversityAnonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid this disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reverted using specific patients’ characteristics, namely when the anonymisation is based on hidden key attributes. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects’ privacy. The work was initially focused on identifying techniques to link information between multiple data sources, in order to revert the anonymization procedures. In a second phase, we developed the methodology to perform queries over distributed databases was proposed. The architecture was validated using a standard data schema that is widely adopted in observational research studies.A garantia da anonimização de dados é atualmente um dos maiores desafios quando existe a necessidade de partilhar informações pessoais de carácter sensível. Apesar de ser um problema transversal a muitos domínios de aplicação, este torna-se mais crítico quando a anonimização envolve dados clinicos. Nestes casos, a abordagem mais comum para evitar a divulgação de dados, que possam ser associados diretamente a um indivíduo, consiste na remoção de atributos identificadores. No entanto, segundo a literatura, esta abordagem não oferece uma garantia total de anonimato, que pode ser quebrada através de ataques específicos que permitem a reidentificação dos sujeitos. Neste trabalho, é proposta uma arquitetura que permite partilhar dados armazenados em repositórios distribuídos, de forma segura e sem comprometer a privacidade. Numa primeira fase deste trabalho, foi feita uma análise de técnicas que permitam reverter os procedimentos de anonimização. Na fase seguinte, foi proposta uma metodologia que permite realizar pesquisas em bases de dados distribuídas, sem que o anonimato seja quebrado. Esta arquitetura foi validada sobre um esquema de base de dados relacional que é amplamente utilizado em estudos clínicos observacionais.2022-11-04T15:04:22Z2022-07-20T00:00:00Z2022-07-20info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10773/35125engAlmeida, João Rafael Duarte deinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-02-22T12:07:42Zoai:ria.ua.pt:10773/35125Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:06:14.996907Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Ensuring privacy when querying distributed databases
title Ensuring privacy when querying distributed databases
spellingShingle Ensuring privacy when querying distributed databases
Almeida, João Rafael Duarte de
Privacy preserving
Data anonymisation
k-Anonymity
l-Diversity
title_short Ensuring privacy when querying distributed databases
title_full Ensuring privacy when querying distributed databases
title_fullStr Ensuring privacy when querying distributed databases
title_full_unstemmed Ensuring privacy when querying distributed databases
title_sort Ensuring privacy when querying distributed databases
author Almeida, João Rafael Duarte de
author_facet Almeida, João Rafael Duarte de
author_role author
dc.contributor.author.fl_str_mv Almeida, João Rafael Duarte de
dc.subject.por.fl_str_mv Privacy preserving
Data anonymisation
k-Anonymity
l-Diversity
topic Privacy preserving
Data anonymisation
k-Anonymity
l-Diversity
description Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid this disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reverted using specific patients’ characteristics, namely when the anonymisation is based on hidden key attributes. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects’ privacy. The work was initially focused on identifying techniques to link information between multiple data sources, in order to revert the anonymization procedures. In a second phase, we developed the methodology to perform queries over distributed databases was proposed. The architecture was validated using a standard data schema that is widely adopted in observational research studies.
publishDate 2022
dc.date.none.fl_str_mv 2022-11-04T15:04:22Z
2022-07-20T00:00:00Z
2022-07-20
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/35125
url http://hdl.handle.net/10773/35125
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137717339029504