Utility-driven assessment of anonymized data via clustering

Ferrão, Maria Eugénia; Prata, Paula; Fazendeiro, Paulo

Bibliographic details
Main author:	Ferrão, Maria Eugénia
Publication date:	2022
Other authors:	Prata, Paula; Fazendeiro, Paulo
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.6/12328
Abstract:	In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset concerning an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess the utility of the anonymized data by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved after data anonymization. The results suggest that, for datasets of low dimensionality or cardinality, the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
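The three quantities named in the abstract (a k-anonymity level, agreement between an original and an anonymized cluster solution, and a relative risk) can be illustrated with a minimal Python sketch. This record does not say which tools or validity indices the authors used, so everything here is an assumption for illustration: the function names, the toy records, the column names `age_band` and `region`, and the choice of the plain Rand index as a stand-in for a clustering comparison measure.

```python
from collections import Counter
from itertools import combinations


def k_anonymity_level(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values; the table is k-anonymous for any k up to it."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def rand_index(labels_a, labels_b):
    """Fraction of record pairs on which two partitions agree, i.e. both
    place the pair in the same cluster or both place it in different ones."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def relative_risk(exposed_events, exposed_total, control_events, control_total):
    """Risk in the exposed group divided by risk in the comparison group."""
    return (exposed_events / exposed_total) / (control_events / control_total)


# Hypothetical toy table: two quasi-identifiers, five records.
records = [
    {"age_band": "18-20", "region": "North"},
    {"age_band": "18-20", "region": "North"},
    {"age_band": "21-25", "region": "South"},
    {"age_band": "21-25", "region": "South"},
    {"age_band": "21-25", "region": "South"},
]

# Smallest equivalence class has 2 records, so the table is 2-anonymous.
k = k_anonymity_level(records, ["age_band", "region"])

# Cluster labels before and after a hypothetical anonymization step.
original_labels = [0, 0, 1, 1]
anonymized_labels = [0, 1, 0, 1]
agreement = rand_index(original_labels, anonymized_labels)

# Hypothetical counts: 50 of 100 events in one group, 25 of 100 in the other.
rr = relative_risk(50, 100, 25, 100)
```

The Rand index ignores how clusters are numbered, so a relabelled but otherwise identical partition still scores 1.0. In practice a chance-corrected index may be preferable, and comparing the relative risk computed on the original and on the anonymized data is one way to look for the kind of bias the abstract mentions.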

Item metadata

id	RCAP_b9fcc3487760539aac7274bca3b0828e
oai_identifier_str	oai:ubibliorum.ubi.pt:10400.6/12328
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Utility-driven assessment of anonymized data via clustering
author	Ferrão, Maria Eugénia
author_facet	Ferrão, Maria Eugénia; Prata, Paula; Fazendeiro, Paulo
author_role	author
author2	Prata, Paula; Fazendeiro, Paulo
author2_role	author; author
dc.contributor.none.fl_str_mv	uBibliorum
dc.contributor.author.fl_str_mv	Ferrão, Maria Eugénia; Prata, Paula; Fazendeiro, Paulo
dc.subject.por.fl_str_mv	Data privacy; Data utility; Clustering; Education
topic	Data privacy; Data utility; Clustering; Education
publishDate	2022
dc.date.none.fl_str_mv	2022-08-26T08:33:17Z; 2022-07-30; 2022-07-30T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.6/12328
url	http://hdl.handle.net/10400.6/12328
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Ferrão, M.E., Prata, P. & Fazendeiro, P. Utility-driven assessment of anonymized data via clustering. Sci Data 9, 456 (2022). https://doi.org/10.1038/s41597-022-01561-6; ISSN 2052-4463; DOI 10.1038/s41597-022-01561-6
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Springer Nature
publisher.none.fl_str_mv	Springer Nature
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799136408066064384
