Using natural language processing to detect privacy violations in online contracts
Main author: | Silva, Paulo |
---|---|
Publication date: | 2020 |
Other authors: | Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10316/93812 https://doi.org/10.1145/3341105.3375774 |
Abstract: | As information systems deal with contracts and documents in essential services, there is a lack of mechanisms to help organizations in protecting the involved data subjects. In this paper, we evaluate the use of named entity recognition as a way to identify, monitor and validate personally identifiable information. In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated in a generic dataset. Then, the tools are applied in datasets built based on contracts that contain personally identifiable information. The results show that models' performance was highly positive in accurately classifying both the generic and the contracts' data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology. |
id |
RCAP_c206c52847e36076cb2a722a1bd6419f |
---|---|
oai_identifier_str |
oai:estudogeral.uc.pt:10316/93812 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Using natural language processing to detect privacy violations in online contracts; As information systems deal with contracts and documents in essential services, there is a lack of mechanisms to help organizations in protecting the involved data subjects. In this paper, we evaluate the use of named entity recognition as a way to identify, monitor and validate personally identifiable information. In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated in a generic dataset. Then, the tools are applied in datasets built based on contracts that contain personally identifiable information. The results show that models' performance was highly positive in accurately classifying both the generic and the contracts' data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology.; ACM; 2020-03; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; http://hdl.handle.net/10316/93812; http://hdl.handle.net/10316/93812; https://doi.org/10.1145/3341105.3375774; eng; 9781450368667; https://doi.org/10.1145/3341105.3375774; Silva, Paulo; Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2022-08-24T13:09:50Z; oai:estudogeral.uc.pt:10316/93812; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T21:12:42.300182; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Using natural language processing to detect privacy violations in online contracts |
title |
Using natural language processing to detect privacy violations in online contracts |
spellingShingle |
Using natural language processing to detect privacy violations in online contracts; Silva, Paulo |
title_short |
Using natural language processing to detect privacy violations in online contracts |
title_full |
Using natural language processing to detect privacy violations in online contracts |
title_fullStr |
Using natural language processing to detect privacy violations in online contracts |
title_full_unstemmed |
Using natural language processing to detect privacy violations in online contracts |
title_sort |
Using natural language processing to detect privacy violations in online contracts |
author |
Silva, Paulo |
author_facet |
Silva, Paulo; Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília |
author_role |
author |
author2 |
Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília |
author2_role |
author author author author |
dc.contributor.author.fl_str_mv |
Silva, Paulo; Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília |
description |
As information systems deal with contracts and documents in essential services, there is a lack of mechanisms to help organizations in protecting the involved data subjects. In this paper, we evaluate the use of named entity recognition as a way to identify, monitor and validate personally identifiable information. In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated in a generic dataset. Then, the tools are applied in datasets built based on contracts that contain personally identifiable information. The results show that models' performance was highly positive in accurately classifying both the generic and the contracts' data. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-03 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10316/93812 http://hdl.handle.net/10316/93812 https://doi.org/10.1145/3341105.3375774 |
url |
http://hdl.handle.net/10316/93812 https://doi.org/10.1145/3341105.3375774 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
9781450368667 https://doi.org/10.1145/3341105.3375774 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
ACM |
publisher.none.fl_str_mv |
ACM |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134022429835264 |