Using natural language processing to detect privacy violations in online contracts

Bibliographic details
Main author: Silva, Paulo
Publication date: 2020
Other authors: Gonçalves, Carolina; Godinho, Carolina; Antunes, Nuno Manuel dos Santos; Curado, Marília
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10316/93812
https://doi.org/10.1145/3341105.3375774
Abstract: As information systems handle contracts and documents in essential services, there is a lack of mechanisms to help organizations protect the data subjects involved. In this paper, we evaluate the use of named entity recognition (NER) as a way to identify, monitor and validate personally identifiable information (PII). In our experiments, we use three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, the effectiveness of the tools is evaluated on a generic dataset. Then, the tools are applied to datasets built from contracts that contain personally identifiable information. The results show that the models performed well, classifying both the generic data and the contract data accurately. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology.
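The abstract does not state how the tools' effectiveness was measured. One common way to score NER output is entity-level precision, recall and F1 under exact span-and-label matching; the sketch below assumes that protocol, which is not necessarily the one used in the paper. The `Entity` type, the `prf1` function, the example sentence and both annotation lists are hypothetical and serve only as illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """A hypothetical entity mention: character offsets [start, end) and a label."""
    start: int
    end: int
    label: str


def prf1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1; a prediction counts only if
    its span and label both match a gold entity exactly (assumed protocol)."""
    gold_set = set(gold)
    pred_set = set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical contract sentence with hand-made gold annotations.
    text = "This contract is signed by Maria Costa, resident in Coimbra."
    gold = [Entity(27, 38, "PERSON"), Entity(52, 59, "LOCATION")]
    # Hypothetical tool output: finds the person but mislabels the city.
    predicted = [Entity(27, 38, "PERSON"), Entity(52, 59, "ORGANIZATION")]
    for ent in gold:
        print(text[ent.start:ent.end], ent.label)
    p, r, f = prf1(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under exact matching, the mislabeled city counts as both a false positive and a false negative, so this example scores 0.50 on all three metrics. A lenient variant that credits overlapping spans would score it higher, which is why the matching rule should be stated alongside any reported figures.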
Date issued: 2020-03
Publication status: published version
Publisher: ACM
ISBN: 9781450368667
Access rights: open access