A Graph Database Representation of Portuguese Criminal-Related Documents

Bibliographic details
Main author: Carnaz, Gonçalo
Publication date: 2021
Other authors: Nogueira, Vitor Beires; Antunes, M.
Document type: Article
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10174/34688
Citation: Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37. https://doi.org/10.3390/informatics8020037
Abstract: Organizations are challenged by the need to process an increasing amount of structured and unstructured data retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they must manually process a vast number of criminal reports, crime-related news articles, occurrence and evidence reports, and other unstructured documents. Automatically extracting and representing the data and knowledge in such documents is essential to reduce the burden of manual analysis and to automate the discovery of names and entity relationships that may exist in a case. This paper presents SEMCrime, a framework that extracts and classifies named entities and relations in Portuguese criminal reports and documents and represents the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-measure of 0.73 and 5W1H information extraction with an F-measure of 0.65.
id RCAP_b89485d600cfeca71e605342d1bc9a1b
oai_identifier_str oai:dspace.uevora.pt:10174/34688
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.contributor.author.fl_str_mv Carnaz, Gonçalo
Nogueira, Vitor Beires
Antunes, M.
dc.subject.por.fl_str_mv knowledge representation
graph databases
natural language processing
criminal-related documents
cybersecurity
criminal domain
police reports
publishDate 2021
dc.date.none.fl_str_mv 2021-06-04T00:00:00Z
2023-02-24T12:33:45Z
2023-02-24
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/34688
https://doi.org/10.3390/informatics8020037
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv d34707@alunos.uevora.pt
vbn@uevora.pt
mario.antunes@ipleiria.pt
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP