A Graph Database Representation of Portuguese Criminal-Related Documents
Main Author: | Carnaz, Gonçalo |
---|---|
Publication Date: | 2021 |
Other Authors: | Nogueira, Vitor Beires; Antunes, M. |
Document Type: | Article |
Language: | por |
Source Title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full Text: | http://hdl.handle.net/10174/34688 https://doi.org/10.3390/informatics8020037 (Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37.) |
Abstract: | Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of names and entity relationships that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-Measure of 0.73, and 5W1H information extraction with an F-Measure of 0.65. |
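
The abstract describes storing 5W1H extractions as a graph. A minimal sketch of one way such a mapping could look is given below, using a plain in-memory structure rather than any particular graph database. The class names, node labels, relation names, and the sample values are all hypothetical illustrations; they are not taken from SEMCrime, and the record does not specify the schema actually used.

```python
from dataclasses import dataclass, field


@dataclass
class FiveW1H:
    """One hypothetical 5W1H extraction; the field names are illustrative only."""
    who: str
    what: str
    where: str = ""
    when: str = ""
    why: str = ""
    how: str = ""


@dataclass
class Graph:
    """Tiny in-memory property graph (an assumption, not the paper's storage layer)."""
    nodes: dict = field(default_factory=dict)  # node id -> node label
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, label: str, value: str) -> str:
        # Identical label/value pairs map to the same node id, so entities
        # repeated across records are merged instead of duplicated.
        node_id = f"{label}:{value}"
        self.nodes[node_id] = label
        return node_id

    def add_record(self, rec: FiveW1H) -> None:
        # The "what" slot becomes a central event node; every other filled
        # slot becomes a neighbour connected by a relation named after the slot.
        event = self.add_node("Event", rec.what)
        slots = [
            ("WHO", "Person", rec.who),
            ("WHERE", "Location", rec.where),
            ("WHEN", "Time", rec.when),
            ("WHY", "Reason", rec.why),
            ("HOW", "Manner", rec.how),
        ]
        for relation, label, value in slots:
            if value:  # skip slots the extractor left empty
                target = self.add_node(label, value)
                self.edges.append((event, relation, target))


# Invented sample values, for illustration only.
graph = Graph()
graph.add_record(
    FiveW1H(who="suspect A", what="burglary", where="Évora", when="at night")
)
```

Keying nodes by label and value is a design choice made here so that a person or place mentioned in several documents becomes a single shared node, which is what makes relations across documents visible in a graph view.
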
id |
RCAP_b89485d600cfeca71e605342d1bc9a1b |
---|---|
oai_identifier_str |
oai:dspace.uevora.pt:10174/34688 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
A Graph Database Representation of Portuguese Criminal-Related Documents |
title |
A Graph Database Representation of Portuguese Criminal-Related Documents |
spellingShingle |
A Graph Database Representation of Portuguese Criminal-Related Documents; Carnaz, Gonçalo; knowledge representation; graph databases; natural language processing; criminal-related documents; cybersecurity; criminal domain; police reports |
author |
Carnaz, Gonçalo |
author_facet |
Carnaz, Gonçalo; Nogueira, Vitor Beires; Antunes, M. |
author_role |
author |
author2 |
Nogueira, Vitor Beires; Antunes, M. |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Carnaz, Gonçalo; Nogueira, Vitor Beires; Antunes, M. |
dc.subject.por.fl_str_mv |
knowledge representation; graph databases; natural language processing; criminal-related documents; cybersecurity; criminal domain; police reports |
topic |
knowledge representation; graph databases; natural language processing; criminal-related documents; cybersecurity; criminal domain; police reports |
description |
Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of names and entity relationships that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-Measure of 0.73, and 5W1H information extraction with an F-Measure of 0.65. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-06-04T00:00:00Z 2023-02-24T12:33:45Z 2023-02-24 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10174/34688 https://doi.org/10.3390/informatics8020037 (Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37.) |
url |
http://hdl.handle.net/10174/34688 https://doi.org/10.3390/informatics8020037 (Carnaz, G.; Nogueira, V.B.; Antunes, M. A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics 2021, 8, 37.) |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.none.fl_str_mv |
d34707@alunos.uevora.pt vbn@uevora.pt mario.antunes@ipleiria.pt 498 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136714856333312 |