Crime inference using machine learning and geographical data

Bibliographic details
Main author: Roque, Miguel Francisco Frade
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.14/41436
Abstract: Crimes are not random events in society; something must influence their occurrence. By characterizing the environment, it is possible to create algorithms that predict criminal activity in a given place and at a given time, which allows it to be anticipated and prevented through public-policy decision-making. This study focuses on finding the best way to predict crimes, that is, which types of features are the most important to consider and which methods are the most predictive. The city of Philadelphia, Pennsylvania (USA), is analyzed, taking into account the urban, racial, demographic and socioeconomic characteristics of its geographical blocks and the number of criminal occurrences in each of them over multiple years. Both linear and non-linear methods are used. With non-linear machine learning methods, the prediction of the number of crimes is much more accurate for every type of variable, leading to the conclusion that the relationships studied here are not linear in nature and that tree-based models (especially gradient boosting and random forest) are therefore the most suitable approach for these data. From this perspective, models that consider only the socio-demographic characteristics of the neighborhoods are significantly more effective at forecasting than purely urban ones.
id RCAP_79c724f3d64611ef2cbe9c2f264a79b5
oai_identifier_str oai:repositorio.ucp.pt:10400.14/41436
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Crime inference using machine learning and geographical data
Crimes
Socio-demographic
Urban
Linear
Non-linear
Domínio/Área Científica::Ciências Sociais::Economia e Gestão
Bertani, Nicolò
Veritati - Repositório Institucional da Universidade Católica Portuguesa
Roque, Miguel Francisco Frade
2023-06-26T13:20:36Z
2023-02-03
2023-01
2023-02-03T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10400.14/41436
TID:203278755
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2023-07-12T17:47:01Z
oai:repositorio.ucp.pt:10400.14/41436
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T18:34:07.766095
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Crime inference using machine learning and geographical data
title Crime inference using machine learning and geographical data
spellingShingle Crime inference using machine learning and geographical data
Roque, Miguel Francisco Frade
Crimes
Socio-demographic
Urban
Linear
Non-linear
Domínio/Área Científica::Ciências Sociais::Economia e Gestão
title_short Crime inference using machine learning and geographical data
title_full Crime inference using machine learning and geographical data
title_fullStr Crime inference using machine learning and geographical data
title_full_unstemmed Crime inference using machine learning and geographical data
title_sort Crime inference using machine learning and geographical data
author Roque, Miguel Francisco Frade
author_facet Roque, Miguel Francisco Frade
author_role author
dc.contributor.none.fl_str_mv Bertani, Nicolò
Veritati - Repositório Institucional da Universidade Católica Portuguesa
dc.contributor.author.fl_str_mv Roque, Miguel Francisco Frade
dc.subject.por.fl_str_mv Crimes
Socio-demographic
Urban
Linear
Non-linear
Domínio/Área Científica::Ciências Sociais::Economia e Gestão
topic Crimes
Socio-demographic
Urban
Linear
Non-linear
Domínio/Área Científica::Ciências Sociais::Economia e Gestão
publishDate 2023
dc.date.none.fl_str_mv 2023-06-26T13:20:36Z
2023-02-03
2023-01
2023-02-03T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.14/41436
TID:203278755
url http://hdl.handle.net/10400.14/41436
identifier_str_mv TID:203278755
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799132067619930112