Spatial crash prediction models: an evaluation of the impacts of enriched information on model performance and the suitability of different spatial modeling approaches

Bibliographic details
Main author: Gomes, Monique Martins
Publication date: 2018
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/18/18144/tde-18022019-112104/
Abstract: The unavailability of crash-related data has been a long-standing challenge in Brazil. Together with the poor implementation and follow-up of road safety strategies, this shortcoming has hampered the development of studies that could contribute to national road safety goals. In contrast, developed countries have built effective strategies on a solid data basis, investing considerable time and money in obtaining and creating pertinent information. In this research, we assess the potential impact of supplementary data on spatial model performance and the suitability of different spatial modeling approaches for crash prediction. The intention is to make authorities in Brazil and other developing countries aware of the importance of having appropriate data. We set two specific objectives: (I) to investigate the prediction accuracy of spatial models at unsampled subzones; (II) to evaluate the performance of spatial data analysis approaches in crash prediction. First, we carry out a benchmarking study based on Geographically Weighted Regression (GWR) models developed for Flanders, Belgium, and São Paulo, Brazil. Models are developed for two modes of transport: active (i.e., pedestrians and cyclists) and motorized (i.e., motor vehicle occupants). Subsequently, we apply the repeated holdout method to the Flemish models, introducing two GWR validation approaches, named GWR holdout1 and GWR holdout2. The former is based on the local coefficient estimates derived from the neighboring subzones and the values of the explanatory variables for the validation subzones, whereas the latter uses the casualty estimates of the neighboring subzones directly to estimate outcomes for the missing subzones. Lastly, we compare the performance of the GWR models with Mean Imputation (MEI), K-Nearest Neighbor (KNN) and Kriging with External Drift (KED).
The findings showed that adding the supplementary data reduced the corrected Akaike Information Criterion (AICc) and the Mean Squared Prediction Error (MSPE) by 20% and 25%, respectively, for motorized transport, and by 25% and 35%, respectively, for active transport. From a practical perspective, the results could help identify hotspots and prioritize data collection strategies, as well as identify, implement and enforce appropriate countermeasures. Concerning the spatial approaches, GWR holdout2 outperformed all other techniques and showed that GWR is an appropriate spatial technique for both prediction and impact analyses. Especially in countries where data availability has been an issue, this validation framework allows casualties or crash frequencies to be estimated while effectively capturing the spatial variation of the data.
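The validation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Gaussian kernel, the fixed bandwidth, the ordinary (Gaussian) local regression, the synthetic data, and all function names are assumptions made for illustration only. Here "holdout1" calibrates a local regression at each validation subzone from the training neighbors and applies it to that subzone's own covariates, while "holdout2" takes a kernel-weighted average of the neighbors' fitted estimates; mean imputation and KNN serve as baselines, and each approach is scored by MSPE over repeated random splits. Kriging with External Drift is omitted.

```python
import numpy as np

def gaussian_weights(dists, bandwidth):
    """Gaussian distance-decay kernel (an assumed kernel choice)."""
    return np.exp(-0.5 * (dists / bandwidth) ** 2)

def local_wls(Xd, y, w):
    """Weighted least squares: beta = (X'WX)^-1 X'Wy for one regression point."""
    Xw = Xd * w[:, None]
    return np.linalg.solve(Xd.T @ Xw, Xw.T @ y)

def holdout_predictions(coords, X, y, val_idx, bandwidth, k=5):
    """Predict the held-out subzones with four approaches."""
    train = np.setdiff1d(np.arange(len(y)), val_idx)
    Xd = np.column_stack([np.ones(len(y)), X])  # add intercept column
    Xt, yt = Xd[train], y[train]

    # Fitted values at training subzones, from local models that never see
    # the validation subzones.
    d_tt = np.linalg.norm(coords[train][:, None] - coords[train][None], axis=2)
    fitted = np.array([
        Xt[i] @ local_wls(Xt, yt, gaussian_weights(d_tt[i], bandwidth))
        for i in range(len(train))
    ])

    d_vt = np.linalg.norm(coords[val_idx][:, None] - coords[train][None], axis=2)
    preds = {"holdout1": [], "holdout2": [], "knn": [], "mean": []}
    for j, v in enumerate(val_idx):
        w = gaussian_weights(d_vt[j], bandwidth)
        # holdout1: local coefficients from neighbors + the subzone's covariates
        preds["holdout1"].append(Xd[v] @ local_wls(Xt, yt, w))
        # holdout2: neighbors' estimates used directly (kernel-weighted mean)
        preds["holdout2"].append(np.sum(w * fitted) / np.sum(w))
        # baselines: mean of the k nearest observed values, and the global mean
        preds["knn"].append(yt[np.argsort(d_vt[j])[:k]].mean())
        preds["mean"].append(yt.mean())
    return {name: np.array(p) for name, p in preds.items()}

def repeated_holdout(coords, X, y, bandwidth, n_repeats=10, val_frac=0.2, seed=0):
    """Average MSPE per approach over repeated random train/validation splits."""
    rng = np.random.default_rng(seed)
    n_val = int(round(val_frac * len(y)))
    errors = {}
    for _ in range(n_repeats):
        val_idx = rng.choice(len(y), size=n_val, replace=False)
        preds = holdout_predictions(coords, X, y, val_idx, bandwidth)
        for name, p in preds.items():
            errors.setdefault(name, []).append(np.mean((y[val_idx] - p) ** 2))
    return {name: float(np.mean(e)) for name, e in errors.items()}

# Synthetic subzones (hypothetical data, not from the thesis): one covariate
# whose effect on the outcome varies smoothly across space.
rng = np.random.default_rng(42)
n = 150
coords = rng.uniform(0, 10, size=(n, 2))
X = np.sin(coords[:, :1] / 2) + 0.3 * rng.normal(size=(n, 1))
y = 2.0 + (1.0 + 0.3 * coords[:, 1]) * X[:, 0] + rng.normal(scale=0.5, size=n)

results = repeated_holdout(coords, X, y, bandwidth=2.0)
for name, mspe in results.items():
    print(f"{name}: MSPE = {mspe:.3f}")
```

The ranking of the approaches on this synthetic data says nothing about the thesis results; the sketch only shows how the two holdout variants differ in what they borrow from neighboring subzones. A study of crash counts would more plausibly use a count model and a calibrated bandwidth, both of which are left out here.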
id USP_c492cb382be224df12b1e0fa5ebaf414
oai_identifier_str oai:teses.usp.br:tde-18022019-112104
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Spatial crash prediction models: an evaluation of the impacts of enriched information on model performance and the suitability of different spatial modeling approaches
Modelos espaciais de previsão de acidentes: uma avaliação do desempenho dos modelos a partir da incorporação de informações aprimoradas e a adequação de diferentes abordagens de modelagem espacial
author Gomes, Monique Martins
author_facet Gomes, Monique Martins
author_role author
dc.contributor.none.fl_str_mv Pitombo, Cira Souza
dc.contributor.author.fl_str_mv Gomes, Monique Martins
dc.subject.por.fl_str_mv Crash prediction models
Geographically weighted regression
Geostatistics
Repeated holdout
Road safety
Spatial prediction models
publishDate 2018
dc.date.none.fl_str_mv 2018-12-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/18/18144/tde-18022019-112104/
url http://www.teses.usp.br/teses/disponiveis/18/18144/tde-18022019-112104/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br || atendimento@aguia.usp.br
_version_ 1815256516498817024