Comparison of Statistical and Machine Learning Models on Road Traffic Accident Severity Classification

Bibliographic details
Main author: Infante, Paulo
Publication date: 2022
Other authors: Jacinto, Gonçalo, Afonso, Anabela, Rego, Leonor, Nogueira, Vitor, Quaresma, Paulo, Saias, José, Santos, Daniel, Nogueira, Pedro, Silva, Marcelo, Costa, Rosalina, Góis, Patrícia, Manuel, Paulo Rebelo
Document type: Article
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/33513
Citation: Infante, P., Jacinto, G., Afonso, A., Rego, L., Nogueira, V., Quaresma, P., Saias, J., Santos, D., Nogueira, P., Silva, M., Costa, R. P., Gois, P., Manuel, P. R. (2022). Comparison of Statistical and Machine Learning Models on Road Traffic Accident Severity Classification. Computers, 11, 80. https://doi.org/10.3390/computers11050080
Abstract: Portugal has the sixth highest road fatality rate among European Union members. This is a problem of different dimensions with serious consequences in people’s lives. This study analyses daily data from police and government authorities on road traffic accidents that occurred between 2016 and 2019 in a district of Portugal. This paper looks for the determinants that contribute to the existence of victims in road traffic accidents, as well as the determinants for fatalities and/or serious injuries in accidents with victims. We use logistic regression models, and the results are compared to the machine-learning model results. For the severity model, where the response variable indicates whether only property damage or casualties resulted in the traffic accident, we used a large sample with a small imbalance. For the serious injuries model, where the response variable indicates whether or not there were victims with serious injuries and/or fatalities in the traffic accident with victims, we used a small sample with very imbalanced data. Empirical analysis supports the conclusion that, with a small sample of imbalanced data, machine-learning models generally do not perform better than statistical models; however, they perform similarly when the sample is large and has a small imbalance.
id RCAP_94cdf8f2e1b014bd0f380dc7ed2fee88
oai_identifier_str oai:dspace.uevora.pt:10174/33513
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Keywords: injury; logistic regression; machine learning; road traffic accidents; severity of victims
dc.date.none.fl_str_mv 2022-05-16T00:00:00Z
2023-01-17T12:03:20Z
2023-01-17
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.relation.none.fl_str_mv Computers, volume 11, issue 5, article 80
Author emails: pinfante@uevora.pt, gjcj@uevora.pt, aafonso@uevora.pt, lrego@uevora.pt, vbn@uevora.pt, pq@uevora.pt, jsaias@uevora.pt, dfsantos@uevora.pt, pmn@uevora.pt, marcelogs@uevora.pt, rosalina@uevora.pt, pafg@uevora.pt, pjsrm@uevora.pt
336
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP