journal article Open Access May 16, 2022

Comparison of Statistical and Machine-Learning Models on Road Traffic Accident Severity Classification

Computers Vol. 11 No. 5 pp. 80 · MDPI AG
DOI: 10.3390/computers11050080
Abstract
Portugal has the sixth highest road fatality rate among European Union members. This is a multidimensional problem with serious consequences for people’s lives. This study analyses daily data from police and government authorities on road traffic accidents that occurred between 2016 and 2019 in a district of Portugal. It looks for the determinants that contribute to the existence of victims in road traffic accidents, as well as the determinants of fatalities and/or serious injuries in accidents with victims. We fit logistic regression models and compare their results with those of machine-learning models. For the severity model, where the response variable indicates whether the traffic accident resulted only in property damage or in casualties, we used a large sample with a small imbalance. For the serious injuries model, where the response variable indicates whether or not an accident with victims involved serious injuries and/or fatalities, we used a small sample with very imbalanced data. The empirical analysis supports the conclusion that, with a small sample of imbalanced data, machine-learning models generally do not perform better than statistical models; however, the two perform similarly when the sample is large and has a small imbalance.
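To make the imbalance concern concrete, the following is a minimal sketch, not the authors' evaluation procedure. It uses invented labels (100 accidents with victims, 5 of them serious) and hypothetical helper names (`confusion_counts`, `summarize`) to show why plain accuracy can mislead when one class is rare, and why measures such as sensitivity and balanced accuracy are often reported alongside it. The abstract does not state which metrics the paper used.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Invented data (an assumption for illustration only):
# 5 serious outcomes (1) among 100 accidents with victims.
y_true = [1] * 5 + [0] * 95

# Classifier A never predicts the rare class.
always_negative = [0] * 100

# Classifier B finds 3 of the 5 serious cases at the cost of 10 false alarms.
some_detection = [1, 1, 1, 0, 0] + [1] * 10 + [0] * 85

report_a = summarize(y_true, always_negative)
report_b = summarize(y_true, some_detection)

for name, report in (("A", report_a), ("B", report_b)):
    print(name, {k: round(v, 3) for k, v in report.items()})
```

In this toy setting classifier A has the higher accuracy (0.95 against 0.88) although it detects no serious cases, while classifier B has the higher balanced accuracy. This is the kind of effect that makes comparisons on small, very imbalanced samples sensitive to the choice of metric.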
Cite This Article
Paulo Infante, Gonçalo Jacinto, Anabela Afonso, et al. (2022). Comparison of Statistical and Machine-Learning Models on Road Traffic Accident Severity Classification. Computers, 11(5), 80. https://doi.org/10.3390/computers11050080