Semantic N-Gram Feature Analysis and Machine Learning–Based Classification of Drivers’ Hazardous Actions at Signal-Controlled Intersections

Keneth Morgan Kwayu; Valerian Kwigizile; Jiansong Zhang; Jun-Seok Oh

Source: Journal of Computing in Civil Engineering, 2020, Volume 34, Issue 4

DOI: 10.1061/(ASCE)CP.1943-5487.0000895

Publisher: ASCE

Abstract: In the United States, it is common for crash reports to include a narrative that contains a police officer’s written summary of the crash. The crash narratives provide valuable information that can assist in understanding the circumstances surrounding a crash at a given roadway location. However, the crash report narratives contain unstructured textual information, which is hard to extract or use in analyses given that there are hundreds of thousands of reports. This study uses Michigan’s crash reports (UD-10) to demonstrate how natural language processing (NLP) techniques can be useful in extracting information from the UD-10 crash report narratives to better understand crash scenarios. Reports of crashes at signal-controlled intersections in Michigan involving responsible (i.e., at-fault) drivers who were issued a “fail to yield” or “disregard traffic control” hazardous action citation were used in the analysis. Semantic analysis was conducted to discern the most likely crash scenario at signal-controlled intersections for each hazardous action with respect to the responsible driver’s movement. Support vector machines and boosted classification trees were developed using unigram and bigram features with different n-gram feature deployment scenarios to predict hazardous action citations. Support vector machines using a mixture of unigram and bigram features performed better than the boosted classification tree, with an out-of-sample predictive accuracy of 86.1 percent and an area under the receiver operating characteristic (ROC) curve of 0.917. Overall, the results can help safety engineers and analysts to ascertain the causes of a crash by detailing the chain of precrash events leading to a crash.
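
As a rough illustration of what "a mixture of unigram and bigram features" can mean in practice, the sketch below counts single words and adjacent word pairs in one narrative. It is not the authors' pipeline: the function name `ngram_features`, the lowercase alphanumeric tokenization, and the example sentence are all assumptions made for illustration, and the sentence is invented rather than taken from any UD-10 report.

```python
import re
from collections import Counter


def ngram_features(narrative, use_unigrams=True, use_bigrams=True):
    """Count unigram and/or bigram features in one narrative (hypothetical helper)."""
    # Assumed tokenization: lowercase, keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", narrative.lower())
    features = Counter()
    if use_unigrams:
        features.update(tokens)
    if use_bigrams:
        # Pair each token with its successor to form bigrams.
        features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


# Invented example sentence, not taken from any UD-10 report.
example = "Vehicle 1 failed to yield while turning left on a green light"
mixed = ngram_features(example)
print(mixed["yield"], mixed["failed to"], mixed["turning left"])
```

Switching `use_unigrams` or `use_bigrams` off gives the unigram-only or bigram-only feature sets, which is one way to picture the different n-gram deployment scenarios the abstract mentions. Count dictionaries of this kind would then be vectorized and passed to a classifier such as a support vector machine.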

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4265263

Collections

Journal of Computing in Civil Engineering

contributor author	Keneth Morgan Kwayu
contributor author	Valerian Kwigizile
contributor author	Jiansong Zhang
contributor author	Jun-Seok Oh
date accessioned	2022-01-30T19:25:05Z
date available	2022-01-30T19:25:05Z
date issued	2020
identifier other	%28ASCE%29CP.1943-5487.0000895.pdf
identifier uri	http://yetl.yabesh.ir/yetl1/handle/yetl/4265263
publisher	ASCE
title	Semantic N-Gram Feature Analysis and Machine Learning–Based Classification of Drivers’ Hazardous Actions at Signal-Controlled Intersections
type	Journal Paper
journal volume	34
journal issue	4
journal title	Journal of Computing in Civil Engineering
identifier doi	10.1061/(ASCE)CP.1943-5487.0000895
page	04020015
tree	Journal of Computing in Civil Engineering, 2020, Volume 34, Issue 4
contenttype	Fulltext
