Automatic Identification of Causal Factors from Fall-Related Accident Investigation Reports Using Machine Learning and Ensemble Learning Approaches
Source: Journal of Management in Engineering, 2024, Volume 40, Issue 1, page 04023050-1
Author: Haonan Qi, Zhipeng Zhou, Javier Irizarry, Dong Lin, Haoyu Zhang, Nan Li, Jianqiang Cui
DOI: 10.1061/JMENEA.MEENG-5485
Publisher: ASCE
Abstract: To improve learning from past fall-related accidents, this study developed a framework for automatically extracting each individual causal factor from accident investigation reports, based on a modified version of the human factors analysis and classification system. Several techniques were adopted to improve the automatic identification of causal factors from unstructured text data: the synthetic minority oversampling technique (SMOTE) for handling imbalanced data, soft voting with unequal weights for ensemble learning, and hyperparameter optimization. Experimental results showed that no single classifier achieved the best accuracy and F1 score across all 19 subcategories of causal factors. Therefore, one or more specific classifiers were preferred for predicting each specific causal factor with the best performance. Further comparative analyses among seven classifiers demonstrated that the ensemble learning model based on soft voting (ELSV) provided more stable predictions, with lower variance across causal factors, than the individual machine learning models. The ELSV is therefore suggested as the first choice for collectively identifying all 19 causal factors. These findings support efficient and reliable learning from past fall-related accidents, and the resulting insights can be used to control the risk of falls from height at construction sites.
This study proposes a framework based on six machine learning models (support vector machine, naive Bayes, decision tree, k-nearest neighbors, random forest, and multilayer perceptron) and one ensemble learning approach. Several techniques (SMOTE for handling imbalanced data, soft voting with unequal weights for ensemble learning, and hyperparameter optimization) were used to improve the automatic identification of causal factors. No single classifier was found to be best across all 19 subcategories of causal factors. Comparative analysis of the seven classifiers demonstrated that the ensemble learning approach provided more stable predictions, with lower variance across causal factors, than the individual machine learning models. The framework offers a feasible method for automatically identifying causal factors from post-accident investigation reports on falls from height at construction workplaces. It reduces the time and subjectivity of manual processing, improving the efficiency and reliability of causal factor extraction. It also satisfies the requirement that an investigation be carried out as quickly as possible after an accident. Safety managers on site can then adopt corrective and preventive measures to address causal factors immediately and so reduce fall risks in the construction industry.
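The abstract names soft voting with unequal weights as the ensemble rule but does not give its details. The following is a minimal sketch of the general technique only, not the authors' implementation: each classifier outputs class probabilities, and the ensemble takes their weighted average before choosing the most probable class. The function names, the example probabilities, and the weights are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-classifier probability vectors.

    probas[i][c] is classifier i's probability for class c;
    weights[i] is the (unequal) weight given to classifier i.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


def predict(probas: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = soft_vote(probas, weights)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs [P(factor absent), P(factor present)] from three
# classifiers for a single report; these values are assumptions, not data
# from the study.
example_probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
example_weights = [1.0, 2.0, 1.0]  # assumed weights, second classifier trusted most
```

With these assumed inputs, `predict(example_probas, example_weights)` selects class 1 (factor present). Shifting most of the weight to the first classifier, for example `[6.0, 1.0, 1.0]`, flips the decision to class 0, which shows how unequal weights let a more trusted classifier dominate the vote.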
contributor author | Haonan Qi | |
contributor author | Zhipeng Zhou | |
contributor author | Javier Irizarry | |
contributor author | Dong Lin | |
contributor author | Haoyu Zhang | |
contributor author | Nan Li | |
contributor author | Jianqiang Cui | |
date accessioned | 2024-04-27T22:23:18Z | |
date available | 2024-04-27T22:23:18Z | |
date issued | 2024/01/01 | |
identifier other | 10.1061-JMENEA.MEENG-5485.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4296542 | |
publisher | ASCE | |
title | Automatic Identification of Causal Factors from Fall-Related Accident Investigation Reports Using Machine Learning and Ensemble Learning Approaches | |
type | Journal Article | |
journal volume | 40 | |
journal issue | 1 | |
journal title | Journal of Management in Engineering | |
identifier doi | 10.1061/JMENEA.MEENG-5485 | |
journal firstpage | 04023050-1 | |
journal lastpage | 04023050-17 | |
page | 17 | |
tree | Journal of Management in Engineering:;2024:;Volume ( 040 ):;issue: 001 | |
contenttype | Fulltext |