description abstract | Construction work zone crashes are a critical concern in traffic safety, posing unique challenges for both road users and transportation authorities. Common factors contributing to work zone crashes include reduced speeds, lane closures, and the presence of construction equipment and workers. Fatal crashes within work zones, while relatively rare compared with nonfatal incidents, are highly significant because of their severe consequences. The complexity and variability of work zone conditions, coupled with the infrequency of fatal crashes, make it difficult for both machine learning (ML) and statistical models to predict them accurately. Furthermore, existing ML models for predicting crash severity are computationally demanding, which can make them infeasible to run in some settings. To this end, this paper investigates the potential use of conditional tabular generative adversarial networks (CTGAN) and knowledge distillation (KD) in overcoming these challenges through a comprehensive framework. The results demonstrate that synthetic data generated by CTGAN improve the models’ ability to identify underrepresented classes by up to 15.2 percentage points. Moreover, the distillation process showed promising results in enhancing the performance of simpler models, such as decision trees, which could be beneficial for deployment on devices with limited computational resources.
Construction work zone crashes pose significant challenges for transportation authorities and safety managers because of the dynamic and complex nature of these environments. Common contributing factors include reduced speeds, lane closures, and the presence of construction equipment and workers. The severity of crashes, particularly fatal incidents, underscores the need for accurate and efficient prediction models. 
To this end, this study presents a hybrid framework combining CTGAN and KD to improve the predictive accuracy and efficiency of crash severity models in work zones. By generating synthetic data, CTGAN addresses the challenge of imbalanced data sets, particularly for low-frequency, high-severity crashes. This yields a more comprehensive data set for training predictive models and, in turn, improved accuracy. The framework also uses KD to distill knowledge from complex models into smaller, efficient models suitable for deployment on mobile devices. This allows department of transportation (DOT) safety managers, who often work in field environments with limited computational resources, to run predictive models on site, enabling real-time decision-making and rapid response to safety concerns. | |