Developing and Evaluating a Classification Model for Construction Defect Control: A Text Mining and Ensemble Learning Approach

Inho Jo; SangHyeok Han; Lei Hou; Sungkon Moon; Jae-Jun Kim

Source: Journal of Management in Engineering, 2025, Volume 41, Issue 2, page 04024071-1

DOI: 10.1061/JMENEA.MEENG-6296

Publisher: American Society of Civil Engineers

Abstract: In the construction industry, customer satisfaction is of paramount importance, as it significantly impacts company success and reputation. In Korea’s competitive apartment market, customer satisfaction, particularly feedback on newly built apartments, is vital for construction companies, as it fosters growth and customer loyalty. To gain an understanding of the sentiments and patterns within this feedback, text mining can be utilized. This study aims to extract such insights from textual data on apartment building defect complaints, using text mining and ensemble learning to develop models with high prediction accuracy. It analyzes the accuracy of the Word2Vec and term frequency–inverse document frequency (TF-IDF) models, as well as the individual performance of different classification models, including naïve Bayes, decision trees, logistic regression, k-nearest neighbors, support vector machines (SVMs), and random forests. This analysis was conducted to validate the effectiveness of ensemble learning. Data were collected from a total of 230 apartment building projects in South Korea between 2018 and 2023, resulting in a data set of 101,387 data points, which underwent analysis to validate the model. The validation results consistently showed that TF-IDF outperforms Word2Vec, with the SVM model achieving the highest performance, attaining an average F1 score of 0.7439. Ensemble learning models demonstrated an improvement in accuracy of up to 34% over single models, reaching an average accuracy of 97.47% after the removal of human error. While this study acknowledges its limitations, which include potential biases in the data set, the impact of language evolution on model precision, and difficulties in classifying complex defects, the ensemble model demonstrated substantial improvements in defect classification accuracy and provided practical insights for defect management in construction. Moving forward, future work could explore integrating multidimensional data, utilizing speech-to-text technology, prioritizing defects by severity, and employing artificial intelligence for real-time defect prediction to further enhance defect management practices.
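
The three ideas named in the abstract (TF-IDF weighting, combining several classifiers, and scoring with an averaged F1) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the idf variant, the hard-voting rule, the macro-averaging, the function names, and the toy complaint labels are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: length-normalized term frequency times ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def majority_vote(predictions):
    """Hard-voting ensemble: per sample, keep the most common label.

    Ties go to the label that was voted first among the models.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical tokenized complaints and defect labels (illustrative only).
docs = [["wallpaper", "tear"], ["floor", "crack"], ["wallpaper", "stain"]]
vectors = tf_idf(docs)

truth = ["finish", "floor", "finish", "plumbing"]
model_a = ["finish", "floor", "finish", "plumbing"]
model_b = ["finish", "finish", "finish", "plumbing"]
model_c = ["floor", "floor", "finish", "window"]

ensemble = majority_vote([model_a, model_b, model_c])
print("ensemble:", ensemble)
print("macro F1, model_b :", macro_f1(truth, model_b))
print("macro F1, ensemble:", macro_f1(truth, ensemble))
```

In this toy case the vote corrects the single errors made by `model_b` and `model_c`, which is the general intuition behind ensembling; it says nothing about the magnitude of the gains reported in the paper.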

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4304326

Collections

Journal of Management in Engineering

contributor author	Inho Jo
contributor author	SangHyeok Han
contributor author	Lei Hou
contributor author	Sungkon Moon
contributor author	Jae-Jun Kim
date accessioned	2025-04-20T10:15:23Z
date available	2025-04-20T10:15:23Z
date copyright	12/4/2024 12:00:00 AM
date issued	2025
identifier other	JMENEA.MEENG-6296.pdf
identifier uri	http://yetl.yabesh.ir/yetl1/handle/yetl/4304326
publisher	American Society of Civil Engineers
title	Developing and Evaluating a Classification Model for Construction Defect Control: A Text Mining and Ensemble Learning Approach
type	Journal Article
journal volume	41
journal issue	2
journal title	Journal of Management in Engineering
identifier doi	10.1061/JMENEA.MEENG-6296
journal firstpage	04024071-1
journal lastpage	04024071-15
page	15
tree	Journal of Management in Engineering; 2025; Volume 41; Issue 2
contenttype	Fulltext
