Comparing Natural Language Processing Methods to Cluster Construction Schedules

Ying Hong; Haiyan Xie; Gary Bhumbra; Ioannis Brilakis

Source: Journal of Construction Engineering and Management, 2021, Volume 147, Issue 10, page 04021136-1

DOI: 10.1061/(ASCE)CO.1943-7862.0002165

Publisher: ASCE

Abstract: The names of construction activities are the only unstructured data attribute in construction schedules, and they often guide construction execution. Activity names are devised to communicate between stakeholders, and therefore often are written using inconsistent terminologies across repetitive activities with omitted contextual information. This presents a challenge for machine learning systems when learning patterns from construction schedules. This paper compared the performance of state-of-the-art text-related clustering methods in identifying repetitive activities. This was achieved by creating a ground truth data set on the basis of the standard construction work classification, and then comparing the precision, recall, and F1 score of latent semantic analysis (LSA), latent Dirichlet allocation (LDA), word2vec, and fastText algorithms to group activity names in 27 construction schedules. Results indicated that the F1 score of LSA outperformed LDA (0.84% versus 0.88%), whereas the results of language models–based clustering depended on the quality of word embedding and the paired clustering method. This study provides insight into how to preprocess activity names of construction schedules for further artificial intelligence (AI)-based quantitative analysis. Methodologies described in this study will help researchers who work on natural language–related research in construction (e.g., safety and contract management) to better capture the feature of words, rather than only counting the word frequencies.
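The abstract's evaluation idea (scoring a clustering of activity names against a ground truth by precision, recall, and F1) can be illustrated with a minimal sketch. The pairwise-counting formulation below is an assumption chosen for illustration, not necessarily the scoring procedure used in the paper; the function name `pairwise_prf`, the activity names, and the labels are all hypothetical.

```python
from itertools import combinations


def pairwise_prf(true_labels, pred_labels):
    """Pairwise precision, recall, and F1 of a clustering against ground truth.

    A pair of items counts as a true positive when both the ground truth
    and the predicted clustering place the two items in the same group.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # grouped together, and should be
        elif same_pred:
            fp += 1  # grouped together, but should not be
        elif same_true:
            fn += 1  # should be grouped together, but are not

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical activity names with hand-assigned ground-truth work classes.
activities = [
    "Pour slab L1",
    "Pour slab L2",
    "Install formwork L1",
    "Install formwork L2",
    "Cast slab L3",
]
truth = ["concrete", "concrete", "formwork", "formwork", "concrete"]

# Hypothetical cluster ids, e.g. from clustering LSA or word-embedding vectors.
predicted = [0, 0, 1, 1, 1]

precision, recall, f1 = pairwise_prf(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case "Cast slab L3" uses different wording from the other concrete activities and lands in the wrong cluster, which lowers both precision and recall; this is the kind of inconsistent terminology the abstract describes.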

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4272010

Collections

Journal of Construction Engineering and Management

contributor author	Ying Hong
contributor author	Haiyan Xie
contributor author	Gary Bhumbra
contributor author	Ioannis Brilakis
date accessioned	2022-02-01T21:46:33Z
date available	2022-02-01T21:46:33Z
date issued	10/1/2021
identifier other	%28ASCE%29CO.1943-7862.0002165.pdf
identifier uri	http://yetl.yabesh.ir/yetl1/handle/yetl/4272010
publisher	ASCE
title	Comparing Natural Language Processing Methods to Cluster Construction Schedules
type	Journal Paper
journal volume	147
journal issue	10
journal title	Journal of Construction Engineering and Management
identifier doi	10.1061/(ASCE)CO.1943-7862.0002165
journal firstpage	04021136-1
journal lastpage	04021136-11
page	11
tree	Journal of Construction Engineering and Management; 2021; Volume 147; Issue 10
contenttype	Fulltext

YaBeSH Engineering and Technology Library
