YaBeSH Engineering and Technology Library



    Deep Learning Image Captioning in Construction Management: A Feasibility Study

Source: Journal of Construction Engineering and Management, 2022, Volume 148, Issue 7, Page 04022049
Author: Bo Xiao, Yiheng Wang, Shih-Chung Kang
    DOI: 10.1061/(ASCE)CO.1943-7862.0002297
    Publisher: ASCE
Abstract: Deep learning image captioning methods can generate one or several natural-language sentences describing the contents of construction images. By deconstructing these sentences, construction object and activity information can be retrieved integrally for automated scene analysis. However, the feasibility of deep learning image captioning in construction remains unclear. To fill this gap, this research investigates the feasibility of deep learning image captioning methods in construction management. First, a linguistic schema for annotating construction machine images was established, and a captioning data set was developed. Then, six deep learning image captioning methods from the computer vision community were selected and tested on the construction captioning data set. In the sentence-level evaluation, the transformer-self-critical sequence training (Tsfm-SCST) method obtained the best performance among the six methods, with a bilingual evaluation understudy (BLEU)-1 score of 0.606, BLEU-2 of 0.506, BLEU-3 of 0.427, BLEU-4 of 0.349, metric for evaluation of translation with explicit ordering (METEOR) of 0.287, recall-oriented understudy for gisting evaluation (ROUGE) of 0.585, consensus-based image description evaluation (CIDEr) of 1.715, and semantic propositional image caption evaluation (SPICE) score of 0.422. In the element-level evaluation, the Tsfm-SCST method achieved an average precision of 91.1%, recall of 83.3%, and an F1 score of 86.6% for recognition of construction machine objects by deconstructing the generated sentences. This research indicates that deep learning image captioning is feasible as a method of generating accurate and precise text descriptions from construction images, with potential applications in construction scene analysis and image documentation.
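The element-level evaluation described in the abstract — deconstructing generated captions into construction-machine mentions and scoring them against reference annotations with precision, recall, and F1 — can be sketched as below. This is an illustrative reimplementation, not the paper's code: the machine vocabulary, the substring-based extraction, and the example captions are all invented assumptions for demonstration.

```python
# Illustrative sketch of element-level caption evaluation (not the paper's code).
# A caption is "deconstructed" by extracting construction-machine terms, and the
# extracted sets are scored against reference captions with micro-averaged
# precision, recall, and F1.

# Hypothetical vocabulary of machine objects; the paper's actual schema differs.
MACHINE_VOCAB = {"excavator", "dump truck", "loader", "crane", "bulldozer"}

def extract_objects(caption: str) -> set:
    """Return the set of machine terms mentioned in a caption."""
    text = caption.lower()
    return {term for term in MACHINE_VOCAB if term in text}

def element_scores(generated: list, references: list) -> tuple:
    """Micro-averaged precision, recall, and F1 over paired captions."""
    tp = fp = fn = 0
    for gen, ref in zip(generated, references):
        g, r = extract_objects(gen), extract_objects(ref)
        tp += len(g & r)   # objects correctly mentioned
        fp += len(g - r)   # objects hallucinated by the caption
        fn += len(r - g)   # objects the caption missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a generated caption that names the crane but misses the loader counts one true positive and one false negative, lowering recall but not precision — the same asymmetry visible in the paper's reported 91.1% precision versus 83.3% recall.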


URI: http://yetl.yabesh.ir/yetl1/handle/yetl/4286109
    Collections
    • Journal of Construction Engineering and Management


contributor author: Bo Xiao
contributor author: Yiheng Wang
contributor author: Shih-Chung Kang
date accessioned: 2022-08-18T12:09:41Z
date available: 2022-08-18T12:09:41Z
date issued: 2022/04/22
identifier other: (ASCE)CO.1943-7862.0002297.pdf
identifier uri: http://yetl.yabesh.ir/yetl1/handle/yetl/4286109
publisher: ASCE
title: Deep Learning Image Captioning in Construction Management: A Feasibility Study
type: Journal Article
journal volume: 148
journal issue: 7
journal title: Journal of Construction Engineering and Management
identifier doi: 10.1061/(ASCE)CO.1943-7862.0002297
journal first page: 04022049
journal last page: 04022049-14
pages: 14
tree: Journal of Construction Engineering and Management, 2022, Volume 148, Issue 7
content type: Fulltext
DSpace software copyright © 2002-2015 DuraSpace
The "DSpace" digital library software was localized into Persian by Yabesh for Iranian libraries | Contact Yabesh