contributor authorJiannan Cai
contributor authorLiu Yang
contributor authorYuxi Zhang
contributor authorShuai Li
contributor authorHubo Cai
date accessioned2022-02-01T00:11:08Z
date available2022-02-01T00:11:08Z
date issued2021-07-01
identifier other(ASCE)CO.1943-7862.0002071.pdf
identifier urihttp://yetl.yabesh.ir/yetl1/handle/yetl/4271043
description abstractThe visual focus of attention (VFOA) of construction workers is a critical cue for recognizing entity interactions, which in turn facilitates the interpretation of workers’ intentions, the prediction of movements, and the comprehension of the jobsite context. The increasing use of construction surveillance cameras provides a cost-efficient way to estimate workers’ VFOA from information-rich images. However, the low resolution of these images poses a great challenge to detecting facial features and gaze directions. Recognizing that body and head orientations provide strong hints for inferring workers’ VFOA, this study represents the VFOA as a collection of body orientations, body poses, head yaws, and head pitches and designs a convolutional neural network (CNN)-based multitask learning (MTL) framework to automatically estimate workers’ VFOA from low-resolution construction images. The framework is composed of two modules. In the first module, a Faster region-based CNN (R-CNN) object detector detects and extracts workers’ full-body images, and the resulting full-body images serve as the input to the CNN-MTL model in the second module. In the second module, VFOA estimation is formulated as a multitask image classification problem in which four classification tasks—body orientation, body pose, head yaw, and head pitch—are jointly learned by the newly designed CNN-MTL model. Construction videos were used to train and test the proposed framework. The results show that the proposed CNN-MTL model achieves accuracies of 0.91, 0.95, 0.86, and 0.83 in body orientation, body pose, head yaw, and head pitch classification, respectively. Compared with conventional single-task learning, the MTL method reduces training time by almost 50% without compromising accuracy.
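
description note The two-module pipeline in the abstract can be illustrated with a minimal sketch, assuming a PyTorch implementation. A pretrained Faster R-CNN detector stands in for the worker-detection module, and a small shared backbone with four classification heads stands in for the CNN-MTL model. The backbone choice (ResNet-18), the per-task class counts, and the unweighted sum of cross-entropy losses are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a CNN-based multitask VFOA classifier (assumptions noted above).
    import torch
    import torch.nn as nn
    import torchvision

    # Module 1 (assumed setup): a Faster R-CNN detector locates workers in each frame;
    # in practice, its boxes would be used to crop full-body patches.
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    detector.eval()

    class VFOAMultiTaskCNN(nn.Module):
        """Shared CNN backbone with four classification heads, one per task."""
        def __init__(self, n_orient=8, n_pose=3, n_yaw=8, n_pitch=3):
            super().__init__()
            # Hypothetical backbone: a ResNet-18 trunk with its classifier removed.
            trunk = torchvision.models.resnet18(weights=None)
            self.backbone = nn.Sequential(*list(trunk.children())[:-1])
            feat_dim = trunk.fc.in_features
            # One linear head per task; class counts here are assumptions.
            self.orient_head = nn.Linear(feat_dim, n_orient)
            self.pose_head = nn.Linear(feat_dim, n_pose)
            self.yaw_head = nn.Linear(feat_dim, n_yaw)
            self.pitch_head = nn.Linear(feat_dim, n_pitch)

        def forward(self, x):
            feats = self.backbone(x).flatten(1)  # shared features for all four tasks
            return (self.orient_head(feats), self.pose_head(feats),
                    self.yaw_head(feats), self.pitch_head(feats))

    def multitask_loss(outputs, targets):
        # One assumed form of the joint objective: unweighted sum of per-task cross-entropies.
        ce = nn.CrossEntropyLoss()
        return sum(ce(o, t) for o, t in zip(outputs, targets))

    # Usage sketch: random tensors stand in for cropped full-body worker images.
    model = VFOAMultiTaskCNN()
    crops = torch.randn(4, 3, 224, 224)
    outputs = model(crops)
    targets = [torch.randint(0, o.shape[1], (4,)) for o in outputs]
    loss = multitask_loss(outputs, targets)
    loss.backward()

The single shared backbone is what allows the four tasks to be trained jointly in one pass, which is consistent with the reported training-time reduction relative to four separate single-task models.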
publisherASCE
titleMultitask Learning Method for Detecting the Visual Focus of Attention of Construction Workers
typeJournal Paper
journal volume147
journal issue7
journal titleJournal of Construction Engineering and Management
identifier doi10.1061/(ASCE)CO.1943-7862.0002071
journal firstpage04021063-1
journal lastpage04021063-12
page12
treeJournal of Construction Engineering and Management; 2021; Volume 147; Issue 7
contenttypeFulltext

