contributor author | Wilfredo Torres Calderon | |
contributor author | Dominic Roberts | |
contributor author | Mani Golparvar-Fard | |
date accessioned | 2022-01-30T22:50:08Z | |
date available | 2022-01-30T22:50:08Z | |
date issued | 2021-01-01 | |
identifier other | (ASCE)CP.1943-5487.0000937.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4269712 | |
description abstract | In recent years, computer vision algorithms have been shown to effectively leverage visual data from jobsites for video-based activity analysis of construction equipment. However, earthmoving operations are restricted to site work and the surrounding terrain, and the presence of other structures, particularly in urban areas, limits the number of viewpoints from which operations can be recorded. These constraints lower the degree of intra-activity and inter-activity category variability to which such algorithms are exposed, hindering their ability to generalize to new jobsites. Additionally, training computer vision algorithms typically relies on large quantities of hand-annotated ground truth. These annotations are burdensome to obtain and can offset the cost savings gained from automating activity analysis. The main contribution of this paper is a means of inexpensively generating synthetic data to improve the capabilities of vision-based activity analysis methods, based on virtual, kinematically articulated three-dimensional (3D) models of construction equipment. The authors introduce an automated synthetic data generation method that outputs two-dimensional (2D) pose sequences corresponding to simulated excavator operations, varying the camera position with respect to the excavator as well as activity length and behavior. The presented method is validated by training a deep learning–based method on the synthesized 2D pose sequences and testing on pose sequences corresponding to real-world excavator operations, achieving 75% precision and 71% recall. This exceeds the 66% precision and 65% recall obtained when training and testing the deep learning–based method on the real-world data via cross-validation. Limited access to reliable amounts of real-world data incentivizes the use of synthetically generated data for training vision-based activity analysis algorithms. | |
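
The abstract describes projecting kinematically articulated 3D equipment models to 2D pose sequences under varied camera viewpoints. The following is a minimal Python sketch of that idea only: it assumes a planar three-link excavator arm, made-up link lengths, a pinhole camera, and sinusoidal joint trajectories as a stand-in for simulated operations. It is not the authors' pipeline, and every parameter value is an illustrative assumption.

import numpy as np

def excavator_joints_3d(boom, stick, bucket):
    # Forward kinematics for a planar three-link arm (boom pivot ->
    # boom tip -> stick tip -> bucket tip). Link lengths in meters
    # are assumed values, not taken from the paper.
    lengths = [5.7, 2.9, 1.5]
    angles = np.cumsum([boom, stick, bucket])
    joints = [np.array([0.0, 0.0, 2.0])]   # boom pivot above ground plane
    for length, a in zip(lengths, angles):
        joints.append(joints[-1] + np.array([np.cos(a), 0.0, np.sin(a)]) * length)
    return np.stack(joints)                 # (4, 3) world coordinates

def look_at(cam_pos, target=np.zeros(3)):
    # Rotation matrix whose rows are the camera's right/up/forward axes.
    fwd = (target - cam_pos) / np.linalg.norm(target - cam_pos)
    right = np.cross(fwd, [0.0, 0.0, 1.0])
    right /= np.linalg.norm(right)
    return np.stack([right, np.cross(right, fwd), fwd])

def project(joints, cam_pos, focal=1000.0, center=(640.0, 360.0)):
    # Pinhole projection of (N, 3) world joints to (N, 2) pixel coords.
    cam = (joints - cam_pos) @ look_at(cam_pos).T   # world -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3] + np.asarray(center)

rng = np.random.default_rng(0)
# Sample one camera position on a ring around the machine per sequence,
# then render a smoothly varying (sinusoidal, stand-in) dig cycle.
theta = rng.uniform(0.0, 2.0 * np.pi)
cam_pos = np.array([15.0 * np.cos(theta), 15.0 * np.sin(theta), 3.0])
frames = []
for t in range(100):
    joints = excavator_joints_3d(
        boom=0.4 + 0.3 * np.sin(0.05 * t),
        stick=-0.8 + 0.2 * np.sin(0.07 * t),
        bucket=-0.5 + 0.3 * np.sin(0.09 * t),
    )
    frames.append(project(joints, cam_pos))
pose_sequence = np.stack(frames)   # (T, 4, 2): one synthetic 2D pose sequence

Repeating this over many sampled viewpoints and cycle parameters yields labeled 2D pose sequences at negligible annotation cost, which is the core economic argument the abstract makes.
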
publisher | ASCE | |
title | Synthesizing Pose Sequences from 3D Assets for Vision-Based Activity Analysis | |
type | Journal Paper | |
journal volume | 35 | |
journal issue | 1 | |
journal title | Journal of Computing in Civil Engineering | |
identifier doi | 10.1061/(ASCE)CP.1943-5487.0000937 | |
journal firstpage | 04020052 | |
journal lastpage | 04020052-17 | |
page | 17 | |
tree | Journal of Computing in Civil Engineering; 2021; Volume 035; Issue 001 | |
contenttype | Fulltext | |