In this article, we revisit the problem of 3D human modeling from two orthogonal silhouettes of an individual (i.e., front and side views). In contrast to our previous work (Wang et al., 2003, "Virtual Human Modeling From Photographs for Garment Industry," Comput. Aided Des., 35, pp. 577–589), we investigate a supervised learning approach based on a convolutional neural network (CNN), establishing a mapping function that effectively extracts features from the two silhouettes and fuses them into coefficients in the shape space of human bodies. A new CNN structure is proposed to extract not only the discriminative features of the front and side views but also their mixed features for the mapping function. Highly accurate 3D human models are then synthesized from the coefficients generated by the mapping function. Existing CNN approaches for 3D human modeling typically learn a large number of parameters (from 8.5 M to 355.4 M) from two binary images. In contrast, we investigate a new network architecture that takes sample points on the silhouettes as input. As a consequence, our network generates more accurate models with only 2.4 M parameters. The network is trained on samples obtained by augmenting a publicly accessible dataset. Transfer learning with smaller datasets of scanned models is then applied to enable the network to generate results with gender-oriented (or geographical) patterns.
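To make the described mapping concrete, below is a minimal sketch of a two-branch network that encodes sample points from the front and side silhouettes separately and fuses them into shape-space coefficients. It assumes a PyTorch implementation; the class name, layer sizes, number of sample points, and number of coefficients are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class TwoViewShapeNet(nn.Module):
    """Sketch of a two-branch mapping: each branch encodes the sample
    points of one silhouette (front or side) with 1D convolutions; a
    fusion head mixes the per-view features and regresses coefficients
    in the shape space of human bodies."""

    def __init__(self, n_points=648, n_coeffs=50):
        super().__init__()
        # One encoder per view; input is (x, y) coordinates per sample point.
        self.front_enc = self._make_branch()
        self.side_enc = self._make_branch()
        # Fusion head: concatenated per-view features -> mixed features
        # -> shape-space coefficients.
        self.fuse = nn.Sequential(
            nn.Linear(2 * 256, 256), nn.ReLU(),
            nn.Linear(256, n_coeffs),
        )

    @staticmethod
    def _make_branch():
        return nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # global feature per view
            nn.Flatten(),
        )

    def forward(self, front_pts, side_pts):
        # front_pts, side_pts: (batch, 2, n_points) sampled contour points
        f = self.front_enc(front_pts)
        s = self.side_enc(side_pts)
        return self.fuse(torch.cat([f, s], dim=1))

# Usage: regress coefficients from two sampled silhouettes; a 3D body
# would then be synthesized from these shape-space coefficients.
net = TwoViewShapeNet()
coeffs = net(torch.randn(4, 2, 648), torch.randn(4, 2, 648))
print(coeffs.shape)  # torch.Size([4, 50])
```

Feeding sampled contour points rather than full binary images, as the abstract describes, keeps the input compact and is one way such a network could stay well under the 8.5 M–355.4 M parameter counts of image-based approaches.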