Left: graphical structure of the body (top) and 3D parts (bottom). Right: 2D Deformable Structures model (top row), 3D Stitched Puppet model (middle row), and model alignment to data exploiting the part-based representation (bottom row).
Human pose and shape estimation can be seen as a proxy for a wide range of problems in object representation and recognition. Humans are complex and articulated, appear in images in a variety of clothing, and come in a wide range of shapes. Teaching computers to understand people and their movements in images and videos is a great challenge of computer vision with manifold applications in entertainment, human-computer interaction, web search, medicine, and autonomous vehicles.
Most existing methods for human pose detection and tracking are based on part-based models, where the human body is represented as a set of “boxes” in two dimensions (2D) or as simple geometric primitives like cylinders or cones in three dimensions (3D) [ ]. These models map to probabilistic generative models in which each body part is represented by a node in a graph and edges represent connections between parts. Given data, efficient inference of the model parameters can be performed with message-passing algorithms.
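To make the graph-and-messages idea concrete, here is a minimal sketch of sum-product message passing on a tree of body parts. It assumes each part takes one of a few discrete candidate states; the part names, the number of states `K`, and the `unary` and `pairwise` potentials are made up for illustration and are not taken from any of the models described here.

```python
# Hypothetical toy body graph: the torso is connected to a head and two arms.
PARTS = ["torso", "head", "left_arm", "right_arm"]
EDGES = [("torso", "head"), ("torso", "left_arm"), ("torso", "right_arm")]
K = 3  # assumed number of candidate states (e.g. discretized poses) per part


def unary(part, s):
    """Made-up data likelihood of `part` being in state `s`."""
    return 1.0 + ((PARTS.index(part) + 2 * s) % K)


def pairwise(s_a, s_b):
    """Made-up symmetric compatibility: connected parts prefer similar states."""
    return 1.0 / (1.0 + abs(s_a - s_b))


def neighbors(part):
    return [b if a == part else a for a, b in EDGES if part in (a, b)]


def message(src, dst, cache):
    """Sum-product message from `src` to `dst`, computed recursively on the tree."""
    if (src, dst) not in cache:
        msg = []
        for s_dst in range(K):
            total = 0.0
            for s_src in range(K):
                # Product of messages arriving at `src` from everything except `dst`.
                incoming = 1.0
                for n in neighbors(src):
                    if n != dst:
                        incoming *= message(n, src, cache)[s_src]
                total += unary(src, s_src) * pairwise(s_src, s_dst) * incoming
            msg.append(total)
        cache[(src, dst)] = msg
    return cache[(src, dst)]


def marginal(part):
    """Normalized belief over the states of `part`."""
    cache = {}
    belief = []
    for s in range(K):
        b = unary(part, s)
        for n in neighbors(part):
            b *= message(n, part, cache)[s]
        belief.append(b)
    z = sum(belief)
    return [b / z for b in belief]


if __name__ == "__main__":
    for p in PARTS:
        print(p, [round(v, 3) for v in marginal(p)])
```

Because the graph is a tree, each message is computed once and the resulting beliefs are exact marginals; the cost grows linearly with the number of parts instead of exponentially with it.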
Traditional part-based models cannot reach the level of realism of global models, as they do not represent how body shape deforms with pose. Moreover, they do not parameterize intrinsic body shape, and so far they have been used only to estimate body pose.
We have introduced part-based models that are parameterized for both body pose and shape. The Deformable Structures model (DS) [ ] is a 2D model that can generate contours of human bodies with pose-dependent deformations. The Stitched Puppet model (SP) [ ] is a 3D model that can generate body meshes with varying pose and intrinsic shape, and with realistic pose-dependent deformations. We have also learned the part segmentation from scans [ ].
These models live in a higher-dimensional space than models that do not represent shape. Furthermore, the shape parameters are represented by continuous random variables. To make inference practical in graphical models with high-dimensional continuous parameters, we use a new particle-based belief propagation algorithm that maintains particle diversity [ ].
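The general shape of a particle-based max-product loop can be sketched as follows. This is an illustration only, not the algorithm cited above: it assumes a three-node chain with scalar states, made-up quadratic potentials, random-walk proposals, and a simple stand-in selection heuristic (`select_diverse`) that trades score against spread. All names and constants are hypothetical.

```python
import random

N_NODES = 3                 # hypothetical chain of three parts
N_KEEP = 5                  # particles kept per node after selection
TARGETS = [0.0, 1.0, 2.0]   # made-up centers of the data terms
OFFSET = 1.0                # made-up preferred offset between neighbors


def log_unary(i, x):
    return -(x - TARGETS[i]) ** 2


def log_pair(x_left, x_right):
    return -(x_right - x_left - OFFSET) ** 2


def log_max_marginals(particles):
    """Max-product on the chain, restricted to the current particle sets."""
    n = len(particles)
    fwd = [[0.0] * len(p) for p in particles]  # best score of everything to the left
    bwd = [[0.0] * len(p) for p in particles]  # best score of everything to the right
    for i in range(1, n):
        for a, x in enumerate(particles[i]):
            fwd[i][a] = max(fwd[i - 1][b] + log_unary(i - 1, y) + log_pair(y, x)
                            for b, y in enumerate(particles[i - 1]))
    for i in range(n - 2, -1, -1):
        for a, x in enumerate(particles[i]):
            bwd[i][a] = max(bwd[i + 1][b] + log_unary(i + 1, y) + log_pair(x, y)
                            for b, y in enumerate(particles[i + 1]))
    return [[fwd[i][a] + log_unary(i, x) + bwd[i][a]
             for a, x in enumerate(particles[i])] for i in range(n)]


def select_diverse(xs, scores, keep, weight=0.5):
    """Stand-in heuristic: keep the best particle, then greedily add particles
    that score well and lie far from those already chosen."""
    order = sorted(range(len(xs)), key=lambda a: scores[a], reverse=True)
    chosen = [order[0]]
    while len(chosen) < keep:
        rest = [a for a in order if a not in chosen]
        chosen.append(max(
            rest,
            key=lambda a: scores[a] + weight * min(abs(xs[a] - xs[c]) for c in chosen),
        ))
    return [xs[a] for a in chosen]


def run(iterations=30, seed=0):
    rng = random.Random(seed)
    particles = [[rng.uniform(-5, 5) for _ in range(N_KEEP)] for _ in range(N_NODES)]
    history = []
    for _ in range(iterations):
        # Augment each node's set with random-walk proposals around its particles.
        augmented = [p + [x + rng.gauss(0.0, 0.5) for x in p] for p in particles]
        mm = log_max_marginals(augmented)
        history.append(max(mm[0]))  # log-score of the best joint configuration
        particles = [select_diverse(augmented[i], mm[i], N_KEEP) for i in range(N_NODES)]
    return particles, history


if __name__ == "__main__":
    _, history = run()
    print("best log-score, first and last iteration:", history[0], history[-1])
```

Keeping the top-scoring particle at every node means the best joint configuration found so far is retained from one iteration to the next, while the spread term prevents the remaining particles from collapsing onto a single mode.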