Top row: Poselets are used to condition a pictorial structures model, providing more contextual information, while maintaining efficient pose inference [ ]. Middle row: The Fields of Parts model reformulates pose using a binary variable for every possible part location, orientation and scale [ ]. Bottom row: The deformable structures model [ ] contains information about 2D body shape and how it varies with pose. Inference uses a new non-parametric belief propagation algorithm [ ].
Estimating 2D human pose is hard because people appear in a wide range of poses and vary in body shape. They wear varied clothing, and articulation causes significant self-occlusion. We have developed several state-of-the-art methods to address these problems.
Poselets [ ] capture how human motions and activities simultaneously constrain the positions of multiple body parts. Our model incorporates higher-order part dependencies while retaining efficient inference. We achieve this by defining a conditional model in which all body parts are connected a priori, but which becomes a tractable tree-structured pictorial structures model given image observations. To derive a set of conditioning variables, we exploit poselet-based features that capture extended spatial information about pose.
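To illustrate why a tree-structured model keeps inference tractable, the following is a minimal sketch of exact MAP inference on a tree by max-sum dynamic programming. It is not the implementation from the cited work: the function name `map_tree`, the discrete state spaces, and the toy score tables are assumptions made for illustration, and the pairwise tables stand in for whatever terms the conditioning variables would select.

```python
def map_tree(unary, parent, pairwise):
    """Exact MAP on a tree via max-sum message passing (illustrative sketch).

    unary[i][s]        : score of part i in state s
    parent[i]          : parent index of part i; part 0 is the root (parent -1)
                         and every parent must precede its children
    pairwise[i][sp][sc]: score of edge (parent[i], i) for parent state sp
                         and child state sc (entry 0 is unused)
    """
    n = len(unary)
    belief = [list(u) for u in unary]
    argbest = [None] * n

    # Upward pass: fold each subtree into its parent, leaves first.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        argbest[i] = []
        for sp in range(len(unary[p])):
            scores = [pairwise[i][sp][sc] + belief[i][sc]
                      for sc in range(len(unary[i]))]
            best = max(range(len(scores)), key=scores.__getitem__)
            argbest[i].append(best)
            belief[p][sp] += scores[best]

    # Downward pass: pick the best root state, then backtrack.
    states = [0] * n
    states[0] = max(range(len(belief[0])), key=belief[0].__getitem__)
    for i in range(1, n):
        states[i] = argbest[i][states[parent[i]]]
    return states, belief[0][states[0]]


# Toy chain: torso -> upper arm -> lower arm, two candidate states per part.
unary = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
parent = [-1, 0, 1]
agree = [[0.0, -3.0], [-3.0, 0.0]]  # penalize parent and child disagreeing
pairwise = [None, agree, agree]
states, score = map_tree(unary, parent, pairwise)
```

The cost grows linearly in the number of parts and quadratically in the number of states per part, which is why conditioning a fully connected model down to a tree makes exact inference practical.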
Our Fields of Parts model [ ] reformulates the problem as a binary Conditional Random Field that models the local appearance and joint spatial configuration of the human body. Using a novel graph structure, we model the presence or absence of a body part at every possible position, orientation, and scale in an image with a binary random variable; this encodes the same appearance and spatial structure as Pictorial Structures. While the formulation results in a vast number of random variables, approximate inference is efficient. Fields of Parts can use evidence from the background, can include local color information, and is connected more densely than a kinematic chain structure.
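As a rough illustration of approximate inference in a binary CRF, the sketch below runs naive mean-field updates on a small pairwise model. The text above does not say which approximation is used, so mean field is an assumption here, as are the function name `mean_field`, the energy form, and the toy numbers. Each variable stands for "part present at this location, orientation, and scale".

```python
import math


def mean_field(unary, edges, iters=50):
    """Naive mean-field for p(x) ∝ exp(sum_i u_i x_i + sum_ij w_ij x_i x_j),
    with x_i in {0, 1}. Returns approximate marginals q_i = q(x_i = 1).

    unary: list of unary scores u_i
    edges: dict mapping (i, j) index pairs to coupling weights w_ij
    """
    n = len(unary)
    # Build a neighbor list so each update touches only connected variables.
    nbrs = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    # Initialize from the unary terms alone.
    q = [sigmoid(u) for u in unary]
    for _ in range(iters):
        for i in range(n):
            # Expected input from neighbors under the current marginals.
            a = unary[i] + sum(w * q[j] for j, w in nbrs[i])
            q[i] = sigmoid(a)
    return q


# Toy example: a confident detection (index 0) supports a weak, spatially
# compatible neighbor (index 1) through a positive coupling.
q = mean_field(unary=[3.0, -0.5], edges={(0, 1): 2.0})
```

Each sweep costs time proportional to the number of edges, so this style of update scales to large numbers of binary variables as long as the graph stays sparse.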
Like pictorial structures, these models lack an explicit model of body shape. We learn a deformable structures (DS) body model that captures body shape and how it deforms with pose in 2D [ ]. The DS image likelihoods explicitly model image information at the boundaries of body parts, simplifying learning. The model is not much more complex than previous models but results in improved accuracy. Inference uses a new non-parametric method for max-product belief propagation that preserves particle diversity, models uncertainty, and estimates the pose of multiple bodies simultaneously [ ].
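The general idea of particle-based max-product can be sketched on a toy 1D chain. This is a deliberately simplified stand-in, not the algorithm from the cited work: it keeps the top-scoring particles greedily and therefore does not implement the diversity-preserving selection described above. The function name `particle_max_product`, the quadratic potentials, and all parameter values are assumptions for illustration.

```python
import random


def particle_max_product(obs, offset, n_particles=20, iters=40, step=0.5, seed=0):
    """Simplified particle max-product on a chain with continuous 1D states.

    Unary score for node i:  -(x - obs[i])**2
    Pairwise score (i, i+1): -(x_next - x - offset)**2
    Returns the best particle per node after the final iteration.
    """
    rng = random.Random(seed)
    n = len(obs)

    def unary(i, x):
        return -(x - obs[i]) ** 2

    def pair(a, b):
        return -(b - a - offset) ** 2

    particles = [[rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
                 for _ in range(n)]
    for _ in range(iters):
        # Augment: keep current particles and add random-walk proposals.
        cand = [ps + [x + rng.gauss(0.0, step) for x in ps] for ps in particles]

        # Forward max-sum messages over the candidate sets.
        fwd = [[0.0] * len(c) for c in cand]
        for i in range(1, n):
            for k, x in enumerate(cand[i]):
                fwd[i][k] = max(fwd[i - 1][m] + unary(i - 1, y) + pair(y, x)
                                for m, y in enumerate(cand[i - 1]))

        # Backward max-sum messages.
        bwd = [[0.0] * len(c) for c in cand]
        for i in range(n - 2, -1, -1):
            for k, x in enumerate(cand[i]):
                bwd[i][k] = max(bwd[i + 1][m] + unary(i + 1, y) + pair(x, y)
                                for m, y in enumerate(cand[i + 1]))

        # Select: rank candidates by max-marginal and keep the best ones.
        particles = []
        for i in range(n):
            scored = sorted(((unary(i, x) + fwd[i][k] + bwd[i][k], x)
                             for k, x in enumerate(cand[i])), reverse=True)
            particles.append([x for _, x in scored[:n_particles]])
    return [ps[0] for ps in particles]


# Toy chain whose exact optimum is x = [0, 1, 2].
estimate = particle_max_product([0.0, 1.0, 2.0], offset=1.0)
```

Greedy top-N selection tends to collapse all particles onto a single mode. A diversity-preserving selection step addresses that weakness, which matters when several poses, or several people, explain the image comparably well.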