Figure: Flowing puppets. (a) Frame with a hypothesized human "puppet" model; (b) dense flow between frame (a) and its neighboring frames; (c) the flow of the puppet approximated by a part-based affine motion model; (d) the prediction of the puppet from (a) into the adjacent frames using the estimated flow.
We address the problem of estimating the pose of characters in TV show video sequences.
Our approach is based on optical flow. People are moving entities, and their motion has distinctive characteristics. The whole body often moves differently from the background, and for the upper body the fastest moving parts are likely to be the hands. The articulated structure of the body also generates different motion patterns for the individual body parts. In recent years the computation of dense optical flow has made large progress in both accuracy and speed. Based on these observations, we rely on dense optical flow as a source of information for better pose estimation.
We precompute the dense optical flow between neighboring frames in the sequences, forward and backward in time. We consider the computed flow as an observation and exploit it in three ways: a) to estimate hand locations; b) to propagate good solutions across frames; c) to "link" pose hypotheses in adjacent frames through the flow and jointly evaluate their per-frame image likelihoods. A sketch of flow-based propagation is given below.
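To make the use of the flow as an observation more concrete, the sketch below shows one way to propagate 2D locations (for example joints or part contour points) into an adjacent frame by sampling the precomputed dense flow at those locations. This is a minimal Matlab sketch, not the released code; the function name propagate_points and the assumption that the flow is stored as an H-by-W-by-2 array of horizontal and vertical pixel displacements are ours.

    % Sketch (not part of the released code): propagate 2D points into an
    % adjacent frame by sampling the dense flow at their positions.
    % flow is assumed to be H-by-W-by-2, with flow(:,:,1) = u (horizontal)
    % and flow(:,:,2) = v (vertical) displacement in pixels.
    function pts_next = propagate_points(pts, flow)
    % pts      : N-by-2 matrix of [x y] locations in the current frame
    % pts_next : N-by-2 predicted locations in the adjacent frame
    u = interp2(flow(:,:,1), pts(:,1), pts(:,2), 'linear', 0);
    v = interp2(flow(:,:,2), pts(:,1), pts(:,2), 'linear', 0);
    pts_next = pts + [u v];
    end

Chaining such predictions forward and backward in time is what allows a good solution in one frame to suggest hypotheses in its neighbors.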
We represent the upper body with the Deformable Structures model and we exploit its region-based body part representation to estimate how the body moves over time. We call the corresponding moving DS models Flowing Puppets.
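As an illustration of the part-based affine motion model mentioned in the figure caption, the following Matlab sketch fits an affine motion to the dense flow inside a single body-part region by least squares: pixels p = (x, y) inside the part mask are assumed to move to A*p + t. This is a sketch under our own assumptions about the data layout (flow as an H-by-W-by-2 array, the part region as a logical mask), not the implementation in the released code.

    % Sketch (not the released implementation): fit an affine motion model
    % to the dense flow inside one body-part region.
    function [A, t] = fit_part_affine(flow, mask)
    % flow : H-by-W-by-2 dense flow (u, v) between two frames
    % mask : H-by-W logical mask of the body-part region
    [ys, xs] = find(mask);                   % pixel coordinates inside the part
    idx = sub2ind(size(mask), ys, xs);       % linear indices into one channel
    u = flow(idx);                           % horizontal flow at those pixels
    v = flow(idx + numel(mask));             % vertical flow at those pixels
    P = [xs ys ones(numel(xs), 1)];          % homogeneous source coordinates
    Q = [xs + u, ys + v];                    % flow-displaced target coordinates
    M = P \ Q;                               % 3-by-2 least-squares solution
    A = M(1:2, :)';                          % 2-by-2 linear part
    t = M(3, :)';                            % 2-by-1 translation
    end

Applying each part's estimated (A, t) to the puppet's part regions gives the kind of flow-based prediction shown in panel (d) of the figure.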
We provide the code used for the experiments in the paper.
The code is written in Matlab and has been used with version R2013b.
The package includes images for one of the training clips and one of the test clips in the VideoPose2 dataset for demo purposes. The full VideoPose2 dataset can be downloaded from the VideoPose2 website. We used the VideoPose2-fullframes version.
We provide the hand detection maps for running our code on the whole VideoPose2 test set.