FlowCap. Left to right: image capture with a Kinect RGB camera, optical flow, per-pixel part labels, part centers with uncertainty (red circles) and motion vectors (10x actual magnitude), and the estimated kinematic structure of the part centers.
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow carries information about 2D body pose. Like range data, flow is largely invariant to appearance, but unlike depth it can be computed directly from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, flow vanishes when people stop moving, and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. FlowCap requires no range sensor and estimates 2D human pose from monocular video sources containing human motion, such as hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation. Please visit the project webpage http://flowcap.is.tue.mpg.de for more information and to gain access to the training data.
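The role of the Kalman filter can be illustrated with a minimal sketch for a single part center. This is not the FlowCap implementation: the constant-velocity motion model, the assumption that each detection yields both a position and a flow-derived velocity, the noise values, and the names `make_cv_model` and `kalman_step` are all hypothetical choices made for illustration. The point it shows is that when no flow-based detection is available (the person stops moving), the filter still predicts the part position while its uncertainty grows.

```python
import numpy as np


def make_cv_model(dt=1.0, q=1e-2):
    """Constant-velocity model for one part center (illustrative assumption).

    State is [x, y, vx, vy]. We assume the detector measures the full state:
    position from the part center, velocity from the mean flow of the part.
    """
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.eye(4)          # measurement matrix (assumed: full state observed)
    Q = q * np.eye(4)      # process noise (assumed value)
    return F, H, Q


def kalman_step(x, P, F, H, Q, z=None, R=None):
    """One predict step, followed by an update only if a measurement exists."""
    # Predict: propagate the state and its covariance forward in time.
    x = F @ x
    P = F @ P @ F.T + Q
    if z is None:
        # No detection (e.g. no flow because the person is still):
        # keep the prediction and its enlarged uncertainty.
        return x, P
    # Update: correct the prediction with the detected part center.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


F, H, Q = make_cv_model()
x0 = np.array([0.0, 0.0, 1.0, 0.0])   # part moving right at 1 pixel per frame
P0 = np.eye(4)

# Frame without a detection: position is extrapolated, uncertainty grows.
x1, P1 = kalman_step(x0, P0, F, H, Q)

# Frame with a detection: uncertainty shrinks again.
R = 0.5 * np.eye(4)                    # measurement noise (assumed value)
x2, P2 = kalman_step(x1, P1, F, H, Q, z=F @ x1, R=R)
```

In this sketch, the covariance `P` plays the role of the part-center uncertainty drawn as red circles in the figure; a downstream pose regressor could weight part centers by it, although how FlowCap actually uses the uncertainty is not specified here.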